[lustre-discuss] Corrupted? MDT not mounting

Andrew Elwell andrew.elwell at gmail.com
Thu May 5 06:16:36 PDT 2022


> It's looking more like something filled up our space - I'm just
> copying the files out as a backup (mounted as ldiskfs just now) -

Ahem. Inode quotas are a good idea. It turns out that a user rapidly
creating about 130 million directories is more than a small MDT volume
can take.
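
For anyone else in the same boat, this is roughly the guard rail I wish
had been in place - a sketch only, with a made-up username and limits,
and assuming the filesystem name is astrofs:

    # enable user+group inode/block quota enforcement on the MDTs (run on the MGS)
    lctl conf_param astrofs.quota.mdt=ug
    # cap a single user at ~1M inodes soft / 1.1M hard - adjust to taste
    lfs setquota -u someuser -i 1000000 -I 1100000 /mnt/astrofs
    # and check where they're at
    lfs quota -u someuser /mnt/astrofs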

An update on recovery progress - upgrading the MDS to 2.12 got us past
the issue in LU-12674 enough to recover, and I've migrated half (one
of the HA pairs) of the OSSs to RHEL 7.9 / Lustre 2.12.8 too.

It needed a writeconf doing on each target before they'd mount, and
e2fsck has been run over any suspect LUNs. The filesystem "works" in
that under light testing I can read/write OK, but as soon as it gets
stressed, the OSSs fall over:
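
For the record, the per-target recovery steps were roughly along these
lines (device paths below are placeholders, not our real ones, and the
targets were unmounted first):

    # regenerate the Lustre config logs so the target will register and mount again
    tunefs.lustre --writeconf /dev/mapper/ost_lun0
    # check/repair a suspect ldiskfs-backed target (-f force, -p preen)
    e2fsck -fp /dev/mapper/ost_lun0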

[ 1226.864430] BUG: unable to handle kernel NULL pointer dereference
at           (null)
[ 1226.872281] IP: [<ffffffffac3a684b>] __list_add+0x1b/0xc0
[ 1226.877699] PGD 1ffba0d067 PUD 1ffa48e067 PMD 0
[ 1226.882360] Oops: 0000 [#1] SMP
[ 1226.885619] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE)
mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE)
ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE)
dm_round_robin ib_srp scsi_transport_srp scsi_tgt tcp_diag inet_diag
ib_isert iscsi_target_mod target_core_mod rpcrdma rdma_ucm ib_iser
ib_umad bonding rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi
ib_cm mlx4_ib ib_uverbs ib_core sunrpc ext4 mbcache jbd2 sb_edac
intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel iTCO_wdt kvm
iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel
aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr
i2c_i801 lpc_ich mei_me joydev mei sg ioatdma wmi ipmi_si ipmi_devintf
ipmi_msghandler dm_multipath acpi_pad acpi_power_meter dm_mod
ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx4_en
ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm
drm igb ahci libahci mpt2sas mlx4_core ptp crct10dif_pclmul
crct10dif_common libata crc32c_intel pps_core dca raid_class devlink
i2c_algo_bit drm_panel_orientation_quirks scsi_transport_sas nfit
libnvdimm [last unloaded: scsi_tgt]
[ 1226.987670] CPU: 6 PID: 366 Comm: kworker/u24:6 Kdump: loaded
Tainted: G           OE  ------------
3.10.0-1160.49.1.el7_lustre.x86_64 #1
[ 1227.000168] Hardware name: SGI.COM CH-C1104-GP6/X10SRW-F, BIOS 3.1 06/06/2018
[ 1227.007310] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[ 1227.012725] task: ffff934839f0b180 ti: ffff934836c20000 task.ti:
ffff934836c20000
[ 1227.020195] RIP: 0010:[<ffffffffac3a684b>]  [<ffffffffac3a684b>]
__list_add+0x1b/0xc0
[ 1227.028036] RSP: 0018:ffff934836c23d68  EFLAGS: 00010246
[ 1227.033339] RAX: 00000000ffffffff RBX: ffff934836c23d90 RCX: 0000000000000000
[ 1227.040463] RDX: ffff932fa518e680 RSI: 0000000000000000 RDI: ffff934836c23d90
[ 1227.047587] RBP: ffff934836c23d80 R08: 0000000000000000 R09: b2df8c1b3dcb3100
[ 1227.054712] R10: b2df8c1b3dcb3100 R11: 0000000000ffffff R12: ffff932fa518e680
[ 1227.061835] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff932fa518e680
[ 1227.068958] FS:  0000000000000000(0000) GS:ffff93483f380000(0000)
knlGS:0000000000000000
[ 1227.077034] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1227.082772] CR2: 0000000000000000 CR3: 0000001fe47a8000 CR4: 00000000003607e0
[ 1227.089895] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1227.097020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1227.104142] Call Trace:
[ 1227.106593]  [<ffffffffac7881c6>] __mutex_lock_slowpath+0xa6/0x1d0
[ 1227.112770]  [<ffffffffac02b59e>] ? __switch_to+0xce/0x580
[ 1227.118255]  [<ffffffffac7875bf>] mutex_lock+0x1f/0x2f
[ 1227.123399]  [<ffffffffc06fcac5>] cma_work_handler+0x25/0xa0 [rdma_cm]
[ 1227.129922]  [<ffffffffac0bde8f>] process_one_work+0x17f/0x440
[ 1227.135752]  [<ffffffffac0befa6>] worker_thread+0x126/0x3c0
[ 1227.141324]  [<ffffffffac0bee80>] ? manage_workers.isra.26+0x2a0/0x2a0
[ 1227.147849]  [<ffffffffac0c5e61>] kthread+0xd1/0xe0
[ 1227.152729]  [<ffffffffac0c5d90>] ? insert_kthread_work+0x40/0x40
[ 1227.158822]  [<ffffffffac795ddd>] ret_from_fork_nospec_begin+0x7/0x21
[ 1227.165260]  [<ffffffffac0c5d90>] ? insert_kthread_work+0x40/0x40
[ 1227.171348] Code: ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00
55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49
39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39
49 89
[ 1227.191295] RIP  [<ffffffffac3a684b>] __list_add+0x1b/0xc0
[ 1227.196798]  RSP <ffff934836c23d68>
[ 1227.200284] CR2: 0000000000000000


and I'm able to reproduce this on multiple servers :-/

I can see a few mentions (https://access.redhat.com/solutions/4969471
for example) that seem to hint it's triggered by low memory, but they
also say it's fixed in the Red Hat 7.9 kernel (and we're running the
stock Lustre 2.12.8 kernel, 3.10.0-1160.49.1.el7_lustre.x86_64).

I've got a case open with the vendor to see if there are any firmware
updates, but I'm not hopeful. These are 6-core single-socket
Broadwells with 128 GB of RAM; storage disks are mounted over SRP from
a DDN appliance. Would jumping to MOFED make a difference? Otherwise
I'm open to suggestions, as it's getting very tiring wrangling servers
back to life.

[root at astrofs-oss1 ~]# ls -l /var/crash/ | grep 2022
drwxr-xr-x 2 root root 44 May  1 23:12 127.0.0.1-2022-05-01-23:11:51
drwxr-xr-x 2 root root 44 May  2 15:58 127.0.0.1-2022-05-02-15:58:26
drwxr-xr-x 2 root root 44 May  3 21:43 127.0.0.1-2022-05-03-21:43:27
drwxr-xr-x 2 root root 44 May  4 17:30 127.0.0.1-2022-05-04-17:29:08
drwxr-xr-x 2 root root 44 May  4 17:38 127.0.0.1-2022-05-04-17:37:46
drwxr-xr-x 2 root root 44 May  4 19:11 127.0.0.1-2022-05-04-19:11:27
drwxr-xr-x 2 root root 44 May  4 19:19 127.0.0.1-2022-05-04-19:19:19
drwxr-xr-x 2 root root 44 May  5 17:09 127.0.0.1-2022-05-05-17:08:50
drwxr-xr-x 2 root root 44 May  5 20:26 127.0.0.1-2022-05-05-20:24:20
drwxr-xr-x 2 root root 44 May  5 20:33 127.0.0.1-2022-05-05-20:32:18
drwxr-xr-x 2 root root 44 May  5 20:58 127.0.0.1-2022-05-05-20:56:27
drwxr-xr-x 2 root root 44 May  5 21:05 127.0.0.1-2022-05-05-21:04:06
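
Each of those directories should have a vmcore, so if anyone wants to
dig in they ought to open with the crash utility along these lines
(the debuginfo path is from memory, so treat it as illustrative):

    # needs the kernel-debuginfo matching the lustre-patched kernel
    crash /usr/lib/debug/lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/vmlinux \
          /var/crash/127.0.0.1-2022-05-05-21:04:06/vmcore
    # then, at the crash> prompt:
    #   bt   - backtrace of the oopsing task
    #   log  - kernel ring buffer from the dump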


Many thanks

Andrew

