[Lustre-discuss] Lustre 1.6.7 kernel panics when umounting MDTs

Jason Rappleye jason.rappleye at nasa.gov
Wed Apr 1 22:58:23 PDT 2009


Hi,

On Apr 1, 2009, at 7:21 PM, Marcus Schull wrote:

> Lustre Admins,
>
>
> We have recently had 2 odd events I am keen to get some feedback on.
>
> 1) We recently had a filesystem go read-only (noticed on the client)
> due to an "inode out of bounds" error as reported in the server logs.
> Please see below for log and fsck details.

It looks like you're hitting the corruption described in bz 18695,  
which we also reported in bz 18889 and TACC reported in bz 19029.

Sun hasn't yet made any official recommendations regarding this  
problem. If you can stomach it, I'm certain that Sun (and other sites  
experiencing this problem) would appreciate feedback on what the user  
whose data was corrupted was doing at the time. If you can reproduce  
the problem, even better.

Unfortunately, by the time ldiskfs detects the corruption, it has  
already happened - it is detecting corrupted metadata that has already  
been written to disk, not the instant at which the corruption  
occurs. When we first discovered the corruption at our site, a user  
reported errors when trying to run ls and rm commands in a corrupted  
directory, but ldiskfs did not report any errors. Unmounting the  
MDT and running e2fsck on it produced the same type of errors you saw  
during your fsck. We remounted the MDT, and later that day ldiskfs did  
produce the "inode out of bounds" error that you saw. So either the  
corruption occurred again, or e2fsck didn't fix the problem.

This happened a day after upgrading to Lustre 1.6.7. We have since  
downgraded our servers to 1.6.6. We're running SLES10 SP2 with kernel  
2.6.16.60-0.35 plus a few site-specific patches for OOM handling  
(developed in house in conjunction with SGI) and one for improved SCSI  
error handling that came from Novell, through SGI. So this problem  
doesn't appear to be specific to the RHEL or SLES kernels, or to our  
patched kernel.

It's worth noting that we only saw this on the MDTs on a pair of  
filesystems. We ran e2fsck against all of the OSTs (a total of 90) and  
all came back clean.

Is anyone else out there running 1.6.7 seeing this problem as well?  
It might be a good idea to unmount your MDTs, run e2fsck against  
them, and report the results back to the mailing list.
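If anyone wants to script that check, here's a rough sketch. The device path is just the one from Marcus's fsck output; substitute your own MDT device. The `classify_fsck` helper is something I made up for illustration - it just decodes e2fsck's exit-status bitmask as documented in e2fsck(8). Running with -fn is read-only, so you can look first and decide later whether to let e2fsck fix anything.

```shell
#!/bin/sh
# Sketch only: check an (unmounted) MDT read-only and classify the result.
# Example device path taken from this thread; substitute your own MDT.
MDT_DEV=${MDT_DEV:-/dev/mapper/mpath11p1}

# Hypothetical helper: decode e2fsck's exit-status bitmask (see e2fsck(8)).
# bit 0 (1) = errors corrected; bit 2 (4) = errors left uncorrected.
classify_fsck() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo clean
    elif [ $(( rc & 4 )) -ne 0 ]; then
        echo errors-remain
    elif [ $(( rc & 1 )) -ne 0 ]; then
        echo errors-fixed
    else
        echo other
    fi
}

# Uncomment to run against a real, quiesced MDT:
# umount "$MDT_DEV"
# e2fsck -fn "$MDT_DEV"; classify_fsck $?
```

With -n, e2fsck never modifies the device, so a corrupted MDT will come back as "errors-remain" (exit bit 4) rather than "errors-fixed".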

j

>
>
> 2) In order to correct the issue (i.e. remount the device read-write),
> I attempted to umount that device (intending to do a quick fsck, as
> the device was the filesystem's MDT, and then remount) - however this
> action caused a kernel panic.
>
> I have experienced these kernel panics quite a few times in the past
> when umounting MDTs and OSTs in similar situations - when one or more
> have gone read-only, but this is the first with the 1.6.7 kernel.
> The resulting panic required a reboot and fsck on all mounted LUNs -
> which at present is about 15 TB (in a few separate filesystems).
>
> The server is currently acting as the MGS, MDT and OST partitions for
> all lustre filesystems.  Unfortunately we have still not had the
> chance to separate those roles onto different servers.
> The server is running 64 bit RHEL 5.2 with the latest lustre kernel
> (1.6.7) and associated packages.  It runs on a SUN blade with 4 AMD
> cores and 16 GB RAM.
> [ kernel version: 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9
> 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux ]
>
> All clients are also running RHEL 5.2/5.3 with unpatched kernels and
> the latest lustre packages (1.6.7).
> [ client kernel: 2.6.18-92.1.17.el5 #1 SMP Wed Oct 22 04:19:38 EDT
> 2008 x86_64 x86_64 x86_64 GNU/Linux ]
>
>
> I look forward to any advice regarding the unusual error (1) or any
> procedures to follow in order to prevent the kernel panics.
>
>
> Thanks in advance.
>
> Marcus Schull,
> Systems Administrator
> IMB, University of Queensland
>
>
>
>
>
>
> Lustre server /var/log/messages:
>
> Apr  1 19:03:04 lustre1 kernel: LDISKFS-fs error (device dm-37):
> ldiskfs_add_entry: bad entry in directory #12686675: inode out of
> bounds - offset=0, inode=2807365078, rec_len=4096, name_len=36
> Apr  1 19:03:04 lustre1 kernel: Remounting filesystem read-only
> Apr  1 19:03:04 lustre1 kernel: LDISKFS-fs error (device dm-37) in
> start_transaction: Readonly filesystem
> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(fsfilt-ldiskfs.c:
> 1231:fsfilt_ldiskfs_write_record()) can't start transaction for 18
> blocks (128 bytes)
> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(mds_reint.c:
> 230:mds_finish_transno()) @@@ wrote trans #0 rc -5 client
> 5142a23c-5adc-d4e8-4375-1b1028fe8e7d at idx 0: err = -30
> req at ffff8101771ba000 x28445861/t0 o101->51
> 42a23c-5adc-d4e8-4375-1b1028fe8e7d at NET_0x200008266732c_UUID:0/0 lens
> 512/568 e 0 to 0 dl 1238576684 ref 1 fl Interpret:/0/0 rc 0/0
> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(mds_reint.c:
> 238:mds_finish_transno()) wrote objids: err = 0
> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(fsfilt- 
> ldiskfs.c:
> 280:fsfilt_ldiskfs_start()) error starting handle for op 8 (33
> credits): rc -30
> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(fsfilt- 
> ldiskfs.c:
> 280:fsfilt_ldiskfs_start()) error starting handle for op 8 (33
> credits): rc -30
> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(mds_reint.c:
> 154:mds_finish_transno()) fsfilt_start: -30
> Apr  1 19:03:04 lustre1 kernel: LustreError: 11337:0:(mds_reint.c:
> 154:mds_finish_transno()) fsfilt_start: -30
>
>
>
> fsck output of filesystem with error reported above:
>
> [root at lustre1 ~]# e2fsck /dev/mapper/mpath11p1
> e2fsck 1.40.11.sun1 (17-June-2008)
> qfab-MDT0000: recovering journal
> qfab-MDT0000 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Inode 12686675, i_size is 28672, should be 65536.  Fix<y>? yes
>
> Pass 2: Checking directory structure
> Problem in HTREE directory inode 12686675: node (0) has an unordered
> hash table
> Clear HTree index<y>? yes
>
> Entry '(=fM-'^\^@ ^@M-~^JM-XM-'p^H$^@:{M-XM-'^H^G$^@TM-O^IM-(x^L(^@t^W
> $M-(' in /ROOT/data1/offindex (12686675) has invalid inode #:
> 2807365078.
> Clear<y>? yes
>
> Pass 3: Checking directory connectivity
> Pass 3A: Optimizing directories
> Pass 4: Checking reference counts
> Unattached inode 12681396
> Connect to /lost+found<y>? yes
>
> Inode 12681396 ref count is 2, should be 1.  Fix<y>? yes
>
> Unattached inode 12681397
> Connect to /lost+found<y>? yes
>
> ... etc
>
>
>
>
> -------
>
> Out of interest, we are also getting occasional "soft CPU lockups",
> although these don't appear to be causing problems at present . . .
>
> Mar 31 08:57:36 lustre1 kernel: BUG: soft lockup - CPU#0 stuck for
> 10s! [ll_mdt_rdpg_11:11332]
> Mar 31 08:57:36 lustre1 kernel: CPU 0:
> Mar 31 08:57:36 lustre1 kernel: Modules linked in: obdfilter(U) ost(U)
> mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U)
> lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U)
> lnet(U) lvfs(U) li
> bcfs(U) ip_conntrack_netbios_ns(U) ipt_REJECT(U) xt_tcpudp(U)
> xt_state(U) ip_conntrack(U) nfnetlink(U) iptable_filter(U)
> ip_tables(U) x_tables(U) cpufreq_ondemand(U) dm_round_robin(U)
> dm_multipath(U) video(U) sbs(U) backlig
> ht(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U)
> ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) joydev(U) cdrom(U)
> pata_acpi(U) libata(U) i2c_nforce2(U) k8temp(U) shpchp(U) pcspkr(U)
> i2c_core(U) hwmon(U)
>  k8_edac(U) sg(U) forcedeth(U) edac_mc(U) serio_raw(U) e1000e(U)
> dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U)
> qla2xxx(U) scsi_transport_fc(U) mptsas(U) mptscsih(U) mptbase(U)
> scsi_transport_sas(U) sd_mod(
> U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> Mar 31 08:57:36 lustre1 kernel: Pid: 11332, comm: ll_mdt_rdpg_11
> Tainted: G      2.6.18-92.1.17.el5_lustre.1.6.7smp #1
> Mar 31 08:57:36 lustre1 kernel: RIP: 0010:[<ffffffff800649e6>]
> [<ffffffff800649e6>] _write_lock+0x7/0xf
> Mar 31 08:57:36 lustre1 kernel: RSP: 0018:ffff81024f0e3928  EFLAGS:
> 00000246
> Mar 31 08:57:36 lustre1 kernel: RAX: ffff81005ab05600 RBX:
> 0000000000002b10 RCX: 00000000000053b7
> Mar 31 08:57:36 lustre1 kernel: RDX: fffffffffffff5e8 RSI:
> ffffffff802f0d80 RDI: ffffc20000d1810c
> Mar 31 08:57:36 lustre1 kernel: RBP: 0000000000000286 R08:
> ffff810001006360 R09: ffff8100610996c0
> Mar 31 08:57:36 lustre1 kernel: R10: 5a5a5a5a5a5a5a5a R11:
> 5a5a5a5a5a5a5a5a R12: 0000000000000286
> Mar 31 08:57:36 lustre1 kernel: R13: ffff81024f0e38d0 R14:
> ffff810061099718 R15: ffff8100610996c0
> Mar 31 08:57:36 lustre1 kernel: FS:  00002b6a51d6d220(0000)
> GS:ffffffff803ea000(0000) knlGS:00000000f7f326d0
> Mar 31 08:57:36 lustre1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Mar 31 08:57:36 lustre1 kernel: CR2: 00002b5047841000 CR3:
> 0000000000201000 CR4: 00000000000006e0
> Mar 31 08:57:36 lustre1 kernel:
> Mar 31 08:57:36 lustre1 kernel: Call Trace:
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff885893fe>] :obdclass:lustre_hash_for_each_empty+0x20e/0x290
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff8858fbf8>] :obdclass:class_disconnect+0x378/0x400
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff886055b0>] :ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff888fc05b>] :mds:mds_disconnect+0x11b/0x310
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8003d514>] lock_timer_base
> +0x1b/0x3c
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8004a824>]
> try_to_del_timer_sync+0x51/0x5a
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff8858bce4>] :obdclass:class_fail_export+0x384/0x4c0
> Mar 31 08:57:36 lustre1 kernel:   
> [<ffffffff888f4751>] :mds:mds_readpage
> +0x1571/0x18a0
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8853a5f0>] :lnet:LNetMDBind
> +0x2c0/0x420
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff8863ef33>] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff88645004>] :ptlrpc:lustre_msg_set_timeout+0x34/0x110
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8008abb9>]
> default_wake_function+0x0/0xe
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff888f870b>] :mds:mds_handle
> +0x216b/0x4d30
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff800d3fc7>] free_block 
> +0x11c/
> 0x13b
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff80143809>] __next_cpu
> +0x19/0x28
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff80143809>] __next_cpu
> +0x19/0x28
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff800898e3>]
> find_busiest_group+0x20d/0x621
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff886435a5>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff8864bcfa>] :ptlrpc:ptlrpc_server_request_get+0x6a/0x150
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff8864db7d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff88650103>] :ptlrpc:ptlrpc_server_handle_request+0xa93/0x1150
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff80062f4b>] thread_return
> +0x0/0xdf
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8006d8a2>] do_gettimeofday
> +0x40/0x8f
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff884f37c6>] :libcfs:lcw_update_time+0x16/0x100
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff800891f6>] __wake_up_common
> +0x3e/0x68
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff886535f8>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8008abb9>]
> default_wake_function+0x0/0xe
> Mar 31 08:57:36 lustre1 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/ 
> 0x11
> Mar 31 08:57:36 lustre1 kernel:
> [<ffffffff886523e0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
> Mar 31 08:57:37 lustre1 kernel:  [<ffffffff8005dfa7>] child_rip 
> +0x0/0x11
> Mar 31 08:57:37 lustre1 kernel:
> Mar 31 08:57:46 lustre1 kernel: BUG: soft lockup - CPU#0 stuck for
> 10s! [ll_mdt_rdpg_11:11332]
> Mar 31 08:57:46 lustre1 kernel: CPU 0:
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
