[Lustre-discuss] Lustre file system crashing

Ronald K Long rklong at usgs.gov
Fri Oct 1 04:31:44 PDT 2010


We are currently trying to stand up a Lustre file system in a system test 
environment before moving it into production.  Twice in the last week the 
file system has locked up, and the only way to recover was to reboot all 
attached clients along with the MDS/MDT.

We are currently running Lustre 1.8.2.  Here is the LBUG info we are 
receiving.  If there is anything else I can provide to help find the cause, 
please let me know.

Sep 30 05:30:01 edclxs200 auditd[4529]: Audit daemon rotating log files
Sep 30 15:45:22 edclxs200 kernel: LustreError: 7193:0:(mds_reint.c:1772:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
Sep 30 15:45:22 edclxs200 kernel: LustreError: 7193:0:(mds_reint.c:1772:mds_orphan_add_link()) LBUG
Sep 30 15:45:22 edclxs200 kernel: Lustre: 7193:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 7193
Sep 30 15:45:22 edclxs200 kernel: ll_mdt_30     R  running task       0 7193      1          7195  7192 (L-TLB)
Sep 30 15:45:22 edclxs200 kernel:  ffff810592dd7100 ffff81010ba88000 0000000000000282 0000000000000082
Sep 30 15:45:22 edclxs200 kernel:  0000008100001400 ffff810348753ef8 0000000000000001 0000000000000001
Sep 30 15:45:22 edclxs200 kernel:  ffff810345bcc5b8 0000000000000000 ffff810348615e10 ffffffff8008ac95
Sep 30 15:45:22 edclxs200 kernel: Call Trace:
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8008ac95>] __wake_up_common+0x3e/0x68
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff88703f58>] :ptlrpc:ptlrpc_main+0x1258/0x1420
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8008c86b>] default_wake_function+0x0/0xe
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff800b7076>] audit_syscall_exit+0x336/0x362
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff88702d00>] :ptlrpc:ptlrpc_main+0x0/0x1420
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 30 15:45:22 edclxs200 kernel:
Sep 30 15:45:22 edclxs200 kernel: LustreError: dumping log to /tmp/lustre-log.1285879522.7193
Sep 30 15:48:42 edclxs200 kernel: Lustre: Service thread pid 7193 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Sep 30 15:48:42 edclxs200 kernel: Lustre: 0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 7193
Sep 30 15:48:42 edclxs200 kernel: ll_mdt_30     D ffffffff8014e8f3     0 7193      1          7195  7192 (L-TLB)
Sep 30 15:48:42 edclxs200 kernel:  ffff8103486157f0 0000000000000046 0000000000000000 ffffffff8006b921
Sep 30 15:48:42 edclxs200 kernel:  ffff8103486157b0 0000000000000009 ffff81034df607e0 ffff81034fcd3080
Sep 30 15:48:42 edclxs200 kernel:  000290eac846946f 0000000000001d1b ffff81034df609c8 000000098003bcc8
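
In case it helps when comparing repeat occurrences, here is a minimal Python sketch that pulls the failed ASSERTION and LBUG messages out of a syslog like the one above. It assumes each message sits on a single line (the copy in this mail was wrapped) and that the "pid:N:(file:line:function())" prefix seen in these LustreError lines is stable; the helpers parse_lustre_line and find_assertions are hypothetical, not part of any Lustre tool, and the second number in the prefix is simply skipped.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed prefix format, inferred from the log lines in this post:
#   <pid>:<n>:(<source file>:<source line>:<function>()) <message>
LUSTRE_MSG = re.compile(
    r"(?P<pid>\d+):\d+:\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)


class LustreEvent(NamedTuple):
    pid: int
    source_file: str
    source_line: int
    function: str
    message: str


def parse_lustre_line(line: str) -> Optional[LustreEvent]:
    """Return the fields of one unwrapped syslog line, or None if it has no Lustre prefix."""
    m = LUSTRE_MSG.search(line)
    if m is None:
        return None
    return LustreEvent(
        pid=int(m["pid"]),
        source_file=m["file"],
        source_line=int(m["line"]),
        function=m["func"],
        message=m["msg"].strip(),
    )


def find_assertions(lines: Iterable[str]) -> Iterator[LustreEvent]:
    """Yield only the events whose message is a failed ASSERTION or a bare LBUG."""
    for line in lines:
        event = parse_lustre_line(line)
        if event is not None and (
            event.message.startswith("ASSERTION") or event.message == "LBUG"
        ):
            yield event
```

Run over the messages file, this would report the mds_reint.c:1772 assertion in mds_orphan_add_link() each time it fires, which makes it easier to see whether both lockups hit the same check.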



Rocky 