[Lustre-discuss] Lustre file system crashing

Ronald K Long rklong at usgs.gov
Fri Oct 1 05:03:36 PDT 2010


From further research, it looks as though this is a known problem with
open-unlinked directories in 1.8.2, and a fix is attached to bug 22177.
Would an upgrade to 1.8.4 be advisable?
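An "open-unlinked" directory is one that is removed (rmdir) while some process still holds it open, so the inode outlives its last name. The failed assertion in the log below checks a directory's link count (`inode->i_nlink`). As a hedged illustration only, this Python sketch creates that situation on a local filesystem and reports the link count seen through the open descriptor before and after the rmdir. It does not touch Lustre and is not a confirmed reproducer for bug 22177. The helper `open_unlinked_dir` and the directory name `victim` are hypothetical, and the exact counts depend on the local filesystem (an empty directory commonly reports 2, some filesystems report 1).

```python
import os
import tempfile


def open_unlinked_dir(parent):
    """Create a directory under `parent`, hold it open, then rmdir it.

    Returns (nlink_before, nlink_after) as seen through the open descriptor.
    Hypothetical helper for illustration; not part of Lustre.
    """
    path = os.path.join(parent, "victim")  # hypothetical directory name
    os.mkdir(path)
    # Hold the directory open (O_DIRECTORY where the platform provides it).
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))
    try:
        before = os.fstat(fd).st_nlink
        os.rmdir(path)  # the name is gone, but the open fd keeps the inode alive
        after = os.fstat(fd).st_nlink
        return before, after
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        print(open_unlinked_dir(parent))
```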

Thanks again

Rocky 



From: Ronald K Long <rklong at usgs.gov>
To: lustre-discuss at lists.lustre.org
Date: 10/01/2010 06:32 AM
Subject: [Lustre-discuss] Lustre file system crashing
Sent by: lustre-discuss-bounces at lists.lustre.org




We are currently trying to stand up a Lustre file system in a system test
environment before moving it into production. Twice in the last week the
file system has locked up, and the only way to recover was to reboot all
attached clients along with the MDS/MDT.

We are currently running Lustre 1.8.2. Here is the LBUG information we are
receiving. If there is anything else I can provide to help find the cause,
please let me know.

Sep 30 05:30:01 edclxs200 auditd[4529]: Audit daemon rotating log files
Sep 30 15:45:22 edclxs200 kernel: LustreError: 7193:0:(mds_reint.c:1772:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
Sep 30 15:45:22 edclxs200 kernel: LustreError: 7193:0:(mds_reint.c:1772:mds_orphan_add_link()) LBUG
Sep 30 15:45:22 edclxs200 kernel: Lustre: 7193:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 7193
Sep 30 15:45:22 edclxs200 kernel: ll_mdt_30     R  running task       0 7193      1          7195  7192 (L-TLB)
Sep 30 15:45:22 edclxs200 kernel:  ffff810592dd7100 ffff81010ba88000 0000000000000282 0000000000000082
Sep 30 15:45:22 edclxs200 kernel:  0000008100001400 ffff810348753ef8 0000000000000001 0000000000000001
Sep 30 15:45:22 edclxs200 kernel:  ffff810345bcc5b8 0000000000000000 ffff810348615e10 ffffffff8008ac95
Sep 30 15:45:22 edclxs200 kernel: Call Trace:
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8008ac95>] __wake_up_common+0x3e/0x68
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff88703f58>] :ptlrpc:ptlrpc_main+0x1258/0x1420
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8008c86b>] default_wake_function+0x0/0xe
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff800b7076>] audit_syscall_exit+0x336/0x362
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff88702d00>] :ptlrpc:ptlrpc_main+0x0/0x1420
Sep 30 15:45:22 edclxs200 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 30 15:45:22 edclxs200 kernel:
Sep 30 15:45:22 edclxs200 kernel: LustreError: dumping log to /tmp/lustre-log.1285879522.7193
Sep 30 15:48:42 edclxs200 kernel: Lustre: Service thread pid 7193 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Sep 30 15:48:42 edclxs200 kernel: Lustre: 0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 7193
Sep 30 15:48:42 edclxs200 kernel: ll_mdt_30     D ffffffff8014e8f3     0 7193      1          7195  7192 (L-TLB)
Sep 30 15:48:42 edclxs200 kernel:  ffff8103486157f0 0000000000000046 0000000000000000 ffffffff8006b921
Sep 30 15:48:42 edclxs200 kernel:  ffff8103486157b0 0000000000000009 ffff81034df607e0 ffff81034fcd3080
Sep 30 15:48:42 edclxs200 kernel:  000290eac846946f 0000000000001d1b ffff81034df609c8 000000098003bcc8
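The two most useful lines in this trace are the failed assertion and the name of the debug dump. As a reading aid, here is a minimal Python sketch that pulls them apart. It assumes the message layout seen here (`LustreError: <pid>:<n>:(<file>:<line>:<function>()) ASSERTION(<expr>) failed: <msg>`) and that the dump name `/tmp/lustre-log.<N>.<pid>` carries a Unix timestamp and the PID of the thread that hit the LBUG. Both are inferred from this log (the PID matches 7193, and 1285879522 decodes to 20:45:22 UTC, i.e. 15:45:22 at UTC-5), not taken from Lustre documentation. The function names are hypothetical. The dump itself is a binary debug buffer; `lctl debug_file` converts it to text.

```python
import re
from datetime import datetime, timezone

# Assumed layout of a libcfs assertion message, as it appears in the syslog:
# "LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION(<expr>) failed: <msg>"
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\((?P<expr>.*)\) failed: (?P<msg>.*)"
)

# Assumed naming of the debug dump: lustre-log.<epoch seconds>.<pid>
DUMP_RE = re.compile(r"lustre-log\.(?P<secs>\d+)\.(?P<pid>\d+)$")


def parse_assertion(line):
    """Return the assertion fields as a dict, or None if the line doesn't match."""
    m = ASSERT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


def parse_dump_name(path):
    """Split a dump file name into (UTC datetime, pid), or None if it doesn't match."""
    m = DUMP_RE.search(path)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m["secs"]), tz=timezone.utc)
    return when, int(m["pid"])


if __name__ == "__main__":
    log_line = ("Sep 30 15:45:22 edclxs200 kernel: LustreError: "
                "7193:0:(mds_reint.c:1772:mds_orphan_add_link()) "
                "ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1")
    print(parse_assertion(log_line))
    print(parse_dump_name("/tmp/lustre-log.1285879522.7193"))
```

Run over the whole syslog, only the first LustreError line matches; the LBUG line that follows it carries no assertion text.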



Rocky
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

