[Lustre-discuss] Lustre file system crashing
Ronald K Long
rklong at usgs.gov
Fri Oct 1 04:31:44 PDT 2010
We are currently trying to stand up a Lustre file system in a system test
environment before moving it into production. Twice in the last week the
file system has locked up, and the only way to recover was to reboot
all attached clients along with the MDS/MDT.
We are currently running Lustre 1.8.2. Here is the LBUG output we are
receiving. If there is anything else I can provide to help find the cause,
please let me know.
Sep 30 05:30:01 edclxs200 auditd[4529]: Audit daemon rotating log files
Sep 30 15:45:22 edclxs200 kernel: LustreError: 7193:0:(mds_reint.c:1772:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
Sep 30 15:45:22 edclxs200 kernel: LustreError:
7193:0:(mds_reint.c:1772:mds_orphan_add_link()) LBUG
Sep 30 15:45:22 edclxs200 kernel: Lustre:
7193:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for
process 7193
Sep 30 15:45:22 edclxs200 kernel: ll_mdt_30 R running task 0
7193 1 7195 7192 (L-TLB)
Sep 30 15:45:22 edclxs200 kernel: ffff810592dd7100 ffff81010ba88000
0000000000000282 0000000000000082
Sep 30 15:45:22 edclxs200 kernel: 0000008100001400 ffff810348753ef8
0000000000000001 0000000000000001
Sep 30 15:45:22 edclxs200 kernel: ffff810345bcc5b8 0000000000000000
ffff810348615e10 ffffffff8008ac95
Sep 30 15:45:22 edclxs200 kernel: Call Trace:
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff8008ac95>]
__wake_up_common+0x3e/0x68
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff88703f58>]
:ptlrpc:ptlrpc_main+0x1258/0x1420
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff8008c86b>]
default_wake_function+0x0/0xe
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff800b7076>]
audit_syscall_exit+0x336/0x362
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff88702d00>]
:ptlrpc:ptlrpc_main+0x0/0x1420
Sep 30 15:45:22 edclxs200 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 30 15:45:22 edclxs200 kernel:
Sep 30 15:45:22 edclxs200 kernel: LustreError: dumping log to
/tmp/lustre-log.1285879522.7193
Sep 30 15:48:42 edclxs200 kernel: Lustre: Service thread pid 7193 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Sep 30 15:48:42 edclxs200 kernel: Lustre:
0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process
7193
Sep 30 15:48:42 edclxs200 kernel: ll_mdt_30 D ffffffff8014e8f3 0
7193 1 7195 7192 (L-TLB)
Sep 30 15:48:42 edclxs200 kernel: ffff8103486157f0 0000000000000046
0000000000000000 ffffffff8006b921
Sep 30 15:48:42 edclxs200 kernel: ffff8103486157b0 0000000000000009
ffff81034df607e0 ffff81034fcd3080
Sep 30 15:48:42 edclxs200 kernel: 000290eac846946f 0000000000001d1b
ffff81034df609c8 000000098003bcc8
Rocky