[Lustre-discuss] MDS LBUG: mds_inode_is_orphan(dchild->d_inode)) failed:dchild

Frederik Ferner frederik.ferner at diamond.ac.uk
Tue Nov 10 05:18:49 PST 2009


Hi,

occasionally we run into an LBUG on our MDS 
(ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild)[1]. A 
quick search turned up an old post from May[2] and at least one bug that 
may be related (https://bugzilla.lustre.org/show_bug.cgi?id=16492, which 
is against 1.6.5, or https://bugzilla.lustre.org/show_bug.cgi?id=17764, 
against 1.4.X, with reports for other versions as well).

We are running 1.6.6 on RHEL 5 on the MDS currently and I'm not sure I 
understand the state of the bugs.

Can someone help me work out whether that bug should be fixed in a later 
version of 1.6.X? The changelog[3] seems to suggest that at least 16492 
is, but it mentions the bug number in two places, which slightly 
confuses me. The second description, however, might fit the problem we see.

Before I attempt to upgrade our MDS, I would like to know if the bug is 
fixed in 1.6.7.2.

Kind regards,
Frederik

[1]
Syslog has the following entries for the LBUG:

Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: LustreError: 
20109:0:(mds_open.c:1156:mds_open()) 
ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild 
71229c0:d0c22313 (ffff8103065c8150) inode 
ffff810391a98528/118630848/3502383891
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: LustreError: 
20109:0:(mds_open.c:1156:mds_open()) LBUG
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: Lustre: 
20109:0:(linux-debug.c:185:libcfs_debug_dumpstack()) showing stack for 
process 20109
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: ll_mdt_25     R  running task 
       0 20109      1         20110 20108 (L-TLB)
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  0000000000000000 
ffffffff8006d940 ffff81020200f140 ffffffff8869877d
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  ffff810205fa3000 
ffff810205fa30e8 ffff8101ffe39480 ffffffff88696476
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  ffff810205fa3190 
0000000000000000 ffff8101fb0e5e10 ffffffff800893bb
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: Call Trace:
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff8006d940>] 
do_gettimeofday+0x50/0x92
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff88696476>] 
:libcfs:lcw_update_time+0x16/0x100
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff800893bb>] 
__wake_up_common+0x3e/0x68
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff887ea22c>] 
:ptlrpc:ptlrpc_main+0xe0c/0xf90
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff8008ad7e>] 
default_wake_function+0x0/0xe
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff800b4610>] 
audit_syscall_exit+0x31b/0x336
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff8005dfb1>] 
child_rip+0xa/0x11
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff887e9420>] 
:ptlrpc:ptlrpc_main+0x0/0xf90
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel:  [<ffffffff8005dfa7>] 
child_rip+0x0/0x11

Followed by a watchdog:
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel: Lustre: 
0:0:(watchdog.c:148:lcw_cb()) Watchdog triggered for pid 20109: it was 
inactive for 200s
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel: Lustre: 
0:0:(linux-debug.c:185:libcfs_debug_dumpstack()) showing stack for 
process 20109
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel: ll_mdt_25     D 
ffff810428e4a5a8     0 20109      1         20110 20108 (L-TLB)
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  ffff8101fb0e5700 
0000000000000046 0000000000000000 ffffffff80450560
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  ffff8101fb0e56c0 
000000000000000a ffff8102030bd7e0 ffff8102471be040
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  0009bf805a7afba6 
0000000000002403 ffff8102030bd9c8 0000000300000484
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel: Call Trace:
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8008ad7e>] 
default_wake_function+0x0/0xe
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8868dc6b>] 
:libcfs:lbug_with_loc+0xbb/0xc0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88ab0eb7>] 
:mds:mds_open+0x2017/0x332e
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff800893bb>] 
__wake_up_common+0x3e/0x68
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8885d2d1>] 
:ksocklnd:ksocknal_queue_tx_locked+0x4f1/0x550
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88a8d8c9>] 
:mds:mds_reint_rec+0x1d9/0x2b0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88ab4133>] 
:mds:mds_open_unpack+0x2f3/0x410
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88a8091a>] 
:mds:mds_reint+0x35a/0x420
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88a7efa2>] 
:mds:fixup_handle_for_resent_req+0x52/0x200
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88a84974>] 
:mds:mds_intent_policy+0x484/0xc30
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff886db50c>] 
:lnet:LNetMDBind+0x2ac/0x400
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887a4156>] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887a1916>] 
:ptlrpc:ldlm_lock_enqueue+0x186/0x990
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8879e73d>] 
:ptlrpc:ldlm_lock_create+0x9ad/0x9e0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887c34d0>] 
:ptlrpc:ldlm_server_completion_ast+0x0/0x5c0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887c0de5>] 
:ptlrpc:ldlm_handle_enqueue+0xca5/0x12a0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887c3a90>] 
:ptlrpc:ldlm_server_blocking_ast+0x0/0x6b0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88a89155>] 
:mds:mds_handle+0x4035/0x4cf0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff801437a4>] 
__next_cpu+0x19/0x28
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff800756b4>] 
smp_send_reschedule+0x4e/0x53
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8873c031>] 
:obdclass:class_handle2object+0xd1/0x160
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887db705>] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887e50da>] 
:ptlrpc:ptlrpc_check_req+0x1a/0x110
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887e72c2>] 
:ptlrpc:ptlrpc_server_handle_request+0x992/0x1040
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff80062f4b>] 
thread_return+0x0/0xdf
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8006d940>] 
do_gettimeofday+0x50/0x92
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff88696476>] 
:libcfs:lcw_update_time+0x16/0x100
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff800893bb>] 
__wake_up_common+0x3e/0x68
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887ea22c>] 
:ptlrpc:ptlrpc_main+0xe0c/0xf90
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8008ad7e>] 
default_wake_function+0x0/0xe
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff800b4610>] 
audit_syscall_exit+0x31b/0x336
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8005dfb1>] 
child_rip+0xa/0x11
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff887e9420>] 
:ptlrpc:ptlrpc_main+0x0/0xf90
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel:  [<ffffffff8005dfa7>] 
child_rip+0x0/0x11
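
For anyone comparing the entries above against their own syslog, here is a
minimal Python sketch that rejoins the mail-wrapped records and picks out
the assertion, the LBUG and the watchdog entry. It is not part of Lustre;
the function names (join_wrapped, summarize) and the regular expressions
are assumptions based only on the line layout shown in this excerpt.

```python
import re

# A record starts with "Mon DD HH:MM:SS host kernel:" (layout assumed from the excerpt).
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ kernel:")
# Source-location prefix as printed in the excerpt: pid:0:(file.c:line:function())
LOCATION = re.compile(r"(\d+):\d+:\(([\w-]+\.c):(\d+):(\w+)\(\)\)")
ASSERTION = re.compile(r"ASSERTION\((.*?)\) failed")
WATCHDOG = re.compile(r"Watchdog triggered for pid (\d+): it was inactive for (\d+)s")


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines.

    Any line that does not start a new record is appended to the previous
    one, so feed this only log lines, not surrounding prose.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize(records):
    """Return (kind, pid, location, detail) tuples for the interesting records."""
    events = []
    for rec in records:
        loc = LOCATION.search(rec)
        if not loc:
            continue  # e.g. raw stack-trace lines without a source location
        pid, src, line, func = loc.groups()
        where = f"{src}:{line}:{func}()"
        assertion = ASSERTION.search(rec)
        watchdog = WATCHDOG.search(rec)
        if assertion:
            events.append(("assertion", int(pid), where, assertion.group(1)))
        elif rec.endswith(" LBUG"):
            events.append(("lbug", int(pid), where, None))
        elif watchdog:
            # Report the stuck thread's pid, not the pid of the watchdog itself.
            events.append(("watchdog", int(watchdog.group(1)), where,
                           f"{watchdog.group(2)}s"))
    return events


# Lines copied from the excerpt above, still wrapped as they appear in the mail.
SAMPLE = """\
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: LustreError:
20109:0:(mds_open.c:1156:mds_open())
ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild
71229c0:d0c22313 (ffff8103065c8150) inode
ffff810391a98528/118630848/3502383891
Nov 10 09:51:19 cs04r-sc-mds01-01 kernel: LustreError:
20109:0:(mds_open.c:1156:mds_open()) LBUG
Nov 10 09:54:39 cs04r-sc-mds01-01 kernel: Lustre:
0:0:(watchdog.c:148:lcw_cb()) Watchdog triggered for pid 20109: it was
inactive for 200s
"""

events = summarize(join_wrapped(SAMPLE.splitlines()))
for event in events:
    print(event)
```

Matching the watchdog pid against the pid of the LBUG record is what ties
the two traces together: both refer to thread 20109 (ll_mdt_25).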

[2] http://lists.lustre.org/pipermail/lustre-discuss/2009-May/010469.html
[3] http://lists.lustre.org/pipermail/lustre-discuss/2009-May/010469.html
-- 
Frederik Ferner
Computer Systems Administrator		phone: +44 1235 77 8624
Diamond Light Source Ltd.		mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)