[Lustre-discuss] LBUG ASSERTION(lock->l_resource != NULL) failed
Cliff White
Cliff.White at Sun.COM
Wed Jan 14 16:27:52 PST 2009
Brock Palen wrote:
> I am having servers LBUG on a regular basis. Clients are running
> 1.6.6 patchless on RHEL4; servers are running RHEL4 with the 1.6.5.1
> RPMs from the download page. All connections are over Ethernet.
> Servers are x4600s.
This looks like bug 16496, which is fixed in 1.6.6. You should upgrade
your servers to 1.6.6.
cliffw
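
For anyone checking several servers against the fix release, the
comparison can be sketched in a few lines of Python. This is only an
illustration: the function names are hypothetical, and it assumes
plain dotted numeric versions such as the 1.6.5.1 and 1.6.6 mentioned
in this thread.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.6.5.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed, fixed_in="1.6.6"):
    """Return True if `installed` is older than the release carrying the fix."""
    a, b = parse_version(installed), parse_version(fixed_in)
    # Pad with zeros so that '1.6.6' and '1.6.6.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    for version in ("1.6.5.1", "1.6.6"):
        print(version, "needs upgrade:", needs_upgrade(version))
```

With the versions from this thread, the 1.6.5.1 servers are flagged
and the 1.6.6 clients are not.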
>
> The OSS that BUG'd has in its log:
>
> Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(ldlm_lock.c:
> 430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
> Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(tracefile.c:
> 432:libcfs_assertion_failed()) LBUG
> Jan 13 16:35:39 oss2 kernel: Lustre: 10243:0:(linux-debug.c:
> 167:libcfs_debug_dumpstack()) showing stack for process 10243
> Jan 13 16:35:39 oss2 kernel: ldlm_cn_08 R running task 0
> 10243 1 10244 7776 (L-TLB)
> Jan 13 16:35:39 oss2 kernel: 0000000000000000 ffffffffa0414629
> 00000103d83c7e00 0000000000000000
> Jan 13 16:35:39 oss2 kernel: 00000101f8c88d40 ffffffffa021445e
> 00000103e315dd98 0000000000000001
> Jan 13 16:35:39 oss2 kernel: 00000101f3993ea0 0000000000000000
> Jan 13 16:35:39 oss2 kernel: Call Trace:<ffffffffa0414629>
> {:ptlrpc:ptlrpc_server_handle_request+2457}
> Jan 13 16:35:39 oss2 kernel: <ffffffffa021445e>
> {:libcfs:lcw_update_time+30} <ffffffff80133855>{__wake_up_common+67}
> Jan 13 16:35:39 oss2 kernel: <ffffffffa0416d05>
> {:ptlrpc:ptlrpc_main+3989} <ffffffffa0415270>
> {:ptlrpc:ptlrpc_retry_rqbds+0}
> Jan 13 16:35:39 oss2 kernel: <ffffffffa0415270>
> {:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa0415270>
> {:ptlrpc:ptlrpc_retry_rqbds+0}
> Jan 13 16:35:39 oss2 kernel: <ffffffff80110de3>{child_rip+8}
> <ffffffffa0415d70>{:ptlrpc:ptlrpc_main+0}
> Jan 13 16:35:39 oss2 kernel: <ffffffff80110ddb>{child_rip+0}
> Jan 13 16:35:40 oss2 kernel: LustreError: dumping log to /tmp/lustre-
> log.1231882539.10243
>
>
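
Since the mail client wrapped the log entries above (for example
"ldlm_lock.c:" and "430:" ended up on separate lines), here is a
rough Python sketch for pulling the process ID, source location, and
message back out of one quoted entry. The regular expression is an
assumption based only on the entries shown in this thread, not on any
documented Lustre log format, and the function name is hypothetical.

```python
import re

# Pattern inferred from the entries quoted in this thread, e.g.
# "10243:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(...) failed"
LUSTRE_MSG = re.compile(
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<text>.*)"
)


def parse_lustre_message(entry):
    """Parse one (possibly mail-wrapped, '>'-quoted) log entry.

    Returns a dict, or None if the entry does not match the pattern.
    """
    # Undo the mail wrapping: drop each line break plus the quote marker.
    joined = re.sub(r"\n>?\s*", "", entry)
    match = LUSTRE_MSG.search(joined)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


if __name__ == "__main__":
    sample = (
        "> Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(ldlm_lock.c:\n"
        "> 430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed"
    )
    print(parse_lustre_message(sample))
```

It handles one entry at a time; splitting a whole log into entries
first is left out to keep the sketch short.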
> At the same time a client (nyx346) lost contact with that OSS, and is
> never allowed to reconnect.
> Client /var/log/messages:
>
> Jan 13 16:37:20 nyx346 kernel: Lustre:
> nobackup-OST000d-osc-000001022c2a7800: Connection to service
> nobackup-OST000d via nid 10.164.3.245@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 13 16:37:20 nyx346 kernel: Lustre: Skipped 6 previous similar
> messages
> Jan 13 16:37:20 nyx346 kernel: LustreError:
> 3889:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -11 from
> cancel RPC: canceling anyway
> Jan 13 16:37:20 nyx346 kernel: LustreError:
> 3889:0:(ldlm_request.c:1605:ldlm_cli_cancel_list())
> ldlm_cli_cancel_list: -11
> Jan 13 16:37:20 nyx346 kernel: LustreError: 11-0: an error occurred
> while communicating with 10.164.3.245@tcp. The ost_connect operation
> failed with -16
> Jan 13 16:37:20 nyx346 kernel: LustreError: Skipped 10 previous
> similar messages
> Jan 13 16:37:45 nyx346 kernel: Lustre: 3849:0:(import.c:
> 410:import_select_connection()) nobackup-OST000d-
> osc-000001022c2a7800: tried all connections, increasing latency to 7s
>
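
The negative return codes in the client log (-11 and -16) look like
negated Linux errno values. A minimal Python sketch for decoding
them follows. The table is hard-coded with the Linux numbers for
just these two codes so that it gives the same answer on any
platform; the names LINUX_ERRNO and describe_rc are hypothetical.

```python
# Linux errno values for the two return codes seen in the client log.
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable; try again"),
    16: ("EBUSY", "Device or resource busy"),
}


def describe_rc(rc):
    """Map a negative return code such as -16 to (name, description)."""
    return LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))


if __name__ == "__main__":
    for rc in (-11, -16):
        name, meaning = describe_rc(rc)
        print(f"rc {rc}: {name} ({meaning})")
```

Read this way, the -16 (EBUSY) on ost_connect appears consistent with
the server-side "still busy with 2 active RPCs" message, though that
is an inference from the logs rather than a confirmed diagnosis.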
> Even now the server (OSS) is refusing connections to OST000d, with the
> message:
>
> Lustre: 9631:0:(ldlm_lib.c:760:target_handle_connect())
> nobackup-OST000d: refuse reconnection from
> 145a1ec5-07ef-f7eb-0ca9-2a2b6503e0cd@10.164.1.90@tcp to
> 0x00000103d5ce7000; still busy with 2 active RPCs
>
>
> If I reboot the OSS, the OSTs on it go through recovery like normal,
> and then the client is fine.
>
> The network looks clean. I found one machine with lots of dropped
> packets between the servers, but that is not the client in question.
>
> Thank you! If it happens again and I find any other data, I will let
> you know.
>
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985