[Lustre-discuss] LBUG ASSERTION(lock->l_resource != NULL) failed

Cliff White Cliff.White at Sun.COM
Wed Jan 14 16:27:52 PST 2009


Brock Palen wrote:
> I am having servers LBUG on a regular basis. Clients are running
> 1.6.6 patchless on RHEL4; servers are running RHEL4 with the 1.6.5.1
> RPMs from the download page. All connections are over Ethernet, and
> the servers are x4600s.

This looks like bug 16496, which is fixed in 1.6.6. You should upgrade
your servers to 1.6.6.
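To confirm what each node is actually running before and after the
upgrade, something like this should work (the proc path is the
standard 1.6 location; adjust if your install differs):

    # Show the Lustre version the node is running
    cat /proc/fs/lustre/version

    # Cross-check against the installed packages on RHEL
    rpm -qa | grep -i lustre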
cliffw

> 
> The OSS that LBUG'd has this in its log:
> 
> Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
> Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
> Jan 13 16:35:39 oss2 kernel: Lustre: 10243:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 10243
> Jan 13 16:35:39 oss2 kernel: ldlm_cn_08    R  running task       0 10243      1         10244  7776 (L-TLB)
> Jan 13 16:35:39 oss2 kernel: 0000000000000000 ffffffffa0414629 00000103d83c7e00 0000000000000000
> Jan 13 16:35:39 oss2 kernel:        00000101f8c88d40 ffffffffa021445e 00000103e315dd98 0000000000000001
> Jan 13 16:35:39 oss2 kernel:        00000101f3993ea0 0000000000000000
> Jan 13 16:35:39 oss2 kernel: Call Trace:<ffffffffa0414629>{:ptlrpc:ptlrpc_server_handle_request+2457}
> Jan 13 16:35:39 oss2 kernel:        <ffffffffa021445e>{:libcfs:lcw_update_time+30} <ffffffff80133855>{__wake_up_common+67}
> Jan 13 16:35:39 oss2 kernel:        <ffffffffa0416d05>{:ptlrpc:ptlrpc_main+3989} <ffffffffa0415270>{:ptlrpc:ptlrpc_retry_rqbds+0}
> Jan 13 16:35:39 oss2 kernel:        <ffffffffa0415270>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa0415270>{:ptlrpc:ptlrpc_retry_rqbds+0}
> Jan 13 16:35:39 oss2 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa0415d70>{:ptlrpc:ptlrpc_main+0}
> Jan 13 16:35:39 oss2 kernel:        <ffffffff80110ddb>{child_rip+0}
> Jan 13 16:35:40 oss2 kernel: LustreError: dumping log to /tmp/lustre-log.1231882539.10243
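That binary dump can be decoded into readable text with lctl, which is
worth doing before cross-checking against bug 16496 (the filename is
the one from your log):

    # Decode the binary Lustre debug dump into plain text
    lctl debug_file /tmp/lustre-log.1231882539.10243 /tmp/lustre-log.txt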
> 
> 
> At the same time a client (nyx346) lost contact with that OSS, and
> is never allowed to reconnect.
> Client /var/log/messages:
> 
> Jan 13 16:37:20 nyx346 kernel: Lustre: nobackup-OST000d-osc-000001022c2a7800: Connection to service nobackup-OST000d via nid 10.164.3.245@tcp was lost; in progress operations using this service will wait for recovery to complete.
> Jan 13 16:37:20 nyx346 kernel: Lustre: Skipped 6 previous similar messages
> Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Jan 13 16:37:20 nyx346 kernel: LustreError: 11-0: an error occurred while communicating with 10.164.3.245@tcp. The ost_connect operation failed with -16
> Jan 13 16:37:20 nyx346 kernel: LustreError: Skipped 10 previous similar messages
> Jan 13 16:37:45 nyx346 kernel: Lustre: 3849:0:(import.c:410:import_select_connection()) nobackup-OST000d-osc-000001022c2a7800: tried all connections, increasing latency to 7s
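From the client side you can watch how that import is doing while this
is happening; a minimal check, assuming the standard 1.6 proc layout:

    # List the Lustre devices and their state on the client
    lctl dl

    # Import status for the affected OSC (e.g. FULL vs. CONNECTING)
    cat /proc/fs/lustre/osc/nobackup-OST000d-osc-*/ost_server_uuid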
> 
> Even now the server (OSS) is refusing connections to OST000d, with
> the message:
> 
> Lustre: 9631:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST000d: refuse reconnection from 145a1ec5-07ef-f7eb-0ca9-2a2b6503e0cd@10.164.1.90@tcp to 0x00000103d5ce7000; still busy with 2 active RPCs
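That message means the old export still has RPCs outstanding, so the
server refuses to let the client replace its connection. Short of a
reboot, one possible workaround is to evict the stuck export by hand
(this uses the 1.6 obdfilter proc interface, and it may not help if
the service thread itself is wedged by the LBUG):

    # On the OSS: evict the client export named in the refuse message,
    # then let the client reconnect on its own
    echo 145a1ec5-07ef-f7eb-0ca9-2a2b6503e0cd > \
        /proc/fs/lustre/obdfilter/nobackup-OST000d/evict_client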
> 
> 
> If I reboot the OSS, the OSTs on it go through recovery as normal,
> and then the client is fine.
> 
> The network looks clean. I found one machine with lots of dropped
> packets between the servers, but it is not the client in question.
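To rule out the path between this client and the OSS specifically,
LNET has its own ping that exercises the same route Lustre uses:

    # From the client: verify LNET connectivity to the OSS NID
    lctl ping 10.164.3.245@tcp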
> 
> Thank you! If it happens again and I find any other data, I will
> let you know.
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
> 



