[Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed

Erich Focht efocht at hpce.nec.com
Tue Apr 20 00:51:08 PDT 2010


Hi Bernd,

thanks, I reopened your bug 19992. Wonder why I couldn't find it in bugzilla...

Regards,
Erich



Bernd Schubert wrote:
> Hello Erich,
> 
> check out my bug report:
> 
> https://bugzilla.lustre.org/show_bug.cgi?id=19992
> 
> It was closed as a duplicate of bug 16129, although that is probably not
> correct: 16129 is the root cause, but not the solution.
>
> As we never observed it with 1.6.7.2, I didn't complain when bug 19992 was
> closed. As you can now confirm that it also happens with 1.6.7.2, please
> re-open that bug.
> 
> 
> Thanks,
> Bernd
> 
> On Monday 19 April 2010, Erich Focht wrote:
>> Hi,
>>
>> we saw this LBUG 3 times within the past week, and are puzzled about what's
>> going on, and why there's no Bugzilla entry for it...
>>
>> What happens is that on an OSS, a request (it must be a read or a write)
>> expects, according to the content of the ioobj structure, to find an array
>> of 22 struct niobuf_remote entries (niocount), but finds only one. The
>> request is obviously corrupted.
>>
>> We enabled checksumming where we could, but unfortunately the request
>> headers don't seem to be covered by any checksum (the reply path possibly
>> is). In any case, we see no corruption or checksum failures for bulk data
>> transfers, so it is improbable that corruption on the wire would produce
>> "size 16 too small (required X)" three times in a row (with X being 352,
>> 432 and 4016 in our failures).
>>
>> Did anybody see this? Any ideas or hints?
>>
>> We're using Lustre 1.6.7.2 on server and client side.
>>
>>
>> The LBUG traceback is:
>>
>> LustreError: 12946:0:(pack_generic.c:566:lustre_msg_buf_v2()) msg ffff8101d0c4aad0 buffer[3] size 16 too small (required 352)
>> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) ASSERTION(nb != NULL) failed
>> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) LBUG
>> Lustre: 12946:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 12946
>> ll_ost_io_135 R  running task       0 12946      1         12947 12945 (L-TLB)
>>  ffffffff88574438 ffffffff88abb2e0 000000000000063a ffff8101d0c4ac28
>>  ffffffff88abb2e0 ffffffff88571c20 0000000000000000 0000000000000000
>>  ffffffff88574a35 ffffffff88abc7e2 0000000000000000 0000000000000016
>> Call Trace:
>>  [<ffffffff88571c20>] :libcfs:tracefile_init+0x0/0x110
>>  [<ffffffff88aac641>] :ost:ost_rw_hpreq_check+0x1b1/0x290
>>  [<ffffffff88ab9ebf>] :ost:ost_hpreq_handler+0x50f/0x7c0
>>  [<ffffffff886d243b>] :ptlrpc:ptlrpc_main+0xebb/0x13e0
>>  [<ffffffff8008a4aa>] default_wake_function+0x0/0xe
>>  [<ffffffff800b4a6d>] audit_syscall_exit+0x327/0x342
>>  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>>  [<ffffffff886d1580>] :ptlrpc:ptlrpc_main+0x0/0x13e0
>>  [<ffffffff8005dfa7>] child_rip+0x0/0x11
>>
>>
>> Regards,
>> Erich
> 
> 
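
A side note on the sizes in the quoted report: the three "required X" values
are all consistent with one fixed entry size. The following is only a minimal
Python sketch. It assumes 16 bytes per struct niobuf_remote, a figure inferred
solely from the report itself (22 entries alongside "required 352") and not
checked against the Lustre headers. implied_niocount is a hypothetical helper
name, not a Lustre function.

```python
# Assumption: entry size inferred from the report (22 entries <-> "required 352"),
# not taken from the Lustre source.
NIOBUF_REMOTE_SIZE = 352 // 22  # 16 bytes


def implied_niocount(required_bytes, entry_size=NIOBUF_REMOTE_SIZE):
    """Return the number of entries a 'required N' size implies.

    Returns None if N is not a whole multiple of the entry size.
    """
    count, remainder = divmod(required_bytes, entry_size)
    return count if remainder == 0 else None


# The three "required X" values quoted in the report.
for required in (352, 432, 4016):
    print(required, implied_niocount(required))
# prints:
# 352 22
# 432 27
# 4016 251
```

Under that assumption, the actual buffer size of 16 bytes holds exactly one
entry, which matches the "only finds one" observation in the report.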


