[Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed
Erich Focht
efocht at hpce.nec.com
Tue Apr 20 00:51:08 PDT 2010
Hi Bernd,
thanks, I reopened your bug 19992. I wonder why I couldn't find it in Bugzilla...
Regards,
Erich
Bernd Schubert wrote:
> Hello Erich,
>
> check out my bug report:
>
> https://bugzilla.lustre.org/show_bug.cgi?id=19992
>
> It was closed as a duplicate of bug 16129, although that is probably not
> correct, as 16129 is the root cause, but not the solution.
>
> As we never observed it with 1.6.7.2, I didn't complain when bug 19992 was
> closed. Since you can now confirm that it also happens with 1.6.7.2, please
> re-open that bug.
>
>
> Thanks,
> Bernd
>
> On Monday 19 April 2010, Erich Focht wrote:
>> Hi,
>>
>> we saw this LBUG 3 times within the past week, and are puzzled about what's
>> going on and why there's no Bugzilla entry for it...
>>
>> What happens is that on an OSS a request (it must be a read or write) expects
>> (according to the content of the ioobj structure) to find an array of 22
>> struct niobuf_remote entries (niocount), but finds only one. The request is
>> obviously corrupted.
>>
>> We enabled checksumming where we could, but unfortunately the request
>> headers don't seem to be covered by any checksum (the reply path possibly
>> is). Anyway, we see no corruption or checksum failures for bulk data
>> transfers, and the message was the same in all three cases: "size 16 too
>> small (required X)", with X being 352, 432 and 4016. So corruption on the
>> wire is improbable.
>>
>> Has anybody seen this? Any ideas or hints?
>>
>> We're using Lustre 1.6.7.2 on server and client side.
>>
>>
>> The LBUG traceback is:
>>
>> LustreError: 12946:0:(pack_generic.c:566:lustre_msg_buf_v2()) msg ffff8101d0c4aad0 buffer[3] size 16 too small (required 352)
>> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) ASSERTION(nb != NULL) failed
>> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) LBUG
>> Lustre: 12946:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 12946
>> ll_ost_io_135 R running task 0 12946 1 12947 12945 (L-TLB)
>> ffffffff88574438 ffffffff88abb2e0 000000000000063a ffff8101d0c4ac28
>> ffffffff88abb2e0 ffffffff88571c20 0000000000000000 0000000000000000
>> ffffffff88574a35 ffffffff88abc7e2 0000000000000000 0000000000000016
>> Call Trace:
>> [<ffffffff88571c20>] :libcfs:tracefile_init+0x0/0x110
>> [<ffffffff88aac641>] :ost:ost_rw_hpreq_check+0x1b1/0x290
>> [<ffffffff88ab9ebf>] :ost:ost_hpreq_handler+0x50f/0x7c0
>> [<ffffffff886d243b>] :ptlrpc:ptlrpc_main+0xebb/0x13e0
>> [<ffffffff8008a4aa>] default_wake_function+0x0/0xe
>> [<ffffffff800b4a6d>] audit_syscall_exit+0x327/0x342
>> [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> [<ffffffff886d1580>] :ptlrpc:ptlrpc_main+0x0/0x13e0
>> [<ffffffff8005dfa7>] child_rip+0x0/0x11
>>
>>
>> Regards,
>> Erich
>
>
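The sizes quoted in the report can be checked with a little arithmetic. The sketch below assumes that one struct niobuf_remote occupies 16 bytes (a 64-bit offset plus two 32-bit fields); that layout is an assumption chosen to match the "size 16" in the log and the "22 entries for 352 bytes" stated above, not something taken from the Lustre headers. The helper name implied_niocount is hypothetical. Under that assumption, each "required X" value is a whole multiple of the entry size, while the buffer actually received holds exactly one entry.

```python
import struct

# ASSUMPTION: struct niobuf_remote is laid out as u64 offset, u32 len,
# u32 flags. This matches the 16-byte buffer size in the log, but it is
# not copied from the Lustre source.
NIOBUF_REMOTE_FMT = "<QII"
NIOBUF_REMOTE_SIZE = struct.calcsize(NIOBUF_REMOTE_FMT)

# Values taken from the report: the buffer that arrived, and the sizes
# the server said it required in the three failures.
ACTUAL_BUFFER_SIZE = 16
REQUIRED_SIZES = [352, 432, 4016]


def implied_niocount(required_bytes, entry_size=NIOBUF_REMOTE_SIZE):
    """Return how many entries of entry_size fit exactly in required_bytes."""
    count, remainder = divmod(required_bytes, entry_size)
    if remainder:
        raise ValueError(
            f"{required_bytes} is not a multiple of {entry_size} bytes"
        )
    return count


if __name__ == "__main__":
    entries_received = ACTUAL_BUFFER_SIZE // NIOBUF_REMOTE_SIZE
    for required in REQUIRED_SIZES:
        expected = implied_niocount(required)
        print(
            f"required {required:5d} bytes -> niocount {expected:4d}, "
            f"received {entries_received} entry"
        )
```

If the assumed layout is right, the three failures correspond to requests whose ioobj claimed 22, 27 and 251 remote buffers while the message carried room for only one, which fits the observation that the mismatch is between the ioobj contents and the buffer length rather than random damage to bulk data.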