[Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed

Bernd Schubert bs_lists at aakef.fastmail.fm
Mon Apr 19 11:09:27 PDT 2010


Hello Erich,

check out my bug report:

https://bugzilla.lustre.org/show_bug.cgi?id=19992

It was closed as a duplicate of bug 16129, although that is probably not 
correct: 16129 describes the root cause, not the solution.

As we never observed it with 1.6.7.2, I didn't object when bug 19992 was closed. 
Now that you can confirm it also happens with 1.6.7.2, please re-open that bug.


Thanks,
Bernd

On Monday 19 April 2010, Erich Focht wrote:
> Hi,
> 
> we saw this LBUG 3 times within the past week, and are puzzled about what's
> going on, and why there's no bugzilla entry for this...
> 
> What happens is that on an OSS a request (must be read or write) expects
> (according to the content of the ioobj structure) to find an array of 22
> struct niobuf_remote entries (niocount), but finds only one. This is
> obviously corrupted.
> 
> We enabled checksumming where we could, but unfortunately the request
> headers don't seem to be covered by any checksum check (the reply path
> possibly is). Anyway, we see no corruption or checksum failures for bulk
> data transfer, so it seems improbable that corruption on the wire would
> produce "size 16 too small (required X)" three times in a row (with X
> being 352, 432 and 4016 in our failures).
> 
> Has anybody seen this? Any ideas or hints?
> 
> We're using Lustre 1.6.7.2 on server and client side.
> 
> 
> The LBUG traceback is:
> 
> LustreError: 12946:0:(pack_generic.c:566:lustre_msg_buf_v2()) msg
> ffff8101d0c4aad0 buffer[3] size 16 too small (required 352)
> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) ASSERTION(nb
>  != NULL) failed
> LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) LBUG
> Lustre: 12946:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack
>  for process 12946
> ll_ost_io_135 R  running task       0 12946      1         12947 12945
>  (L-TLB) ffffffff88574438 ffffffff88abb2e0 000000000000063a
>  ffff8101d0c4ac28 ffffffff88abb2e0 ffffffff88571c20 0000000000000000
>  0000000000000000 ffffffff88574a35 ffffffff88abc7e2 0000000000000000
>  0000000000000016 Call Trace:
>  [<ffffffff88571c20>] :libcfs:tracefile_init+0x0/0x110
>  [<ffffffff88aac641>] :ost:ost_rw_hpreq_check+0x1b1/0x290
>  [<ffffffff88ab9ebf>] :ost:ost_hpreq_handler+0x50f/0x7c0
>  [<ffffffff886d243b>] :ptlrpc:ptlrpc_main+0xebb/0x13e0
>  [<ffffffff8008a4aa>] default_wake_function+0x0/0xe
>  [<ffffffff800b4a6d>] audit_syscall_exit+0x327/0x342
>  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>  [<ffffffff886d1580>] :ptlrpc:ptlrpc_main+0x0/0x13e0
>  [<ffffffff8005dfa7>] child_rip+0x0/0x11
> 
> 
> Regards,
> Erich
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
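The sizes in Erich's log messages fit his description of the ioobj/niobuf mismatch: each "required" value is a whole multiple of the 16 bytes actually received, so 352 = 22 * 16 matches the 22 entries announced by niocount, and 432 and 4016 correspond to 27 and 251 entries. The C sketch that follows is a hypothetical, simplified model, not Lustre code. It assumes a 16-byte entry layout (64-bit offset plus 32-bit length and flags), which is what the "size 16" in the log suggests. The names niobuf_remote_model, msg_buf_model() and friends are made up for illustration. It shows how a size check that returns NULL for a too-short buffer would trip an ASSERTION(nb != NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct niobuf_remote (assumed layout):
 * a 64-bit offset plus two 32-bit fields, i.e. 16 bytes per entry. */
struct niobuf_remote_model {
        uint64_t offset;
        uint32_t len;
        uint32_t flags;
};

static_assert(sizeof(struct niobuf_remote_model) == 16,
              "model entry is expected to be 16 bytes");

/* Bytes needed for `niocount` entries, as announced by the ioobj. */
static size_t niobuf_required_size(uint32_t niocount)
{
        return (size_t)niocount * sizeof(struct niobuf_remote_model);
}

/* How many whole entries a received buffer of `buflen` bytes holds. */
static size_t niobuf_entries_in(size_t buflen)
{
        return buflen / sizeof(struct niobuf_remote_model);
}

/* Simplified model of a message-buffer size check (NOT the real
 * lustre_msg_buf_v2()): return the buffer if it is large enough,
 * otherwise print a message in the style of the log and return NULL. */
static void *msg_buf_model(void *buf, size_t buflen, size_t required)
{
        if (buflen < required) {
                fprintf(stderr, "buffer size %zu too small (required %zu)\n",
                        buflen, required);
                return NULL;
        }
        return buf;
}
```

Passing the single 16-byte entry to msg_buf_model() together with the size required for 22 entries returns NULL, which is exactly the condition the assertion guards against; the actual code path in ost_rw_hpreq_check() may differ in detail.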


-- 
Bernd Schubert
DataDirect Networks


