[Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed

Erich Focht efocht at hpce.nec.com
Mon Apr 19 10:14:24 PDT 2010


Hi,

we have seen this LBUG three times within the past week and are puzzled about
what is going on, and about why there is no Bugzilla entry for it...

What happens is that on an OSS a request (it must be a read or a write) expects,
according to the content of the ioobj structure, to find an array of 22 struct
niobuf_remote entries (niocount), but finds only one. The request is obviously
corrupted.
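
A rough model of the check that trips may help when reading the log further
down. This is only a simplified Python sketch, not the Lustre source: the
helper names `msg_buf` and `check_niobufs` are hypothetical, and the 16-byte
element size is an assumption inferred from the numbers above (352 bytes
required for 22 entries).

```python
# Assumption: one niobuf_remote occupies 16 bytes (352 required / 22 entries).
NIOBUF_REMOTE_SIZE = 16


def msg_buf(buflens, index, min_size):
    """Hypothetical stand-in for the message-buffer lookup.

    Returns the length of buffer `index` if it holds at least `min_size`
    bytes, otherwise None (the "too small" case).
    """
    if index >= len(buflens):
        return None
    if buflens[index] < min_size:
        return None
    return buflens[index]


def check_niobufs(buflens, niocount, index=3):
    """Return True if buffer `index` can hold `niocount` niobuf entries."""
    required = niocount * NIOBUF_REMOTE_SIZE
    nb = msg_buf(buflens, index, required)
    # In the real server this is where ASSERTION(nb != NULL) would fire.
    return nb is not None
```

With a 16-byte buffer and niocount = 22, `check_niobufs([0, 0, 0, 16], 22)`
returns False, which corresponds to the assertion failure we see.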

We enabled checksumming where we could, but unfortunately the request headers
don't seem to be covered by any checksum (the reply path possibly is). In any
case, we see no corruption or checksum failures for bulk data transfers, so
corruption on the wire seems improbable, especially since all three failures
report the same "size 16 too small (required X)" message (with X being 352,
432 and 4016).
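
For reference, the three "required" sizes can be converted back into the
niocount each request claimed. The sketch below assumes that one
niobuf_remote is a 64-bit offset followed by two 32-bit fields (16 bytes in
total); that layout is an assumption here, although it agrees with 352 bytes
for 22 entries. The function name `expected_niocount` is made up for
illustration.

```python
import struct

# Assumed layout of one niobuf_remote: u64 offset, u32 len, u32 flags.
NIOBUF_REMOTE_SIZE = struct.calcsize("<QII")


def expected_niocount(required_bytes):
    """Number of niobuf_remote entries implied by a 'required' size."""
    count, remainder = divmod(required_bytes, NIOBUF_REMOTE_SIZE)
    if remainder:
        raise ValueError("size is not a multiple of the entry size")
    return count


for required in (352, 432, 4016):
    print(required, "bytes ->", expected_niocount(required), "entries")
```

Under that assumption the three requests claimed 22, 27 and 251 entries,
while the buffer actually received (16 bytes) has room for exactly one.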

Has anybody seen this? Any ideas or hints?

We're using Lustre 1.6.7.2 on server and client side.


The LBUG traceback is:

LustreError: 12946:0:(pack_generic.c:566:lustre_msg_buf_v2()) msg
ffff8101d0c4aad0 buffer[3] size 16 too small (required 352)
LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) ASSERTION(nb !=
NULL) failed
LustreError: 12946:0:(ost_handler.c:1594:ost_rw_hpreq_check()) LBUG
Lustre: 12946:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for
process 12946
ll_ost_io_135 R  running task       0 12946      1         12947 12945 (L-TLB)
 ffffffff88574438 ffffffff88abb2e0 000000000000063a ffff8101d0c4ac28
 ffffffff88abb2e0 ffffffff88571c20 0000000000000000 0000000000000000
 ffffffff88574a35 ffffffff88abc7e2 0000000000000000 0000000000000016
Call Trace:
 [<ffffffff88571c20>] :libcfs:tracefile_init+0x0/0x110
 [<ffffffff88aac641>] :ost:ost_rw_hpreq_check+0x1b1/0x290
 [<ffffffff88ab9ebf>] :ost:ost_hpreq_handler+0x50f/0x7c0
 [<ffffffff886d243b>] :ptlrpc:ptlrpc_main+0xebb/0x13e0
 [<ffffffff8008a4aa>] default_wake_function+0x0/0xe
 [<ffffffff800b4a6d>] audit_syscall_exit+0x327/0x342
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff886d1580>] :ptlrpc:ptlrpc_main+0x0/0x13e0
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Regards,
Erich


