[Lustre-devel] o2iblnd bug ?
Nic Henke
nic at cray.com
Thu Jul 1 09:18:43 PDT 2010
There looks to be a bug in kiblnd_tx_done() in the o2iblnd (and maybe in
other LNDs...).
When tx_lntmsg[1] has a reply allocated (lnet_create_reply_msg) for a
GET_REQ, we are committed to calling lnet_finalize() on it no matter the
status of the RDMA. However, kiblnd_tx_done will call lnet_finalize() with
the 'error' status on both the request (tx_lntmsg[0]) and the allocated
reply. This could lead to the upper layer receiving a REPLY event for a
message it has already nuked due to the EIO on the original request.
The ptllnd and qswlnd seem to handle this properly. They complete the
request with rc=0, then complete the reply with rc=-EIO.
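
To make the difference concrete, here is a rough standalone sketch of the two completion patterns described above. It is not code from any LND: mock_msg_t, mock_finalize(), tx_done_same_status() and tx_done_split_status() are hypothetical stand-ins I made up for illustration, and mock_finalize() only records the status that lnet_finalize() would have been given.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message; NOT the real lnet_msg_t. */
typedef struct mock_msg {
    const char *name;      /* label for the message, illustration only */
    int         finalized; /* set once the message has been completed */
    int         status;    /* status passed at completion time */
} mock_msg_t;

/* Hypothetical stand-in for lnet_finalize(): just records the status. */
static void mock_finalize(mock_msg_t *msg, int status)
{
    assert(!msg->finalized); /* a message must only be completed once */
    msg->finalized = 1;
    msg->status = status;
}

/* Pattern described for kiblnd_tx_done(): the same (possibly error)
 * status is used for both the request and the allocated reply. */
static void tx_done_same_status(mock_msg_t *lntmsg[2], int rc)
{
    for (int i = 0; i < 2; i++) {
        if (lntmsg[i] == NULL)
            continue;
        mock_finalize(lntmsg[i], rc);
    }
}

/* Pattern described for ptllnd/qswlnd: once a reply has been allocated,
 * the request completes with 0 and only the reply carries the error. */
static void tx_done_split_status(mock_msg_t *lntmsg[2], int rc)
{
    if (lntmsg[0] != NULL)
        mock_finalize(lntmsg[0], lntmsg[1] != NULL ? 0 : rc);
    if (lntmsg[1] != NULL)
        mock_finalize(lntmsg[1], rc);
}
```

With rc=-EIO and both slots filled, the first helper leaves the request and the reply both completed with -EIO, while the second leaves the request at 0 and the reply at -EIO, so the upper layer only sees the failure on the REPLY.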
So, is this really a bug, or just an inconsequential difference?
This looks to be present in HEAD, as well as b1_8 and friends.
Cheers,
Nic