[Lustre-devel] o2iblnd bug ?

Nic Henke nic at cray.com
Thu Jul 1 09:18:43 PDT 2010


There looks to be a bug in kiblnd_tx_done() in the o2iblnd (and maybe in 
other LNDs...).

When tx_lntmsg[1] has a reply allocated (lnet_create_reply_msg) for a 
GET_REQ, we are committed to calling lnet_finalize() on that reply no 
matter the status of the RDMA. However, kiblnd_tx_done will call 
lnet_finalize() with the 'error' status on both the request (lntmsg[0]) 
and the allocated reply. This could lead to the upper layer receiving a 
REPLY event for a message it has already nuked due to the EIO on the 
original request.

The ptllnd and qswlnd seem to handle this properly: they complete the 
request with rc=0, then complete the reply with rc=-EIO.
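
To make the difference concrete, here is a rough standalone sketch of the 
two completion orders. Everything in it is a mock: struct mock_msg, 
struct mock_tx, mock_finalize() and the two tx_done_* functions are 
hypothetical stand-ins, not the real LNet types or the actual 
lnet_finalize() signature. It only illustrates which status each message 
ends up with.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message; not the real structure. */
struct mock_msg {
    int finalized;  /* how many times the message was finalized */
    int status;     /* status passed to the last finalize call */
};

/* Hypothetical stand-in for a tx descriptor. */
struct mock_tx {
    struct mock_msg *lntmsg[2]; /* [0] = GET request, [1] = allocated reply */
    int status;                 /* completion status of the RDMA */
};

/* Mock of the finalize step: just records the status it was given. */
void mock_finalize(struct mock_msg *msg, int status)
{
    if (msg == NULL)
        return;
    msg->finalized++;
    msg->status = status;
}

/* Pattern described for kiblnd_tx_done: same status for both messages. */
void tx_done_same_status(struct mock_tx *tx)
{
    int i;

    assert(tx != NULL);
    for (i = 0; i < 2; i++) {
        mock_finalize(tx->lntmsg[i], tx->status);
        tx->lntmsg[i] = NULL;
    }
}

/* Pattern described for ptllnd/qswlnd: once a reply has been allocated,
 * the request completes with 0 and only the reply carries the error. */
void tx_done_split_status(struct mock_tx *tx)
{
    assert(tx != NULL);
    if (tx->lntmsg[1] != NULL) {
        mock_finalize(tx->lntmsg[0], 0);
        mock_finalize(tx->lntmsg[1], tx->status);
    } else {
        mock_finalize(tx->lntmsg[0], tx->status);
    }
    tx->lntmsg[0] = NULL;
    tx->lntmsg[1] = NULL;
}
```

With tx->status set to -EIO, the first variant leaves both the request 
and the reply with -EIO, while the second leaves the request with 0 and 
only the reply with -EIO.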

So - is this really a bug, or just an inconsequential difference?

This looks to be present in HEAD, as well as b1_8 and friends.

Cheers,
Nic


