[Lustre-devel] o2iblnd bug ?

Liang Zhen Zhen.Liang at Sun.COM
Thu Jul 1 14:27:33 PDT 2010


Nic Henke wrote:
> There looks to be a bug in the o2iblnd (and maybe other LNDs...) in 
> kiblnd_tx_done.
>
> When tx_lntmsg[1] has a reply allocated (lnet_create_reply_msg) for a 
> GET_REQ, we are committed to lnet_finalize that no matter the status of 
> the RDMA. However, kiblnd_tx_done will call lnet_finalize() with the 
> 'error' status on both the request (lntmsg[0]) and the allocated reply. 
> This could lead to the upper layer receiving a REPLY event for a message 
> it has already nuked due to the EIO on the original request.
>
>   

Nic,

I think lnet_create_reply_msg has already taken an extra reference on the
MD (lnet_create_reply_msg()->lnet_commit_md()), so the upper layer message
shouldn't be nuked before the last (unlinked) event.
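
To illustrate the reference-counting argument, here is a minimal toy model in C. It is not LNet code: struct toy_md, toy_md_commit() and toy_md_finalize() are hypothetical stand-ins, and the sketch assumes only what is described above, namely that each committed message holds one reference on the MD and that the MD is unlinked when the last reference is dropped.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an LNet MD; the real structure is far richer. */
struct toy_md {
    int refcount;     /* messages currently committed to this MD */
    int unlinked;     /* set once the last reference has been dropped */
    int events;       /* events delivered to the upper layer so far */
    int last_status;  /* status carried by the most recent event */
};

/* Models the effect attributed to lnet_commit_md(): take one reference. */
static void toy_md_commit(struct toy_md *md)
{
    md->refcount++;
}

/* Models finalizing one message: deliver an event, drop that message's
 * reference, and mark the MD unlinked only when no references remain. */
static void toy_md_finalize(struct toy_md *md, int status)
{
    assert(md->refcount > 0);
    md->events++;
    md->last_status = status;
    md->refcount--;
    if (md->refcount == 0)
        md->unlinked = 1;
}
```

Committing twice (once for the request in lntmsg[0], once for the reply in lntmsg[1]) and then finalizing both with -EIO leaves the toy MD alive after the first event and unlinked only after the second, which is the ordering the argument above relies on.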

Liang

> The ptllnd and qswlnd seem to handle this properly. They will
> complete the request with rc=0, then complete the reply with rc=-EIO.
>
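
The two completion behaviours being compared can be written down as a small C sketch. The names here (toy_get_policy, toy_get_result, toy_get_complete) are hypothetical and exist only for illustration; they are not taken from any LND. The sketch records nothing more than which status each message would be finalized with under each behaviour, as described in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the two behaviours discussed in this thread. */
enum toy_get_policy {
    TOY_GET_SAME_STATUS,  /* request and reply both get the RDMA status */
    TOY_GET_SPLIT_STATUS  /* request gets 0, reply gets the RDMA status */
};

/* Status each message would be finalized with. */
struct toy_get_result {
    int request_rc;  /* status for lntmsg[0], the GET request */
    int reply_rc;    /* status for lntmsg[1], the allocated reply */
};

/* Returns the pair of statuses for a GET whose RDMA ended with rdma_rc. */
static struct toy_get_result toy_get_complete(enum toy_get_policy policy,
                                              int rdma_rc)
{
    struct toy_get_result res;

    res.reply_rc = rdma_rc;
    if (policy == TOY_GET_SPLIT_STATUS)
        res.request_rc = 0;        /* the request itself is reported as sent */
    else
        res.request_rc = rdma_rc;  /* the error is reported on both messages */
    return res;
}
```

With rdma_rc = -EIO, the split policy yields (0, -EIO) and the same-status policy yields (-EIO, -EIO). Whether the second case is harmful depends on how the upper layer treats the MD after the first error event.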
> So, is this really a bug or just an inconsequential difference?
>
> This looks to be present in HEAD, as well as b1_8 and friends.
>
> Cheers,
> Nic



