[Lustre-devel] extend lnet_notify to public LNet API

Nic Henke nic at cray.com
Tue Nov 16 08:00:42 PST 2010


We'd like to allow upper layers (Lustre, Cray DVS, etc) to register a 
callback that would be called from lnet_notify. This will allow them to 
be notified when the lower layers have seen network problems between 
NIDs and let them take appropriate action. The upper layer could also be 
notified when that peer has returned to 'network health' after the LND 
gets its act together.

This would let upper layers aggressively resend/reconnect in cases
where all TX have completed successfully (meaning no LNet -EIO on LND
errors) but LNET_MSG_ACK or other REPLY traffic is still outstanding.

Initial proposal is on the verbose side, giving all data that 
lnet_notify sees:
- lnet_nid_t
- is_alive (boolean)
- cfs_time_t when (unsigned long on Linux) - jiffies when last alive
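
To make the shape of the proposal concrete, here is a rough userspace
sketch of what registration and delivery could look like. Everything
in it is an assumption for illustration: the names
lnet_register_notify_cb, lnet_unregister_notify_cb,
lnet_notify_upper_layers and lnet_notify_cb_t do not exist in LNet,
the typedefs are stand-ins for the real LNet types, and the fixed-size
table has no locking, which a real in-kernel version would need.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the LNet types named above (widths assumed for the sketch). */
typedef uint64_t lnet_nid_t;
typedef unsigned long cfs_time_t;   /* jiffies on Linux */

/* Hypothetical callback type carrying the three values lnet_notify sees. */
typedef void (*lnet_notify_cb_t)(lnet_nid_t nid, int is_alive,
                                 cfs_time_t when);

#define LNET_NOTIFY_CB_MAX 4        /* arbitrary limit for the sketch */
static lnet_notify_cb_t notify_cbs[LNET_NOTIFY_CB_MAX];

/* Hypothetical: an upper layer (Lustre, DVS, ...) registers its callback. */
int lnet_register_notify_cb(lnet_notify_cb_t cb)
{
    size_t i;

    if (cb == NULL)
        return -EINVAL;
    for (i = 0; i < LNET_NOTIFY_CB_MAX; i++) {
        if (notify_cbs[i] == NULL) {
            notify_cbs[i] = cb;
            return 0;
        }
    }
    return -ENOSPC;                 /* table is full */
}

/* Hypothetical: remove a previously registered callback. */
int lnet_unregister_notify_cb(lnet_notify_cb_t cb)
{
    size_t i;

    if (cb == NULL)
        return -EINVAL;
    for (i = 0; i < LNET_NOTIFY_CB_MAX; i++) {
        if (notify_cbs[i] == cb) {
            notify_cbs[i] = NULL;
            return 0;
        }
    }
    return -ENOENT;                 /* was not registered */
}

/* Stand-in for the point in lnet_notify where upper layers would be told. */
void lnet_notify_upper_layers(lnet_nid_t nid, int is_alive, cfs_time_t when)
{
    size_t i;

    for (i = 0; i < LNET_NOTIFY_CB_MAX; i++) {
        if (notify_cbs[i] != NULL)
            notify_cbs[i](nid, is_alive, when);
    }
}

/* Example upper-layer callback: just records the most recent event. */
static lnet_nid_t last_nid;
static int last_alive = -1;         /* -1 means no event seen yet */
static cfs_time_t last_when;

static void example_upper_layer_cb(lnet_nid_t nid, int is_alive,
                                   cfs_time_t when)
{
    last_nid = nid;
    last_alive = is_alive;
    last_when = when;
}
```

An upper layer would register example_upper_layer_cb once at setup,
then decide in the callback whether to resend or reconnect when
is_alive goes to 0, and whether to resume normal traffic when it
returns to 1.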

Is this workable and likely to be accepted upstream?

Nic


