[Lustre-devel] extend lnet_notify to public LNet API
Nic Henke
nic at cray.com
Tue Nov 16 08:00:42 PST 2010
We'd like to allow upper layers (Lustre, Cray DVS, etc.) to register a
callback that would be called from lnet_notify. This would let them
be notified when the lower layers have seen network problems between
NIDs, so that they can take appropriate action. The upper layer could
also be notified when that peer has returned to 'network health' after
the LND gets its act together.
This would help upper layers aggressively resend/reconnect in
cases where all TX have completed successfully (meaning no LNet -EIO
on LND errors) but LNET_MSG_ACK or other REPLY traffic is still
outstanding.
Initial proposal is on the verbose side, giving all data that
lnet_notify sees:
- lnet_nid_t
- is_alive (boolean)
- cfs_time_t when (unsigned long on Linux) - time, in jiffies, when the peer was last alive
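
To make the proposal concrete, here is a rough user-space sketch of what a registration interface carrying those three values might look like. None of these names exist in LNet today: lnet_notify_cb_t, lnet_notify_register_sketch, lnet_notify_unregister_sketch and lnet_notify_dispatch_sketch are all hypothetical, the typedefs are stand-ins for the real LNet/libcfs types, and the fixed-size table with no locking is only there to keep the example short. A real version would need proper locking and would have to decide what context the callbacks run in.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch builds in user space; the real types live in
 * LNet/libcfs. */
typedef uint64_t lnet_nid_t;
typedef unsigned long cfs_time_t;

/* Hypothetical callback type carrying the three proposed values, plus an
 * opaque argument for the upper layer's own state. */
typedef void (*lnet_notify_cb_t)(lnet_nid_t nid, int is_alive,
                                 cfs_time_t when, void *arg);

#define NOTIFY_CB_MAX 4 /* arbitrary limit for this sketch */

struct notify_slot {
    lnet_notify_cb_t cb;
    void *arg;
};

static struct notify_slot notify_slots[NOTIFY_CB_MAX];

/* Hypothetical: register a callback. Returns 0 on success, -1 if cb is
 * NULL or the table is full. No locking here (assumption for brevity). */
int lnet_notify_register_sketch(lnet_notify_cb_t cb, void *arg)
{
    size_t i;

    if (cb == NULL)
        return -1;
    for (i = 0; i < NOTIFY_CB_MAX; i++) {
        if (notify_slots[i].cb == NULL) {
            notify_slots[i].cb = cb;
            notify_slots[i].arg = arg;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical: remove a previously registered (cb, arg) pair. Returns 0
 * on success, -1 if the pair was not registered. */
int lnet_notify_unregister_sketch(lnet_notify_cb_t cb, void *arg)
{
    size_t i;

    if (cb == NULL)
        return -1;
    for (i = 0; i < NOTIFY_CB_MAX; i++) {
        if (notify_slots[i].cb == cb && notify_slots[i].arg == arg) {
            notify_slots[i].cb = NULL;
            notify_slots[i].arg = NULL;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical: the fan-out that lnet_notify could perform after it has
 * updated its own peer state. */
void lnet_notify_dispatch_sketch(lnet_nid_t nid, int is_alive,
                                 cfs_time_t when)
{
    size_t i;

    for (i = 0; i < NOTIFY_CB_MAX; i++) {
        if (notify_slots[i].cb != NULL)
            notify_slots[i].cb(nid, is_alive, when, notify_slots[i].arg);
    }
}

/* Example consumer: an upper layer that just records the last event. */
struct upper_layer_state {
    lnet_nid_t nid;
    int is_alive;
    cfs_time_t when;
    int calls;
};

void upper_layer_on_notify(lnet_nid_t nid, int is_alive,
                           cfs_time_t when, void *arg)
{
    struct upper_layer_state *st = arg;

    st->nid = nid;
    st->is_alive = is_alive;
    st->when = when;
    st->calls++;
}
```

An upper layer would register upper_layer_on_notify with a pointer to its own state, and on an is_alive == 0 event could start its resend/reconnect logic for that NID rather than waiting for its own timeouts.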
Is this workable and likely to be accepted upstream?
Nic