[Lustre-devel] replacing Lustre pings with LNet Peer Health
Nic Henke
nic at cray.com
Tue May 17 07:30:08 PDT 2011
On 05/12/2011 12:37 PM, Christopher J. Morrone wrote:
> I think Eric's approach is the only sane way I've heard to reduce pings.
>
> Here are some issues that I see with this:
>
> 1) For your solution to work, you require that the lnet layer take on
> pinging duties. Usually the network, be it IB, TCP, whatever, will not
> provide any active notification of a peer failure. To notice that a
> peer has died, the lnet LND must, you guessed it, ping.
>
Correct. I had assumed the LNDs would or could be doing the pinging. At
worst it'd be done on a per-peer basis and not per-import, reducing the
traffic somewhat. It'd also reduce the number of layers involved in
receiving each message, which would save some CPU.
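
To put rough numbers on the per-peer vs. per-import point, here is a minimal
sketch. The `Import` tuple, the target names, and the NIDs are all made up for
illustration and are not Lustre data structures; the only assumption is that
several imports can share one peer node.

```python
from collections import namedtuple

# Hypothetical model: an import is a client's connection to one target
# (e.g. an OST), and each target is served by exactly one peer node (NID).
Import = namedtuple("Import", ["target", "peer_nid"])


def pings_per_interval(imports):
    """Return (per_import, per_peer) ping counts for one ping interval."""
    # Per-import pinging sends one ping for every import.
    per_import = len(imports)
    # Per-peer pinging sends one ping for every distinct peer node.
    per_peer = len({imp.peer_nid for imp in imports})
    return per_import, per_peer


# Illustrative layout only: 8 OSTs spread evenly across 2 server nodes.
imports = [
    Import("OST%04x" % i, "10.0.0.%d@o2ib" % (1 + i // 4)) for i in range(8)
]

per_import, per_peer = pings_per_interval(imports)
print("per-import pings:", per_import)
print("per-peer pings:", per_peer)
```

The saving scales with the number of targets per server, so it only helps
where servers host several targets each.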
> Usually the LNDs try to be smart. They only generate their own pings if
> no traffic has been sent to the peer in a certain period of time. So
> once you eliminate the higher-level pings, they will partly be replaced
> by lower-level pings.
Correct, and I thought that sufficient to provide reasonable notification.
Given the LNet router case, I think this idea is a bit DOA... unless I
find some sort of non-gross magic :-)
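
For reference, the "only ping when idle" behavior Christopher describes can be
sketched roughly as below. This is not taken from any actual LND; the class
name, the method names, and the two timeout parameters are assumptions made up
for illustration.

```python
class PeerKeepalive:
    """Hypothetical per-peer keepalive state, not real LND code."""

    def __init__(self, idle_timeout, peer_timeout):
        self.idle_timeout = idle_timeout  # seconds of TX silence before we ping
        self.peer_timeout = peer_timeout  # seconds of RX silence before peer is dead
        self.last_tx = 0.0
        self.last_rx = 0.0

    def note_tx(self, now):
        # Any message sent to the peer (data or ping) resets the idle clock.
        self.last_tx = now

    def note_rx(self, now):
        # Any message received from the peer counts as proof of life.
        self.last_rx = now

    def needs_ping(self, now):
        # Ping only if nothing has been sent to the peer for idle_timeout.
        return now - self.last_tx >= self.idle_timeout

    def is_alive(self, now):
        # The peer is considered alive while it has been heard from recently.
        return now - self.last_rx < self.peer_timeout


ka = PeerKeepalive(idle_timeout=10, peer_timeout=30)
ka.note_tx(0)
ka.note_rx(0)
print(ka.needs_ping(5))   # regular traffic was recent, so no ping yet
print(ka.needs_ping(10))  # idle long enough, so the low-level ping kicks in
```

With the higher-level pings removed, `needs_ping()` fires more often on idle
peers, which is the partial replacement of high-level pings by low-level ones
mentioned above.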
Cheers,
Nic