[Lustre-devel] extend lnet_notify to public LNet API

Nic Henke nic at cray.com
Wed Nov 17 06:59:43 PST 2010


On 11/17/2010 01:52 AM, Alexey Lyashkov wrote:
> Nic,
>
> That idea was discussed some time ago (with green and Maxim, as I remember), but there were some objections.
> Currently LNet hides any network flaps from the ptlrpc layer, and LNet will resend a request without notifying ptlrpc about the flap until the ptlrpc request times out.

I'm missing something - to my knowledge, LNet never retries messages.

> But if ptlrpc sees a node-down event, it will try to reconnect, and that produces extra overhead: it has to resend many requests from the sending and delayed lists, instead of only the requests lost during the network flap.
> So you need to distinguish a network flap from a node-down situation before implementing that.
> Currently a node is marked down if it doesn't respond to a request within the ptlrpc timeout, which includes network transmit and processing times; that is different from the LNet message timeout.

I think that is a valid upper layer decision to make, but separate from 
implementing the LNet callbacks on network 'flap'. I wouldn't want to 
force ptlrpc to use it.
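As an illustration of that upper-layer decision, here is a minimal sketch, not taken from the Lustre code base. It assumes a hypothetical public LNet callback that reports a peer as alive or dead with a timestamp. The names `peer_state`, `peer_notify` and `peer_check` are invented for the example. The upper layer records when LNet first reports the peer down and only declares the node dead (and reconnects) if the peer is still unreachable after the ptlrpc request timeout. A shorter outage is treated as a network flap and left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-peer state kept by an upper layer such as ptlrpc. */
struct peer_state {
    bool   lnet_alive;    /* last aliveness reported by the LNet callback */
    time_t down_since;    /* when LNet first reported the peer down */
    bool   declared_dead; /* upper layer gave up; reconnect logic resets it */
};

/* Hypothetical handler for an assumed public LNet peer-health callback. */
static void peer_notify(struct peer_state *p, bool alive, time_t when)
{
    if (alive) {
        /* Peer is reachable again; if before the deadline, it was a flap. */
        p->lnet_alive = true;
        p->down_since = 0;
    } else if (p->lnet_alive) {
        /* First down report: remember when the outage started. */
        p->lnet_alive = false;
        p->down_since = when;
    }
}

/* Called periodically. Declares the node down only if LNet has reported it
 * unreachable for at least the ptlrpc request timeout. */
static bool peer_check(struct peer_state *p, time_t now, time_t rpc_timeout)
{
    assert(rpc_timeout > 0);
    if (!p->lnet_alive && !p->declared_dead &&
        now - p->down_since >= rpc_timeout)
        p->declared_dead = true;   /* caller would start reconnect here */
    return p->declared_dead;
}
```

With a timeout of 100, a peer reported down at t=10 and back at t=30 is never declared dead, while a peer reported down at t=200 is still considered alive at t=250 and declared dead at t=350. Whether to apply such a grace period at all would remain ptlrpc's choice; the LNet notification only supplies the raw aliveness events.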

Cheers,
Nic


