[Lustre-devel] extend lnet_notify to public LNet API

Nic Henke nic at cray.com
Mon Nov 22 09:23:36 PST 2010


On 11/17/2010 08:59 AM, Nic Henke wrote:
> On 11/17/2010 01:52 AM, Alexey Lyashkov wrote:
>> Nic,
>>
>> That idea was discussed some time ago (as I remember, with green and maxim), but there were some objections.
>> Currently LNet hides any network flaps from the ptlrpc layer, and LNet will resend a request without notifying ptlrpc about the flap until the ptlrpc request times out.
>
> I'm missing something - to my knowledge, LNet never retries messages.
>
>> But if ptlrpc sees a node-down event, it will try to reconnect. That will produce extra overhead, because it has to resend all the requests on the sending and delay lists instead of just the requests lost during the network flap.
>> So you need to separate a network flap from a node-down situation before implementing that.
>> Currently a node is marked down if it doesn't respond to a request within the ptlrpc timeout, which includes network transmit and processing times, but that is different from the LNet message timeout.
>
> I think that is a valid upper layer decision to make, but separate from
> implementing the LNet callbacks on network 'flap'. I wouldn't want to
> force ptlrpc to use it.

Any response to this?

Nic


