[Lustre-devel] lnet NAT friendliness

Nicolas Williams Nicolas.Williams at oracle.com
Wed May 5 09:32:45 PDT 2010

On Wed, May 05, 2010 at 12:13:56PM -0400, Ken Hornstein wrote:
> >> >I would think using VPN from outside into your Lustre-supplying LAN should
> >> >be enough to work around this problem somewhat easily with no code changes.
> >
> >There's another option: make the gateway an LNet router.
> Did you see my previous message about this?  That simply isn't an option
> in many cases.

Yes, I did, but I was just adding a workaround that might work for
others (it might not -- haven't tested it).

> >I wouldn't say that's our "official" position.  For starters, you could
> >file an RFE.  You could also contribute a fix.  But it won't be simple
> >to fix.
> Did you see my original message about this?  A simple fix (which I will
> fully admit I only did an extremely brief amount of testing on) was
> only six lines of changes.  Sure, it's not appropriate as general
> changes to LNet, but I think making it configurable would be perfectly
> reasonable.  But I wrote the code, so I will fully admit that I'm biased
> about it.

I did see that.  I hadn't followed it in detail, but just now I looked
at the code you mentioned, and, on a pure client I think that makes
sense.  See below.

>                                 [...].  But it seems the feedback I'm
> getting from the people at Oracle is, "Meh, don't bother".

Well, we (or our customers) might have no use for it at this time; or
perhaps it's just NAT hatred running in our veins (just kidding, though
I suspect most people who've come in contact with NAT love/hate it).
Doesn't mean we wouldn't take patches, or that we'd never have a use for
it.  But the first priority is to make sure that the fix, if you'll
contribute one, is sufficiently robust.  See below.

> >The fix, if it's at all possible, would require that clients' socklnds
> >try to keep TCP connections open at all times to all nodes that the
> >client has spoken to in the past.  That's pretty heavy-weight.
> Actually, I will freely confess to not being the LNet expert ... but
> are socklnd TCP connections closed now when clients are idle?  With the
> pinger running (which is a requirement, from what I understand), it seems
> like you'd have a TCP connection going all of the time between all clients
> and servers.  The pinger sends a packet every 20-25 seconds, right?

Perhaps my "that's pretty heavy-weight" comment was off the mark.
However, I know very little about socklnd, and the key is to make sure
it proactively re-connects in the face of timeouts so that servers can
always send messages to the NATted clients.
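
To make "proactively re-connects" a bit more concrete, here is a minimal
user-space sketch in Python.  It is not socklnd code and does not reflect
how socklnd is actually structured; the helper names
(open_keepalive_socket, reconnect_with_backoff) and all of the delay
values are assumptions made up for illustration.  The idea is only that
the client, being the side that can open connections through the NAT,
keeps retrying with a capped backoff whenever its connection drops.

```python
import socket
import time


def open_keepalive_socket(host, port, timeout=5.0):
    """Open a TCP connection with keepalive probes enabled.

    Hypothetical helper: keepalive lets the client notice a dead
    connection (e.g. an expired NAT mapping) without waiting for traffic.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def reconnect_with_backoff(connect, max_attempts=5, base_delay=1.0,
                           max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay after each failure.

    connect is any zero-argument callable that returns a connection or
    raises OSError.  The last failure is re-raised once max_attempts
    is reached.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

A client would call something like
reconnect_with_backoff(lambda: open_keepalive_socket(host, port))
whenever a send or keepalive probe fails, where host and port are
placeholders for a server address.  Whether this is cheap enough to do
for every peer a client has ever talked to is exactly the open question.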

