[lustre-devel] [PATCH 00/24] lustre - more cleanups including module reduction.

Andreas Dilger adilger at whamcloud.com
Thu Jun 28 16:12:42 PDT 2018


On Jun 27, 2018, at 20:35, Patrick Farrell <paf at cray.com> wrote:
> 
> I would say about the acceptor and sock lnd: I believe Lustre assumes some IP transport is available for configuration, but does NOT necessarily use it for primary communication.  Fabrics - like InfiniBand or Cray Aries - are more or less always configured to provide IP transport, to enable the panoply of tools and apps that rely on it.  But they perform better if their native protocols are used, which is of course what the other LNDs do.

It is worthwhile to clarify this a bit - LNet uses IP for *addressing* of the nodes at connection time, but I don't think it even uses the TCP interface for any communication itself (though I could be mistaken, as LNet isn't my specialty).  After the initial connection, the only other thing the IP addresses are used for is printing in error messages.

At one time or another we've discussed how we might get rid of the need for IPoIB on client nodes, since some sites don't want any IP connectivity to the client nodes for security and performance reasons.  That said, I don't think we've come up with a good solution yet.  LNet itself allows alternate addressing schemes to be used.  The former qswlnd (for Quadrics Elan networks) just used an integer node number, like 1@elan for the NID, but I don't think there is any such alternative for IB node addresses except using a MAC hardware address or similar.
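
To make the addressing point concrete, here is a rough sketch of how a NID string in the address@network form splits into its two parts.  This is illustration only and not LNet code: the `Nid` type and the `split_nid` helper are hypothetical names, and the sample addresses are made up.  It only shows that the address part may be an IP address (as with the TCP and IB networks) or a plain integer (as with the old Elan network).

```python
from typing import NamedTuple


class Nid(NamedTuple):
    """Hypothetical container for the two halves of a NID string."""
    address: str  # node address within the network: an IPv4 string or an integer
    network: str  # network name, e.g. "tcp0", "o2ib0" or "elan"


def split_nid(nid: str) -> Nid:
    """Split an 'address@network' string (illustrative helper, not an LNet API)."""
    # Split on the last '@' so the network name is always the final component.
    address, sep, network = nid.rpartition("@")
    if not sep or not address or not network:
        raise ValueError(f"not an address@network NID: {nid!r}")
    return Nid(address, network)


if __name__ == "__main__":
    # Made-up example NIDs: two IP-addressed networks and one integer-addressed.
    for text in ("10.0.0.5@tcp0", "192.168.1.10@o2ib0", "1@elan"):
        print(split_nid(text))
```

Nothing in the split itself depends on the address being an IP address; that dependency comes from what the individual LND does with the address when it sets up a connection.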

Cheers, Andreas

> 
> Neil Brown <neilb at suse.com> wrote:
>> 
>> On Wed, Jun 27 2018, Patrick Farrell wrote:
>> 
>> > Neil,
>> >
>> > We do indeed have such functionality (it’s called DVS and it’s
>> > basically a high speed file system projection framework, ala NFS but
>> > faster), so the ability to build lnet separately is valuable to us.
>> > While it is being open sourced under the GPL, I don’t think there’s
>> > any intention to try to upstream it.  The current code isn’t even
>> > usable off of Cray systems as it depends on info from user space (that
>> > is provided, in the end, from Cray proprietary hardware) to keep its
>> > connection/routing tables up to date.  That’s supposedly in the
>> > pipeline to get fixed, but it’s still pretty far from generally
>> > usable.
>> >
>> > But we’d still really appreciate it if lnet stayed separate.  Don’t
>> > know if that’s enough for you - I know sometimes *small* stuff is done
>> > for out of tree users.  Hopefully this meets that standard.
>> >
>> 
>> Ahh - DVS.  That answers a question I just asked in another email.
>> My google-skills don't seem to be up to locating the source code though
>> :-(
>> 
>> While I wouldn't knowingly break an interface used by some out-of-tree
>> code without good reason, it is hard to avoid if you don't know what the
>> out-of-tree code does.  It can be very tempting to remove something that
>> isn't being used, but that can certainly hurt out-of-tree code
>> sometimes.
>> 
>> A particular example I'm exploring at present is the dual data paths in
>> LNet.  Or maybe it is dual types of Memory Descriptors.
>> There is 'kiov' which uses kernel-virtual addresses and 'iovec' which
>> uses page+offset.
>> The kiov option isn't used in the client code and it seems likely that
>> the server-side code could be converted to use iovec without problems.
>> 
>> I'd like to remove the kiov as I wouldn't be able to justify its
>> existence when submitting the client-only code upstream.  But I don't
>> want to remove the option of having an alternate MD type if it really is
>> significantly more efficient in some context.
>> If I know whether DVS used kiov or iovec - and in what way - that would
>> help me to know if I might break something, and to be able to assess the
>> cost.
>> 
>> In my mind, the "standard" that you mention is always about
>> practicality.   Code needs to be maintainable - easy to understand and
>> hard to break.  If the LNet interface is clean and well documented in
>> the kernel, then I don't see why we would not at least attempt to
>> preserve it.
>> 
>> Thanks,
>> NeilBrown

---
Andreas Dilger
Principal Lustre Architect
Whamcloud






