[lustre-devel] lnet selftest using large NIDs (16 byte)

Simmons, James simmonsja at ornl.gov
Thu Nov 3 07:23:30 PDT 2022


>Hello
>
>I am working on a PoC for a new LND which needs to use a 16-byte NID address.
>I am currently facing issues adding a 16-byte NID to LNet selftest, since it only handles 4-byte NIDs.
>
>Are there any patches or WIP to add 16 byte NID support to LST ?

Yes, there is, but it’s still under active development. To try it out you need the latest Lustre code plus a number of patches.
You can see where we are at this link: https://jira.whamcloud.com/browse/LU-10391.

Since going through all the tickets is a lot, I can give you a quick summary. The basic infrastructure is in the
core LNet code, but the big remaining changes are the wire protocol headers and the userland interface tools. Note that
having Lustre itself use large NIDs is covered by another set of tickets, which are not there yet. It doesn’t sound like
you are looking for a functional file system on top of your interconnect at this point.
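To make the 4-byte limitation concrete, here is a rough C sketch. It is not LNet code: the `demo_*` and `legacy_*` names are made up for illustration. It assumes the classic 8-byte NID layout, a 32-bit network (LND type in the upper 16 bits, network number in the lower 16) followed by a 32-bit address, and shows why a 16-byte address has no room in it.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 8-byte NID: upper 32 bits hold the network (LND type << 16 |
 * network number), lower 32 bits hold the address, e.g. an IPv4 address. */
typedef uint64_t legacy_nid_t;

static legacy_nid_t legacy_make_nid(uint16_t type, uint16_t num, uint32_t addr)
{
	uint32_t net = ((uint32_t)type << 16) | num;

	return ((uint64_t)net << 32) | addr;
}

static uint32_t legacy_nid_addr(legacy_nid_t nid)
{
	return (uint32_t)(nid & 0xffffffffu);
}

/* HYPOTHETICAL large NID for illustration only (not struct lnet_nid):
 * a type, a network number and up to 16 address bytes. */
struct demo_large_nid {
	uint8_t  addr_len;  /* address bytes in use, 4..16 */
	uint8_t  type;      /* LND type */
	uint16_t num;       /* network number */
	uint8_t  addr[16];  /* address, first addr_len bytes valid */
};

/* Only NIDs whose address fits in 4 bytes can be expressed in the 8-byte
 * form; a 16-byte address (IPv6, or a new LND's address) cannot. */
static int demo_nid_fits_legacy(const struct demo_large_nid *nid)
{
	return nid->addr_len <= 4;
}

/* Convert to the 8-byte form; returns 0 on success, -1 if the address is too big. */
static int demo_nid_to_legacy(const struct demo_large_nid *nid, legacy_nid_t *out)
{
	uint32_t addr = 0;
	int i;

	assert(nid->addr_len <= sizeof(nid->addr));
	if (!demo_nid_fits_legacy(nid))
		return -1;
	/* Address bytes are stored in network (big-endian) order. */
	for (i = 0; i < nid->addr_len; i++)
		addr = (addr << 8) | nid->addr[i];
	*out = legacy_make_nid(nid->type, nid->num, addr);
	return 0;
}
```

Anything that stores or compares NIDs as a single 64-bit value (as selftest does today) hits exactly this conversion failure once the address is 16 bytes.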

The userland tools need to be updated to support large NID addressing. The main functionality we need is support
for setting up the local NI, peers, and pings. We need routers as well, but that is not a hard requirement at this
point. A patch to support setting up large local NIDs is in the master-next branch, so if our gatekeeper is happy
it will land in the coming week. The patch is at

https://review.whamcloud.com/c/fs/lustre-release/+/48814

With this patch you can run lctl list_nids and see the large NIDs you set up. Note that I haven’t finished lnetctl net
show support, since it gives more in-depth info than lctl list_nids; I have an unfinished patch for that work. I also
have an lctl ping / lnetctl ping patch to support large NIDs in the works. It has a few bugs I need to work out, but it’s
somewhat working. LNet selftest also needs to be reworked to support large NIDs, and I have a patch that starts this
work:

https://review.whamcloud.com/c/fs/lustre-release/+/43298

I also have an unfinished local patch for lnet selftest group handling. Once the local NI can be set up with a large
NID, we can then allow selftest group setup.

For the wire protocol we need to support pings and data transfers, i.e. PUT, GET, etc. Ping has been heavily worked
on, and I have been testing it with my incomplete large NID ping tool update. The patch series is here:

https://review.whamcloud.com/c/fs/lustre-release/+/44635

In Gerrit you will see the patch series needed to get pings working. The rest of the LNet data transfer protocol
will require setting up the proper wire header. The new wire headers already exist but are not sent over
the wire yet.
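As a rough illustration of why a new header is needed: once NIDs stop being a fixed 8 bytes, each NID on the wire has to carry its own length so the receiver knows how many address bytes follow. The C sketch below uses a made-up layout (1-byte address length, 1-byte type, 2-byte big-endian network number, then the address bytes). It is not the actual LNet wire header, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL encoding for illustration only; NOT the LNet wire header.
 * Layout: [addr_len:1][type:1][num:2, big-endian][addr:addr_len bytes] */
struct demo_wire_nid {
	uint8_t  addr_len;  /* 4..16 */
	uint8_t  type;
	uint16_t num;
	uint8_t  addr[16];
};

/* Returns bytes written, or 0 if the NID is invalid or buf is too small. */
static size_t demo_nid_encode(const struct demo_wire_nid *nid,
			      uint8_t *buf, size_t buflen)
{
	size_t need = 4 + (size_t)nid->addr_len;

	assert(nid != NULL && buf != NULL);
	if (nid->addr_len < 4 || nid->addr_len > sizeof(nid->addr) || buflen < need)
		return 0;
	buf[0] = nid->addr_len;
	buf[1] = nid->type;
	buf[2] = (uint8_t)(nid->num >> 8);   /* network byte order */
	buf[3] = (uint8_t)(nid->num & 0xff);
	memcpy(buf + 4, nid->addr, nid->addr_len);
	return need;
}

/* Returns bytes consumed, or 0 on a malformed or truncated buffer. */
static size_t demo_nid_decode(const uint8_t *buf, size_t buflen,
			      struct demo_wire_nid *nid)
{
	size_t need;

	assert(buf != NULL && nid != NULL);
	if (buflen < 4)
		return 0;
	need = 4 + (size_t)buf[0];
	if (buf[0] < 4 || buf[0] > sizeof(nid->addr) || buflen < need)
		return 0;
	memset(nid, 0, sizeof(*nid));
	nid->addr_len = buf[0];
	nid->type = buf[1];
	nid->num = (uint16_t)((buf[2] << 8) | buf[3]);
	memcpy(nid->addr, buf + 4, nid->addr_len);
	return need;
}
```

The point of the length prefix is that an old 4-byte address and a 16-byte address can share the same header format, at the cost of the receiver validating the length before trusting the rest of the message.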

For now, the goal is to get lnet selftest to do a ping test over the wire between two large NIDs. If
you are interested in this work, let me know. It would be great if you could be an early tester, and it would be
nice to get feedback on this work. We have a Slack channel where we discuss the progress of this work. If you
have questions about the changes needed to properly support your LND driver, the Slack channel is the best place
to ask them. Feel free to ask here as well if you prefer; someone will answer. Let me know if you want to join the
Slack channel.

