[lustre-devel] lnet selftest using large NIDs (16 byte)

Yitschak, Yehuda yehuday at amazon.com
Thu Nov 3 08:57:12 PDT 2022


>>Hello 
>>
>>I am working on a PoC for a new LND which needs to use a 16-byte NID address.
>>I am currently facing issues adding a 16-byte NID to LNet selftest, since it only handles 4-byte NIDs.
>>
>>Are there any patches or WIP to add 16-byte NID support to LST?
>Yes, there are, but the work is still under active development. To try it out you need the latest Lustre code plus a bunch of patches.
>You can see where we are at this link: https://jira.whamcloud.com/browse/LU-10391.
>
>Since going through the tickets is going to be a lot, I can give you a quick summary. The basic infrastructure is in the
>core LNet code, but the big changes still needed are to the wire protocol headers and the userland interface tools. Note that
>having Lustre itself use large NIDs is covered by another set of tickets, which are not there yet.

I am still learning my way around Lustre 😊.
Do you mean that all the required work is covered by WIP patches, or is some of it still not coded?

> It doesn’t sound like you are looking for a functional file system on top of your interconnect at this point.

You are right. I am mostly trying to see the bandwidth potential using LNet selftest.
I am currently hacking around the whole addressing issue, but long term I will probably need the large NID solution.

>
>The userland tools need to be updated to support large NID addressing. The main functionality we need
>is support for setting up the local NI, peers, and pings. We do need routers as well, but that’s not a hard requirement at
>this point. A patch to support large NIDs for the local NI is in the master-next branch, so if our gatekeeper is happy
>it will land in the coming week. The patch is at
>
>https://review.whamcloud.com/c/fs/lustre-release/+/48814
>
>With this patch you can run lctl list_nids and see the large NIDs you set up. Note I haven’t finished lnetctl net show
>support, since it gives more in-depth info than lctl list_nids. I have an unfinished patch for that work. I also
>have an lctl ping / lnetctl ping patch to support large NIDs in the works. It has a few bugs I need to work out, but it’s
>somewhat working. LNet selftest also needs to be reworked to support large NIDs. I have a patch to start this
>support.
>
>https://review.whamcloud.com/c/fs/lustre-release/+/43298
>
>I also have a local patch for LNet selftest group handling that is not finished. With the ability to set up a local NI
>we can then allow selftest group setup.
>
>For the wire protocol we need to support pings and transfers, i.e. PUT, GET, etc. Ping has been heavily worked
>on, and I have been testing it with my incomplete large NID ping tool update. The patch series is here:
>
>https://review.whamcloud.com/c/fs/lustre-release/+/44635
>
>You will see in Gerrit the patch set needed to get pings working. The rest of the LNet data transfer protocol
>will require setting up the proper wire header. The new wire headers already exist but are not sent over
>the wire at this point.
>
>At this point the goal will be to get LNet selftest to do a ping test over the wire between two large NIDs. If
>you are interested in this work, let me know. It would be great if you could be an early tester. It would be
>nice to get feedback on this work.

I would be glad to try it. It might take me a while because I'm currently based on 2.12 and rebasing might be a pain.
But I'll definitely make some time for that as soon as my LND code stabilizes.  

>We have a Slack channel where we discuss the progress
>of this work. You will have questions about the changes needed to properly support your LND driver;
>the Slack channel is the best place to ask those. Feel free to ask here as well if you prefer. Someone will
>answer. Let me know if you want to join the Slack channel.

Sure, I'll be happy to join your slack channel.
Thanks for all the info and the Slack invite!

Yehuda 

