[lustre-devel] [EXTERNAL] lnet selftest using large NIDs (16 byte)
simmonsja at ornl.gov
Thu Nov 3 10:53:13 PDT 2022
>>>I am working on a PoC for a new LND which needs to use a 16-byte NID address.
>>>I am currently facing issues adding a 16-byte NID to LNet selftest since it only handles 4-byte NIDs.
>>>Are there any patches or WIP to add 16-byte NID support to LST?
>>Yes, there is, but it's still under active development. To try it out you need the latest Lustre code plus a bunch of patches.
>>You can see where we are at this link hxxps://jira.whamcloud.com/browse/LU-10391.
>>Since going through all the tickets is a lot of work, here is a quick summary. The basic infrastructure is in the
>>core LNet code, but the big changes still needed are in the wire protocol headers and the user-land interface tools. Note that
>>having Lustre itself use large NIDs is covered by another set of tickets, which have not landed yet.
>Still learning my way through Lustre 😊.
>Do you mean all the required work is covered by WIP patches, or is some of it still not coded?
Yes and yes. We have outstanding patches that are nearing landing, and there is more work to be done that
hasn't been written yet.
>> It doesn’t sound like you are looking for a functional file system on top of your interconnect at this point.
>You are right. I am mostly trying to see the bandwidth potential using LNet selftest.
>For now I am hacking around the whole addressing issue, but long term I will probably need the large NID solution.
It's more complicated than you think. A lot of work went into Lustre 2.13 and 2.14 to get to the stage we are at.
For your own sanity, I recommend that you build on our work.
>At this point the goal is to get LNet selftest to do ping tests over the wire between two large NIDs. If
>you are interested in this work, let me know. It would be great if you could be an early tester. It would be
>nice to get feedback on this work.
>I would be glad to try it. It might take me a while because I'm currently based on 2.12 and rebasing might be a pain.
>But I'll definitely make some time for that as soon as my LND code stabilizes.
There is no support for this work in 2.12. You are going to have to move to a newer Lustre version. Forward
porting your LND shouldn't be too painful. The biggest change is the LNet health work, but it's not a hard requirement.
What will need to change is adding large NID support. By that I mean moving from
lnet_nid_t -> struct lnet_nid
The port will be easier than you think.
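To make the lnet_nid_t -> struct lnet_nid move more concrete, here is a minimal user-space sketch, assuming the classic 64-bit NID packs the net (type << 16 | number) into the upper 32 bits and the address into the lower 32, and that the large-NID struct carries a size byte, a type byte, a big-endian network number and up to 16 address bytes. `demo_lnet_nid`, `demo_nid4_to_nid`, `demo_nid_is_nid4` and `demo_nid_to_nid4` are hypothetical names made up for illustration; check the real struct lnet_nid definition and helpers in the current LNet headers before porting anything.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic 64-bit NID: upper 32 bits = net (type << 16 | number),
 * lower 32 bits = address (e.g. an IPv4 address). */
typedef uint64_t demo_nid4_t;

/* Hypothetical stand-in for struct lnet_nid (assumed layout, 20 bytes). */
struct demo_lnet_nid {
	uint8_t  nid_size;    /* address bytes beyond the first 4 (0 = 4-byte address) */
	uint8_t  nid_type;    /* LND type taken from the net */
	uint16_t nid_num;     /* network number, stored big-endian */
	uint32_t nid_addr[4]; /* up to 16 bytes of address, stored big-endian */
};

static inline uint32_t demo_nid4_net(demo_nid4_t nid)  { return (uint32_t)(nid >> 32); }
static inline uint32_t demo_nid4_addr(demo_nid4_t nid) { return (uint32_t)(nid & 0xffffffffu); }

/* Widen a 64-bit NID into the large-NID form; the address lands in word 0. */
static void demo_nid4_to_nid(demo_nid4_t nid4, struct demo_lnet_nid *nid)
{
	uint32_t net = demo_nid4_net(nid4);

	nid->nid_size = 0;
	nid->nid_type = (uint8_t)((net >> 16) & 0xffff);
	nid->nid_num = htons((uint16_t)(net & 0xffff));
	nid->nid_addr[0] = htonl(demo_nid4_addr(nid4));
	nid->nid_addr[1] = nid->nid_addr[2] = nid->nid_addr[3] = 0;
}

/* Only NIDs with a 4-byte address can be narrowed back to 64 bits. */
static bool demo_nid_is_nid4(const struct demo_lnet_nid *nid)
{
	return nid->nid_size == 0;
}

/* Narrow a large NID back to 64 bits; callers must check demo_nid_is_nid4() first. */
static demo_nid4_t demo_nid_to_nid4(const struct demo_lnet_nid *nid)
{
	assert(demo_nid_is_nid4(nid)); /* a 16-byte address cannot fit */
	uint32_t net = ((uint32_t)nid->nid_type << 16) | ntohs(nid->nid_num);
	return ((demo_nid4_t)net << 32) | ntohl(nid->nid_addr[0]);
}
```

A 4-byte NID round-trips through the wider struct unchanged, while a NID carrying a 16-byte address has no 64-bit representation at all. That asymmetry is why code paths such as LNet selftest that still pass lnet_nid_t around have to be converted rather than wrapped.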
>We have a slack channel where we discuss the progress
>of this work. You will likely have questions about the changes needed to properly support your LND driver, and the
>slack channel is the best place to ask those. Feel free to ask here as well if you prefer; someone will
>answer. Let me know if you want to join the slack channel.
Sure, I'll be happy to join your slack channel.
Thanks for all the info and the slack invite!