[lustre-discuss] Multiple IB Interfaces
a.g.basden at durham.ac.uk
Fri Mar 12 01:32:03 PST 2021
Thanks for the replies. The issue as I see it is with sending data from
an OST to the client, avoiding the inter-CPU link.
So, if I have:
cpu1 - IB card 1 (10.0.0.1), nvme1 (OST1)
cpu2 - IB card 2 (10.0.0.2), nvme2 (OST2)
Both IB cards on the same subnet. Therefore, by default, packets will be
routed out of the server over the preferred card, say IB card 1 (I could
be wrong, but this is my current understanding, and seems to be what the
Lustre manual says).
Data coming in (being written to the OST) is not a problem. The client
will know the IP address of the card to which the OST is closest. So,
to write to OST2, it will use the 10.0.0.2 address (since this will be
the IP address given in mkfs.lustre for that OST).
The slight complication here is pinning. The thread handling the write
may run on cpu1, in which case the data has to traverse the inter-cpu
link twice. However, I am assuming that this won't happen - i.e. the
kernel or Lustre is clever enough to place this thread on cpu2. As far as
I am aware, this should just work, though please correct me if I'm wrong.
Perhaps I have to manually specify pinning - how does one do that with
Lustre?
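To make clear what is meant by pinning at the OS level, here is a minimal
Python sketch using the standard Linux affinity calls. It only shows the
generic mechanism for a userspace thread; it is not how Lustre places its
own kernel service threads, and the helper names (pin_to_cpus, run_pinned)
are made up for illustration.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling thread to the given CPU ids.

    Returns the previous affinity set so the caller can restore it.
    Linux-only: relies on os.sched_getaffinity / os.sched_setaffinity.
    """
    previous = os.sched_getaffinity(0)   # 0 means "the calling thread"
    os.sched_setaffinity(0, cpus)
    return previous


def run_pinned(cpus, work):
    """Run work() while pinned to `cpus`, then restore the old affinity."""
    previous = pin_to_cpus(cpus)
    try:
        return work()
    finally:
        os.sched_setaffinity(0, previous)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    # Assumption: pick the lowest-numbered allowed core as a stand-in for
    # "a core on the socket next to the IB card and the NVMe device".
    target = {min(allowed)}
    print("pinned to:", run_pinned(target, lambda: os.sched_getaffinity(0)))
```

On a real dual-socket server the target set would be the cores of the
socket that hosts the relevant IB card and NVMe device, rather than a
single core.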
Reading is more problematic. A request from a client (say 10.0.0.100) for
data on OST2 will come in via card 2 (10.0.0.2). A thread on CPU2
(hopefully) will then read the data from OST2, and send it out to the
client, 10.0.0.100. However, here, Linux will route the packet through
the first card on this subnet, so it will go over the inter-cpu link, and
out of IB card 1. And this will be the case even if the thread is pinned.
The question then is whether there is a way to configure Lustre to use IB
card 2 when sending out data from OST2.
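To illustrate the concern, here is a toy Python model of the route lookup
as I understand it: longest matching prefix first, then lowest metric.
It is a simplification, not the kernel's implementation, and it says
nothing about how LNet itself selects an interface. The /24 mask, the
interface names ib0/ib1 and the metrics are all assumptions made for the
example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    network: ipaddress.IPv4Network
    device: str   # hypothetical interface name
    metric: int   # lower metric wins among equally specific routes


def pick_route(routes, destination):
    """Return the route a simplified lookup would choose for destination."""
    dst = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dst in r.network]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    # Most specific prefix first, then the lowest metric.
    return min(candidates, key=lambda r: (-r.network.prefixlen, r.metric))


# Both cards sit on the same subnet, so both routes match every client.
ROUTES = [
    Route(ipaddress.ip_network("10.0.0.0/24"), "ib0", 0),  # IB card 1, cpu1
    Route(ipaddress.ip_network("10.0.0.0/24"), "ib1", 1),  # IB card 2, cpu2
]


def egress_for_reply(destination):
    """Interface chosen for a reply to the given client address."""
    return pick_route(ROUTES, destination).device


if __name__ == "__main__":
    # Reply to the client from the example, whichever OST held the data.
    print(egress_for_reply("10.0.0.100"))
```

Note that the lookup takes only the destination address as input. Nothing
in it knows which CPU the sending thread runs on or which OST the data
came from, which is why pinning alone would not change the outgoing card
in this model.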
On Wed, 10 Mar 2021, Ms. Megan Larko wrote:
> Greetings Alastair,
> Bonding is supported on InfiniBand, but I believe that it is only active/passive.
> I think what you might be looking for WRT avoiding data travel through the inter-cpu link is cpu "affinity" AKA cpu "pinning".
> WRT = "with regards to"
> AKA = "also known as"