[Lustre-discuss] Which NID to use?
cliff.white at intel.com
Fri Feb 28 13:20:58 PST 2014
On 2/28/14, 1:17 AM, "Chan Ching Yu Patrick" <cychan at clustertech.com> wrote:
>The reason why I made this setup is I'm not sure how Lustre selects the
>interface in a multi-rail environment.
>Especially when all nodes have Infiniband and Ethernet, how can I ensure
>Infiniband is used between client and OSS?
The LNET 'networks' option specifies which interface each LNET network
uses. For example, where your Infiniband interface is 'ib0', you would
add this to your modprobe.conf or equivalent:
options lnet networks="o2ib0(ib0)"
That will define the IB network (on the interface denoted by ib0, to be
specific). Client mounts using @o2ib0 NIDs will only use IB, regardless of
other interfaces.
See the Lustre manual for details on the LNET 'networks' option.
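Before mounting with an @o2ib0 NID, it can help to confirm that the server
really advertises one. Below is a minimal Python sketch that filters NID
strings (as printed by 'lctl list_nids') by LNET network. The helper names
and the 10.10.0.5 address are hypothetical, made up for illustration. It
relies on LNET treating a network type with no number as network 0, so
'tcp' and 'tcp0' (or 'o2ib' and 'o2ib0') name the same network.

```python
def parse_nid(nid):
    """Split 'address@network' into (address, network).

    The network is normalised so that 'tcp' becomes 'tcp0' and
    'o2ib' becomes 'o2ib0'.
    """
    address, sep, network = nid.strip().partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    # A network type without a trailing number means network 0.
    if not network[-1].isdigit():
        network += "0"
    return address, network


def nids_on_network(nids, network):
    """Return the NIDs from `nids` that belong to the given LNET network."""
    wanted = parse_nid("x@" + network)[1]
    return [nid for nid in nids if parse_nid(nid)[1] == wanted]


# Hypothetical 'lctl list_nids' output for a server with Ethernet and IB.
server_nids = ["192.168.122.194@tcp", "10.10.0.5@o2ib"]

ib_nids = nids_on_network(server_nids, "o2ib0")
print(ib_nids)
```

If the resulting list is empty, the server has no NID on that network and
a mount using an @o2ib0 NID cannot reach it over IB.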
In your case, I suspect that the two TCP/IP interfaces are
equivalent in TCP/IP routing terms, perhaps on the same segment.
When that happens, TCP/IP routing takes over. Basically, you can
control which interface you send from, but if the receiver sees two equal
TCP/IP paths back, you can't control which path it chooses to take. This
has nothing to do with LNET or Lustre.
In the case where the network hardware is dissimilar, you don't have this
problem. Connections starting on IB stay on IB.
If you only have one IB network, using the IB NID will ensure all clients
use only IB.
>On 02/27/2014 12:28 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> On Feb 26, 2014, at 7:14 PM, "Chan Ching Yu,
>>Patrick"<cychan at clustertech.com> wrote:
>>> [root@mds1 ~]# lctl list_nids
>>> 192.168.122.240@tcp
>>> 192.168.100.100@tcp1
>>> [root@oss1 ~]# lctl list_nids
>>> 192.168.122.194@tcp
>>> 192.168.100.101@tcp1
>>> [root@client ~]# lctl list_nids
>>> 192.168.122.70@tcp
>>> 192.168.100.102@tcp1
>>> On Lustre client, I intentionally mount it with tcp1
>>> [root@client ~]# mount | grep lustre
>>> 192.168.100.100@tcp1:/data on /lustre type lustre (rw)
>>> Now I dd a file on Lustre filesystem, you can see that tcp0 is used
>>>when writing on OST.
>> I am not an expert on the inner workings of lustre, but as far as I
>>understand it, when oss1 connects to the mgs, it will report the nids it
>>has available. When the client connects to mgs to get info about the
>>oss1 server, it will receive a list of all the oss1 nids. The client
>>then steps through that list and compares the oss1 nids with its local
>>nids to find a match (i.e. - nids that are on the same lnet network).
>>If it matches tcp0 first, then that is the connection it uses. The lnet
>>network used to connect to the mgs is irrelevant at that point.
>>However, I do not know if there are any guarantees about the ordering of
>>the nids that the mgs will report (ie - will tcp0 always be the first
>> If there is an error in my description, hopefully a lustre developer
>>will point out the flaw.
>> It is not clear what you are trying to accomplish with this multi rail
>>setup. Are you trying to force mds traffic over one client link and oss
>>traffic over the other? Or are you trying to utilize both links
>>simultaneously for all traffic?
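The NID matching Rick describes above can be modelled in a few lines of
Python. This is only a toy sketch of his description, not Lustre's actual
code: the function names are hypothetical, and the order in which the MGS
reports NIDs is treated as an input because, as he notes, it is not known
to be guaranteed. The NIDs are the ones from the 'lctl list_nids' output
quoted above.

```python
def lnet_network(nid):
    """Return the LNET network of a NID, e.g. 'tcp1' for '1.2.3.4@tcp1'."""
    network = nid.strip().partition("@")[2]
    if not network:
        raise ValueError(f"not a NID: {nid!r}")
    # A network type without a trailing number means network 0 (tcp == tcp0).
    return network if network[-1].isdigit() else network + "0"


def pick_server_nid(server_nids, client_nids):
    """Return the first server NID on a network the client is also on.

    server_nids is assumed to be in the order the MGS reports them.
    Returns None when no network is shared.
    """
    client_networks = {lnet_network(nid) for nid in client_nids}
    for nid in server_nids:
        if lnet_network(nid) in client_networks:
            return nid
    return None


oss1_nids = ["192.168.122.194@tcp", "192.168.100.101@tcp1"]
client_nids = ["192.168.122.70@tcp", "192.168.100.102@tcp1"]

# Both nodes are on tcp0 and tcp1; tcp0 is listed first, so it is chosen
# even though the client mounted the file system through a tcp1 NID.
chosen = pick_server_nid(oss1_nids, client_nids)
print(chosen)

# A client configured with only tcp1 can only match the tcp1 NID.
chosen_tcp1_only = pick_server_nid(oss1_nids, ["192.168.100.102@tcp1"])
print(chosen_tcp1_only)
```

Under this model, the network used for a connection follows from which
networks both ends share and the order of the server's list, not from the
NID given on the mount command line.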