[lustre-discuss] Multiple MGS interfaces config

Chris Hunter chris.hunter at yale.edu
Thu Sep 24 08:33:19 PDT 2015


 > My environment has both TCP and IB clients, so my Lustre config has to
 > accommodate both, but I'm having a hard time figuring out the proper
 > syntax for it. Theoretically, I should be able to use comma-separated
 > interfaces in the mgsnode parameter like this:
 >
 > --mgsnode=192.168.10.1@tcp0,172.16.10.1@o2ib
 > --mgsnode=192.168.10.2@tcp0,172.16.10.2@o2ib
 >
 > The problem is, this doesn't work for all clients all the time ...
 > randomly. It would work, then it wouldn't. Googling, I found some known
 > defects saying that the comma delimiter didn't work as per the manual and
 > recommending alternate syntaxes like using the colon instead of a comma.
 > I know what the manuals *say* about the syntax, I'm just having trouble
 > getting it to work.
 >
 > This seems to affect only the TCP clients; at least I haven't seen it
 > affect any of the IB clients. It may be a comma parsing problem or
 > something else.
 >
 > I have two questions for the group:
 >
 >    1. Is there a known-working method for using both TCP and IB interface
 >    NIDs for the MGS in this manner?

I used quotes around the comma-delimited list when formatting the OSTs, e.g.:

mkfs.lustre --verbose --ost --index=0 --fsname="testfs" \
    --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0" <OST_DEV>

When mounting on a multi-homed client, you can list both MGS NIDs to
give some failover support:

mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs /mnt/testfs

FYI, my OSS servers are also dual-homed, so I use a comma-delimited
list for the --servicenode parameter in mkfs.lustre as well.
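
As a rough sketch of the quoting above, the comma-delimited NID list can be
built once and reused for both the format and mount commands. The helper
names join_nids and print_cmds are made up for illustration, and the script
only prints the commands so the quoting can be checked before anything is
run:

```shell
#!/bin/sh
# Hypothetical helper: join NIDs with commas.
# "$*" joins the arguments with the first character of IFS; the subshell
# keeps the IFS change local.
join_nids() {
    (IFS=,; printf '%s\n' "$*")
}

# Hypothetical helper: print (do not run) the format and mount commands.
# $1 = comma-delimited MGS NID list, $2 = OST device
print_cmds() {
    printf 'mkfs.lustre --ost --index=0 --fsname=testfs --mgsnode="%s" %s\n' "$1" "$2"
    printf 'mount -t lustre %s:/testfs /mnt/testfs\n' "$1"
}

# Usage, with the addresses from this thread:
#   MGS_NIDS=$(join_nids 172.16.10.1@o2ib0 192.168.10.1@tcp0)
#   print_cmds "$MGS_NIDS" "<OST_DEV>"
```

Printing first is a deliberate choice here: mkfs.lustre is destructive, so it
seems safer to eyeball the generated line than to run it from a script.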

>    2. What's the best way to trace the TCP client interactions to see where
>    it's breaking down?
If LNet is running on the client, you can try "lctl ping", e.g.:

lctl ping 172.16.10.1@o2ib

I believe a Lustre mount uses IPoIB for the initial handshake with the
MDS o2ib interfaces. You should make sure a regular ping over IPoIB is
working before mounting Lustre.
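
The two checks above (plain IP ping, then "lctl ping") could be wrapped in a
small script and run against each MGS NID from an affected client. This is
only a sketch: nid_addr, nid_net and check_nid are hypothetical helper names,
and the "-c 1 -W 2" ping options assume the usual Linux iputils ping.

```shell
#!/bin/sh
# Address part of a NID: 172.16.10.1@o2ib -> 172.16.10.1
nid_addr() { printf '%s\n' "${1%@*}"; }

# Network part of a NID: 172.16.10.1@o2ib -> o2ib
nid_net() { printf '%s\n' "${1##*@}"; }

# Check one NID: IP-level reachability first, then LNet-level.
check_nid() {
    nid=$1
    # For an o2ib NID this ping travels over IPoIB.
    if ! ping -c 1 -W 2 "$(nid_addr "$nid")" >/dev/null 2>&1; then
        echo "$nid: IP ping failed"
        return 1
    fi
    if ! lctl ping "$nid" >/dev/null 2>&1; then
        echo "$nid: lctl ping failed"
        return 1
    fi
    echo "$nid: ok"
}

# Usage, with the addresses from this thread:
#   for nid in 172.16.10.1@o2ib 192.168.10.1@tcp0; do check_nid "$nid"; done
```

If the IP ping succeeds but "lctl ping" fails for the tcp NID only, that would
point at the LNet configuration on that client rather than the network itself.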

> Versions in use:
> kernel: 2.6.32-504.23.4.el6.x86_64
> lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
> zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
>
> My lustre.conf contents:
> options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"

chris hunter
chris.hunter at yale.edu


