[lustre-discuss] Multiple MGS interfaces config
hecht at hlrs.de
Thu Sep 24 01:29:17 PDT 2015
On 09/23/2015 02:39 AM, Exec Unerd wrote:
> My environment has both TCP and IB clients, so my Lustre config has to
> accommodate both, but I'm having a hard time figuring out the proper syntax
> for it. Theoretically, I should be able to use comma-separated interfaces
> in the mgsnode parameter like this:
> --mgsnode=192.168.10.1@tcp0,172.16.10.1@o2ib
> --mgsnode=192.168.10.2@tcp0,172.16.10.2@o2ib
I think this should work:
--mgsnode=192.168.10.1@tcp0 --mgsnode=172.16.10.1@o2ib
--mgsnode=192.168.10.2@tcp0 --mgsnode=172.16.10.2@o2ib
At least, that is how it works with a multi-rail IB network (where tcp0 would be replaced by o2ib1).
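A minimal sketch of the one-NID-per-option form, using the four NIDs from this thread. It only builds and prints the options; the mkfs.lustre or tunefs.lustre command line they would be appended to (target type, fsname, device) is not given in the thread and is left out on purpose.

```shell
# Build one --mgsnode option per NID instead of comma-separated pairs.
# The NIDs are the ones quoted in this thread; nothing is written to disk.
MGSNODE_OPTS=""
for nid in 192.168.10.1@tcp0 172.16.10.1@o2ib \
           192.168.10.2@tcp0 172.16.10.2@o2ib; do
    MGSNODE_OPTS="${MGSNODE_OPTS} --mgsnode=${nid}"
done
MGSNODE_OPTS="${MGSNODE_OPTS# }"   # drop the leading space

# Print the options so they can be reviewed before use.
echo "$MGSNODE_OPTS"
```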
The mount command would contain all four NIDs. If the client cannot connect via TCP, it waits for a timeout before trying the next NID. If, in addition, the MGS has failed over to the second server, I guess it takes three timeouts before the client manages to connect.
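For illustration, a mount command carrying all four NIDs could be assembled as below. This follows the syntax documented in the manual (comma between NIDs of the same MGS node, colon between failover nodes), which is the same syntax the original poster reports trouble with, so treat it as a sketch rather than a confirmed fix. The file system name "testfs" and the mount point are hypothetical; the command is only printed, not executed.

```shell
# NIDs of the primary and the failover MGS node, taken from this thread.
MGS1="192.168.10.1@tcp0,172.16.10.1@o2ib"
MGS2="192.168.10.2@tcp0,172.16.10.2@o2ib"

# Hypothetical names, not from the thread.
FSNAME="testfs"
MNT="/mnt/testfs"

# Colon separates the two MGS nodes; comma separates NIDs of one node.
MOUNT_CMD="mount -t lustre ${MGS1}:${MGS2}:/${FSNAME} ${MNT}"
echo "$MOUNT_CMD"
```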
> The problem is, this doesn't work for all clients all the time ...
> randomly. It would work, then it wouldn't. Googling, I found some known
> defects saying that the comma delimiter didn't work as per the manual and
> recommending alternate syntaxes like using the colon instead of a comma. I
> know what the manuals *say* about the syntax, I'm just having trouble
> getting it to work.
I'm not sure I have understood your setup correctly. You have IB
clients and other hosts that are connected via TCP, right? Or
do the clients have both, with the TCP network as a fallback in
case IB doesn't work properly (network flooded, SM crashed, or the like)?
When you say it doesn't work on a particular client, can you lctl ping
one of the NIDs in that situation? Or can you ping in the other direction,
from the server to the client? And if at least one of the pings
succeeds, can you suddenly mount afterwards?
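A small sketch of that check, to be run on an affected client. It lists the client's own NIDs and then tries lctl ping against each MGS NID from this thread. On a TCP-only client the o2ib pings are expected to fail. The "OK"/"FAIL" labels are just this script's own output, not lctl messages.

```shell
# MGS NIDs from this thread.
NIDS="192.168.10.1@tcp0 192.168.10.2@tcp0 172.16.10.1@o2ib 172.16.10.2@o2ib"

if command -v lctl >/dev/null 2>&1; then
    # Show which NIDs this client has configured itself.
    lctl list_nids
    for nid in $NIDS; do
        if lctl ping "$nid"; then
            echo "OK   $nid"
        else
            echo "FAIL $nid"
        fi
    done
else
    echo "lctl not found; run this on a Lustre client"
fi
```

For the reverse direction, the same lctl ping can be run on the server against the client NID reported by lctl list_nids.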
> This seems to affect only the TCP clients; at least I haven't seen it
> affect any of the IB clients. It may be a comma parsing problem or
> something else.
> I have two questions for the group:
> 1. Is there a known-working method for using both TCP and IB interface
> NIDs for the MGS in this manner?
> 2. What's the best way to trace the TCP client interactions to see where
> it's breaking down?
> Versions in use:
> kernel: 2.6.32-504.23.4.el6.x86_64
> lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
> zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
> My lustre.conf contents:
> options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
ip2nets could be an alternative here, especially if not all clients have
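A hedged sketch of what an ip2nets line might look like for this setup. The address patterns are assumptions derived from the MGS NIDs in this thread (172.16.10.* for IB, 192.168.10.* for TCP); the real client subnets are not stated. With ip2nets, a node only brings up a network if one of its local addresses matches the pattern, so the same file could be used on IB and TCP-only hosts. The snippet just prints the line for review.

```shell
# Candidate replacement for the "networks=" line in lustre.conf.
# Interface names are from the original post; subnets are assumed.
IP2NETS_LINE='options lnet ip2nets="o2ib0(ib1) 172.16.10.*; tcp0(ixgbe1) 192.168.10.*"'

# Print only; redirect into the modprobe config by hand after checking it.
printf '%s\n' "$IP2NETS_LINE"
```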