[Lustre-discuss] Network name o2ib0 collision in two discrete filesystems

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue Sep 9 07:24:42 PDT 2014


I don't think LNET routers will help you here.  LNET routers are meant to route traffic between different Lustre networks, but you have the same network names on both sides.  The best fix would probably be to change the o2ibX and tcpX LNET names so that they differ between the two sites.  If you don't want to do that, you might be able to get away with changing only the "tcpX" names: you could use o2ib0/tcp1 for FS1 and o2ib0/tcp2 for FS2.  The clients at site #1 would then be configured with:

options lnet networks=tcp2(eth0),o2ib0(ib0)

Based on the results of your testing, the client should prefer tcp2 over o2ib0.  When the client at site #1 mounts FS1, the only network in common is o2ib0.  When it mounts FS2, it has both tcp2 and o2ib0 in common, but hopefully it will prefer tcp2 since that appears first in the modprobe.conf file.  I don't really know if this would work, and even if it does, it might not be a very robust solution if the client logic for choosing networks ever changes.
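
To make the idea concrete, here is a rough sketch of how the pieces might fit together.  It is untested and assumes the addresses and interface names from your message, that the servers on each side are switched to tcp1/tcp2 as described above, and that the lnet module is reloaded afterwards.  Changing server NIDs would presumably also need the tunefs step you mention, which is not shown here.

```shell
# --- modprobe configuration (file contents, not commands; assumed layout) ---
# FS1 servers:                 options lnet networks=o2ib0(ib0),tcp1(eth0)
# FS2 servers:                 options lnet networks=o2ib0(ib0),tcp2(eth0)
# Dual-mount client (site 1):  options lnet networks=tcp2(eth0),o2ib0(ib0)

# --- on the dual-mount client, once lnet is loaded with the new options ---
lctl list_nids                  # show the local NIDs the client ended up with
lctl ping 10.1.2.11@tcp2        # FS2 MDS over Ethernet
lctl ping 192.168.1.11@o2ib0    # FS1 MDS over InfiniBand

mount -t lustre 192.168.1.11@o2ib0:/lustre /lustre/FS1
mount -t lustre 10.1.2.11@tcp2:/lustre /lustre/FS2

# Check which network each peer is actually using
cat /proc/sys/lnet/peers
```

If the FS2 peers show up on o2ib0 rather than tcp2 in the peers list, then the ordering trick is not doing what we hoped and renaming the o2ib networks is probably the only option.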

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


On Sep 9, 2014, at 7:04 AM, James Robnett <jrobnett at aoc.nrao.edu> wrote:

> 
> I'm having difficulty figuring out a solution to an LNET issue I'm having.
> 
> We have two Lustre filesystems separated by about 60 miles, both of which have o2ib0(ib0) and tcp(eth0) networks defined.  Both have IB and TCP clients which work just fine.
> 
> I'll call them FS1 and FS2.
> 
> FS1-mds at ib0  192.168.1.11
> FS1-mds at eth0 10.1.1.11
> 
> FS2-mds at ib0  192.168.2.11
> FS2-mds at eth0 10.1.2.11
> 
> We have a need for a client physically at site-1 to mount the filesystems from both sites.  The intent is to mount the local FS1 via IB0 and the remote FS2 via TCP0 (accessible over gbit).
> 
> The mount commands for the client are:
> mount -t lustre 192.168.1.11@o2ib0:/lustre /lustre/FS1
> mount -t lustre 10.1.2.11@tcp0:/lustre /lustre/FS2
> 
> If I set this client's modprobe.conf line as
> 
> options lnet networks=o2ib0(ib0),tcp0(eth0)
> 
> then it mounts FS1 without issue but fails on FS2, since it tries to communicate via o2ib0 despite the mount command specifying tcp0. Presumably this is because the client asserts that it knows about both o2ib0 and tcp0 without realizing that o2ib0 at site1 is functionally different from o2ib0 at site2.
> 
> If I set the client's modprobe.conf line as
> options lnet networks=tcp0(eth0),o2ib0(ib0)
> 
> then it mounts FS1 just fine but actually communicates via TCP0 (visible through /proc/sys/lnet/peers) since there's a network path that works and it's first in the list.  It also mounts FS2 just fine as expected.
> 
> So I can mount one or the other but not both, or at least not both in the way that we need (i.e. IB for site1 and TCP for site2).
> 
> I'd begun looking into setting up an LNET router at site2 but I'm suspicious that won't actually help or it will help but only if I set it up in such a way that it disturbs existing IB0 and TCP0 clients there.
> 
> I tried briefly to set up an LNET router at site 1 that only knew about tcp0.  I put a routes line on the client pointing tcp0 at <lnetIP>@tcp0.
> The LNET router can see and lctl ping the FS2 MDS but the client throws an error on startup and doesn't seem to believe there's a route.
> 
> I'm beginning to sense that the only real option is to get rid of the IB name collision and do a tunefs at site2 and change the servers and clients to use o2ib1 rather than o2ib0, or other permutations of renaming networks, but maybe (hopefully) I'm missing something with lnet routing.
> 
> On a side note, it's mildly confusing that the ordering in the lnet networks= option takes precedence over the mount command.  If that weren't the case, then either modprobe.conf ordering above would work rather than neither, but maybe there's a case I'm missing that requires the lnet option ordering to take precedence over the mount syntax.
> 
> Of course there's the very real possibility I'm missing an obvious simple solution.
> 
> James Robnett
> NRAO/NM
> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss



