[Lustre-discuss] Network name o2ib0 collision in two discrete filesystems

Isaac Huang he.huang at intel.com
Tue Sep 9 10:14:27 PDT 2014


On Tue, Sep 09, 2014 at 05:04:58AM -0600, James Robnett wrote:
> 
> I'm having difficulty figuring out a solution to an LNET issue I'm having.
> 
> We have two Lustre filesystems separated by about 60 miles, both of
> which have o2ib0(ib0) and tcp(eth0) networks defined.  Both have IB
> and TCP clients which work just fine.
> 
> I'll call them FS1 and FS2.
> 
> FS1-mds at ib0  192.168.1.11
> FS1-mds at eth0 10.1.1.11
> 
> FS2-mds at ib0  192.168.2.11
> FS2-mds at eth0 10.1.2.11
> 
> We have a need for a client physically at site-1 to mount the
> filesystems from both sites.  The intent is to mount the local FS1
> via IB0 and the remote FS2 via TCP0 (accessible over gbit).
> 
> The mount commands for the client are:
> mount -t lustre 192.168.1.11@o2ib0:/lustre /lustre/FS1
> mount -t lustre 10.1.2.11@tcp0:/lustre /lustre/FS2
> 
> If I set this client's modprobe.conf line as
> 
> options network=o2ib0(ib0), tcp0(eth0)
> 
> then it mounts FS1 without issue but then fails on FS2 since it
> tries to communicate via o2ib0 despite the mount command specifying
> tcp0. Presumably since the client asserts it knows about both o2ib0
> and tcp0 without realizing o2ib0 at site1 is functionally different
> from o2ib0 at site2.

There's no way for the client to tell the difference if the two IB
networks have the same name.

If two distinct networks (i.e. no connectivity between them) must be
used in the same address space (e.g. on one client), they must be
assigned different names. In your case, I believe it would work to
rename either IB network to o2ib1.
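
As a rough sketch of what that could look like, assuming the FS2 IB
network is the one renamed (interface names and addresses are taken
from your message; please check the exact procedure against the
Lustre manual for your version):

```
# FS2 servers and FS2's own IB clients, modprobe configuration
# (assumption: FS2 is the side that gets renamed to o2ib1):
options lnet networks="o2ib1(ib0),tcp0(eth0)"

# Client at site-1, modprobe configuration
# (unchanged: its local IB network is still o2ib0):
options lnet networks="o2ib0(ib0),tcp0(eth0)"

# Mounts on the site-1 client:
mount -t lustre 192.168.1.11@o2ib0:/lustre /lustre/FS1
mount -t lustre 10.1.2.11@tcp0:/lustre /lustre/FS2
```

With the FS2 servers advertising o2ib1 rather than o2ib0, the site-1
client has no local network by that name, so it should use tcp0 to
reach FS2. Because server NIDs are stored in the filesystem's
configuration, renaming the network on existing servers will probably
also require regenerating that configuration; that step is not shown
here.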

-Isaac


