[Lustre-discuss] Adding a new client on a different network

Klaus Steden klaus.steden at thomson.net
Thu Nov 15 11:03:11 PST 2007


Isaac,

I followed your instructions, and now I can see the client on the second
network on the MDS when I run:

lctl> network tcp1
lctl> peer_list
lctl> conn_list

However ... when I attempt to mount, I get this error on the client:

-- client --
mount -t lustre 172.16.128.252@tcp1:/lustre /mnt/lustre
mount.lustre: mount 172.16.128.252@tcp1:/lustre at /mnt/lustre failed: No
such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
-- client --

This is what turns up on the MDS (via dmesg):

-- mds --
LustreError: 14098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104
reading HELLO from 172.16.128.100
LustreError: 11b-b: Connection to 172.16.128.100@tcp at host 172.16.128.100
on port 988 was reset: is it running a compatible version of Lustre and is
172.16.128.100@tcp one of its NIDs?
-- mds --

If I initially built my file system with a failover MDS, do I have to
specify the failover node in my mount command?

Is there a way to query the creation-time flags and options set on a
particular file system so I can see if I am indeed attempting to talk to the
MGS as well?

thanks,
Klaus

On 11/15/07 1:11 AM, "Isaac Huang" <He.Huang at Sun.COM> did etch on stone
tablets:

> On Wed, Nov 14, 2007 at 06:23:36PM -0800, Klaus Steden wrote:
> [......]
>> And on the MDS side, here's what I see in the output of 'dmesg':
>> 
>> -- mds --
>> LustreError: 120-3: Refusing connection from 172.16.128.100 for
>> 172.16.128.252@tcp: No matching NI
>> -- mds --
>> 
>> I was initially using this in my modprobe.conf:
>> 
>> -- modprobe.conf --
>> options lnet networks=tcp0(eth0,bond0)
>> -- modprobe.conf --
>> 
> 
> This only gave the MDS one NID: IP-of-eth0@tcp0, i.e. the IP address of
> the first interface specified was used to generate the NID.
> 
>> where 'eth0' is attached to 172.16.129.0/24, and 'bond0' is attached to
>> 172.16.128.0/24.
>> 
> 
> In your case, 172.16.129.x@tcp.
> 
>> What's happening here, and where do I look for information on how to fix it?
>> 
> 
> When the client tried to reach the MDS at 172.16.128.252@tcp, the MDS
> refused the connection since 172.16.128.252@tcp wasn't one of its NIDs.
> 
> If they're two separate networks, just give the MDS two NIDs:
> options lnet networks='tcp0(eth0),tcp1(bond0)'
> 
> And for clients on eth0's network:
> options lnet networks='tcp0(eth?)'
> 
> And finally, for clients on bond0's network:
> options lnet networks='tcp1(eth?)'
> 
> 
> HTH,
> Isaac
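
Isaac's point that only the first interface listed for a network is used to
generate the NID can be sketched with a small shell helper. This is only an
illustration: the function name lnet_first_ifaces is hypothetical, it merely
parses a networks= string and does not query LNET, and it assumes every
network entry carries a parenthesized interface list.

```shell
# Hypothetical helper (not part of Lustre): for each network in an
# "options lnet networks=..." value, print the network name and the first
# interface listed for it, i.e. the interface whose IP would form the NID.
# Assumes every entry looks like net(iface[,iface...]).
lnet_first_ifaces() {
    local spec=$1 entry net ifaces
    while [ -n "$spec" ]; do
        entry=${spec%%')'*}      # first entry, without its closing ")"
        net=${entry%%'('*}       # network name before "("
        ifaces=${entry#*'('}     # comma-separated interface list
        printf '%s %s\n' "$net" "${ifaces%%,*}"
        case $spec in
            *"),"*) spec=${spec#*'),'} ;;   # move on to the next entry
            *)      spec= ;;                # that was the last entry
        esac
    done
}

# Original setting: one network, so bond0 never gets a NID of its own.
lnet_first_ifaces 'tcp0(eth0,bond0)'
# prints: tcp0 eth0

# Suggested setting: two networks, one NID each.
lnet_first_ifaces 'tcp0(eth0),tcp1(bond0)'
# prints: tcp0 eth0
#         tcp1 bond0
```

With the original string, the address on bond0 (172.16.128.252) never shows
up as a NID, which matches the "No matching NI" refusal quoted above.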
