[Lustre-discuss] Adding a new client on a different network
Klaus Steden
klaus.steden at thomson.net
Thu Nov 15 11:03:11 PST 2007
Isaac,
I followed your instructions, and now I can see the client on the second
network on the MDS when I run:
lctl> network tcp1
lctl> peer_list
lctl> conn_list
However ... when I attempt to mount, I get this error on the client:
-- client --
mount -t lustre 172.16.128.252@tcp1:/lustre /mnt/lustre
mount.lustre: mount 172.16.128.252@tcp1:/lustre at /mnt/lustre failed: No
such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
-- client --
This is what turns up on the MDS (via dmesg):
-- mds --
LustreError: 14098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104
reading HELLO from 172.16.128.100
LustreError: 11b-b: Connection to 172.16.128.100@tcp at host 172.16.128.100
on port 988 was reset: is it running a compatible version of Lustre and is
172.16.128.100@tcp one of its NIDs?
-- mds --
If I initially constructed my file system to use a failover MDS, do I have to
specify it in my mount command?
Is there a way to query the creation-time flags and options set on a
particular file system so I can see if I am indeed attempting to talk to the
MGS as well?
thanks,
Klaus
On 11/15/07 1:11 AM, "Isaac Huang" <He.Huang at Sun.COM> did etch on stone
tablets:
> On Wed, Nov 14, 2007 at 06:23:36PM -0800, Klaus Steden wrote:
> [......]
>> And on the MDS side, here's what I see in the output of 'dmesg':
>>
>> -- mds --
>> LustreError: 120-3: Refusing connection from 172.16.128.100 for
>> 172.16.128.252@tcp: No matching NI
>> -- mds --
>>
>> I was initially using this in my modprobe.conf:
>>
>> -- modprobe.conf --
>> options lnet networks=tcp0(eth0,bond0)
>> -- modprobe.conf --
>>
>
> This only gave the MDS one NID: IP-of-eth0@tcp0, i.e. the IP address of
> the first interface specified was used to generate the NID.
>
>> where 'eth0' is attached to 172.16.129.0/24, and 'bond0' is attached to
>> 172.16.128.0/24.
>>
>
> In your case, 172.16.129.x@tcp.
>
>> What's happening here, and where do I look for information on how to fix it?
>>
>
> When the client tried to reach the MDS at 172.16.128.252@tcp, the MDS
> refused the connection since 172.16.128.252@tcp wasn't one of its NIDs.
>
> If they're two separate networks, just give the MDS two NIDs:
> options lnet networks='tcp0(eth0),tcp1(bond0)'
>
> And for clients on eth0's network:
> options lnet networks='tcp0(eth?)'
>
> And finally, for clients on bond0's network:
> options lnet networks='tcp1(eth?)'
>
>
> HTH,
> Isaac
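
The NID rule Isaac describes (one NID per LNet network, built from the IP
of the first interface listed for that network) can be illustrated with a
small Python sketch. This is not Lustre code, only a model of the behaviour
described above. The helper names parse_lnet_networks and nids_for are
hypothetical, and so is the eth0 address: the thread gives eth0's subnet
(172.16.129.0/24) but not its host address.

```python
import re


def parse_lnet_networks(spec):
    """Parse an lnet 'networks' value such as 'tcp0(eth0),tcp1(bond0)'.

    Returns a list of (network, [interfaces]) tuples in the order written.
    """
    result = []
    # Each entry looks like NAME(IF1,IF2,...); the interface list is optional.
    for match in re.finditer(r"(\w+)(?:\(([^)]*)\))?", spec):
        net = match.group(1)
        raw = match.group(2) or ""
        ifaces = [name.strip() for name in raw.split(",") if name.strip()]
        result.append((net, ifaces))
    return result


def nids_for(spec, iface_ips):
    """Model of the rule in this thread: one NID per network, taken from
    the IP address of the first interface listed for that network."""
    nids = []
    for net, ifaces in parse_lnet_networks(spec):
        if not ifaces:
            continue  # entries without an interface list are skipped here
        nids.append(f"{iface_ips[ifaces[0]]}@{net}")
    return nids


# ASSUMPTION: eth0's host address is made up for illustration; bond0's
# address is the one the client tries to mount from in this thread.
MDS_IPS = {"eth0": "172.16.129.252", "bond0": "172.16.128.252"}

original = nids_for("tcp0(eth0,bond0)", MDS_IPS)
fixed = nids_for("tcp0(eth0),tcp1(bond0)", MDS_IPS)

print("original:", original)
print("fixed:   ", fixed)
```

Under these assumptions, the original single-network setting gives the MDS
no NID on bond0's address at all, which matches the "No matching NI"
refusal, while the two-network setting adds a tcp1 NID on bond0.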