[Lustre-discuss] lustre using wrong network

Isaac Huang He.Huang at Sun.COM
Thu Jun 18 18:48:26 PDT 2009


On Thu, Jun 18, 2009 at 09:11:50PM -0400, Michael Di Domenico wrote:
> I cannot figure out what exactly has happened here and how to recover from it.
> 
> Jun 18 21:02:52 node0-eth1 kernel: LustreError:
> 2722:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading
> HELLO from 192.168.0.248
> Jun 18 21:02:52 node0-eth1 kernel: LustreError: 11b-b: Connection to
> 192.168.0.248@tcp at host 192.168.0.248 on port 988 was reset: is it
> running a compatible version of Lustre and is 192.168.0.248@tcp one of
> its NIDs?

Here Lustre asked LNet to connect to 192.168.0.248@tcp.

> for some reason when i mount the OST on the above node it's trying to
> connect to itself on eth0, even though i have networks=tcp0(eth1) in
> my modprobe.conf and the NID is set to 192.168.1.248
> 
> Jun 18 21:02:52 node0-eth1 kernel: Lustre: Client data1-client has started
> Jun 18 21:02:52 node7-eth0 kernel: LustreError: 120-3: Refusing
> connection from 192.168.0.50 for 192.168.0.248@tcp: No matching NI

But the server rejected the connection because 192.168.0.248@tcp is
not one of its NIDs.
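
To illustrate the check behind "No matching NI", here is a minimal
Python sketch. It is not LNet's actual code: the function names are
hypothetical, and the server NID list is an assumption based on the
networks=tcp0(eth1) setting and the 192.168.1.248 address quoted above.

```python
# Illustrative sketch only; not LNet's actual implementation.

def normalize_nid(nid: str) -> str:
    """Normalize 'addr@net' so that 'tcp' and 'tcp0' compare equal."""
    addr, _, net = nid.partition("@")
    # A network name without a trailing number means network 0 (tcp == tcp0).
    if net and not net[-1].isdigit():
        net += "0"
    return f"{addr}@{net}"


def accepts(local_nids, target_nid):
    """Return True if target_nid is one of the node's own NIDs."""
    wanted = normalize_nid(target_nid)
    return any(normalize_nid(n) == wanted for n in local_nids)


# Assumed output of 'lctl list_nids' on the server (hypothetical data).
server_nids = ["192.168.1.248@tcp"]

print(accepts(server_nids, "192.168.0.248@tcp"))  # eth0 address: refused
print(accepts(server_nids, "192.168.1.248@tcp"))  # eth1 address: accepted
```

If the server really has only the eth1 NID, any peer that addresses it
by the eth0 NID will be refused this way, whichever interface the
packets actually arrive on.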

What was your mount command line? What does 'lctl list_nids' say on
the nodes?

Isaac


