[Lustre-discuss] lustre using wrong network

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jun 18 18:51:33 PDT 2009


On Thu, Jun 18, 2009 at 9:48 PM, Isaac Huang <He.Huang at sun.com> wrote:
> On Thu, Jun 18, 2009 at 09:11:50PM -0400, Michael Di Domenico wrote:
>> I cannot figure out what exactly has happened here and how to recover from it.
>>
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError:
>> 2722:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading
>> HELLO from 192.168.0.248
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError: 11b-b: Connection to
>> 192.168.0.248@tcp at host 192.168.0.248 on port 988 was reset: is it
>> running a compatible version of Lustre and is 192.168.0.248@tcp one of
>> its NIDs?
>
> Lustre asked lnet to connect to 192.168.0.248@tcp.
>
>> For some reason, when I mount the OST on the above node, it's trying to
>> connect to itself on eth0, even though I have networks=tcp0(eth1) in
>> my modprobe.conf and the NID is set to 192.168.1.248.
>>
>> Jun 18 21:02:52 node0-eth1 kernel: Lustre: Client data1-client has started
>> Jun 18 21:02:52 node7-eth0 kernel: LustreError: 120-3: Refusing
>> connection from 192.168.0.50 for 192.168.0.248@tcp: No matching NI
>
> But the connection was rejected because the server didn't have
> 192.168.0.248@tcp as one of its NIDs.
>
> What was your mount command line? What does 'lctl list_nids' say on
> the nodes?
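
Isaac's reading of the second log can be sketched in code: the node receiving the connection compares the NID the peer asked for against its own configured NIDs and refuses with "No matching NI" when none matches. The Python below is a simplified model of that comparison, not LNET's implementation. The helper names `parse_nid` and `accepts` are hypothetical, and it assumes NIDs of the form `address@network`, with the bare name `tcp` meaning network `tcp0`.

```python
def parse_nid(nid: str) -> tuple[str, str]:
    """Split a NID such as '192.168.1.248@tcp' into (address, network)."""
    address, sep, network = nid.partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    # Treat a network name with no number ('tcp') as network 0 ('tcp0'),
    # so both spellings compare equal.
    if not network[-1].isdigit():
        network += "0"
    return address, network


def accepts(requested_nid: str, local_nids: list[str]) -> bool:
    """Return True if requested_nid is one of this node's own NIDs."""
    wanted = parse_nid(requested_nid)
    return any(parse_nid(nid) == wanted for nid in local_nids)


# The situation in the log: the peer asks for 192.168.0.248@tcp, but the
# node only has 192.168.1.248@tcp configured, so that request is refused.
local = ["192.168.1.248@tcp"]
for nid in ("192.168.0.248@tcp", "192.168.1.248@tcp0"):
    verdict = "accepted" if accepts(nid, local) else "refused: No matching NI"
    print(nid, "->", verdict)
```

The comparison is on the full (address, network) pair, so the same host reached through a different interface's address is a different NID and does not match.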

lctl list_nids shows the right NID on all the nodes: 192.168.1.x@tcp.

192.168.0.x does exist on all the nodes (it's the eth0 network), but
Lustre should never be trying to use it.
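
The networks=tcp0(eth1) setting is what should keep 192.168.0.x out of the picture: it binds LNET network tcp0 to eth1 only, so the node's NID is built from eth1's address. A minimal Python sketch of that mapping follows. The functions are hypothetical; the parser handles only the simple `net(iface,...)` form (comma-separated entries, no `ip2nets` rules), and the interface-to-address table uses the addresses from this thread instead of reading them from the system.

```python
import re


def parse_networks(value: str) -> dict[str, list[str]]:
    """Parse a simple 'networks=' value such as 'tcp0(eth1)'.

    Returns {network: [interfaces]}. An empty list means no interface was
    named, leaving the choice to LNET (not modelled here).
    """
    result = {}
    for m in re.finditer(r"([a-z0-9]+)(?:\(([^)]*)\))?", value):
        net = m.group(1)
        if not net[-1].isdigit():
            net += "0"  # 'tcp' is shorthand for 'tcp0'
        ifaces = m.group(2)
        result[net] = [i.strip() for i in ifaces.split(",")] if ifaces else []
    return result


def expected_nids(networks: dict[str, list[str]],
                  iface_addrs: dict[str, str]) -> list[str]:
    """Build the NIDs a node should report from its bound interfaces."""
    nids = []
    for net, ifaces in networks.items():
        # Network number 0 is printed without the digit ('tcp0' -> 'tcp').
        shown = re.sub(r"(?<=\D)0$", "", net)
        for iface in ifaces:
            nids.append(f"{iface_addrs[iface]}@{shown}")
    return nids


# Addresses from this thread: eth0 is 192.168.0.x, eth1 is 192.168.1.x.
addrs = {"eth0": "192.168.0.248", "eth1": "192.168.1.248"}
nets = parse_networks("tcp0(eth1)")
print(nets)                        # {'tcp0': ['eth1']}
print(expected_nids(nets, addrs))  # ['192.168.1.248@tcp']
```

With only eth1 bound, the expected NID is 192.168.1.248@tcp, which is consistent with the `lctl list_nids` output reported above; 192.168.0.248@tcp is not among the node's NIDs.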


