[Lustre-discuss] How do you make an MGS/OSS listen on 2 NICs?

Herb Wartens wartens2 at llnl.gov
Fri Jan 18 08:54:22 PST 2008



Isaac Huang wrote:
> On Thu, Jan 17, 2008 at 01:59:46PM -0800, Herb Wartens wrote:
>> ...... 
>> Here is an example below of what I was referring to:
>>
>> Node1:
>> ilc6 is a lustre server that has two separate ethernet devices, eth2 and eth3
>>
>> # ilc6 /root > cat /etc/modprobe.conf
>> options lnet networks="tcp0(eth2,eth3)" \
>>         routes="elan0 172.16.3.[4-6]@tcp0"
>>
>> # ilc6 /root > lctl list_nids
>> 172.16.101.6@tcp
>>
>> Node2:
>> adev4 is a lustre router that has two separate ethernet devices and an elan device
>>
>> # adev4 /root > cat /etc/modprobe.conf
>> options lnet networks="tcp0(eth0,eth1),elan0" \
>>              forwarding="enabled"
>>
>> # adev4 /root > lctl list_nids
>> 172.16.3.4@tcp
>> 4@elan
>>
>> Node3:
>> adev3 is a lustre client with only an elan device
>>
>> # adev3 /root > lctl list_nids
>> 3@elan
>>
>>
>> Now the actual problem here is that
>> 1) ilc6 can successfully lctl ping only adev4's tcp nid, not its elan nid, even
>>    though ilc6 knows how to get to the elan0 network.
>> 2) adev3 can successfully lctl ping only adev4's elan nid, not its tcp nid, even
>>    though adev3 knows how to get to the tcp0 network.
>>
>> FROM Node1:
>> # ilc6 /root > lctl ping 172.16.3.4@tcp0
>> 12345-0@lo
>> 12345-172.16.3.4@tcp
>> 12345-4@elan
>>
>> # ilc6 /root > lctl ping 3@elan
>> 12345-0@lo
>> 12345-3@elan
>>
>> ERROR:
>> # ilc6 /root > lctl ping 4@elan
>> failed to ping 4@elan: Input/output error
>>
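ilc6's `routes` option says that the elan0 network is reachable through gateways matching `172.16.3.[4-6]@tcp0`, i.e. three candidate gateway NIDs on tcp0, one of which (172.16.3.4@tcp0) belongs to adev4. The `tcp0` in the config and the `tcp` shown by `lctl list_nids` name the same network; LNet omits the network number when it is 0. The Python sketch below expands the route expression to make the gateway list explicit. It is a simplified, hypothetical parser (`expand_nid_range` and `parse_routes` are illustrative names, not LNet APIs): it assumes `;`-separated route entries, handles at most one `[a-b]` range per NID, and does not handle hop counts or the rest of LNet's NID-range syntax.

```python
import re


def expand_nid_range(expr: str) -> list[str]:
    """Expand one NID expression such as '172.16.3.[4-6]@tcp0'.

    Hypothetical helper: handles at most one [a-b] range in the address part.
    """
    addr, net = expr.split("@", 1)
    m = re.search(r"\[(\d+)-(\d+)\]", addr)
    if m is None:
        return [expr]  # plain NID, nothing to expand
    lo, hi = int(m.group(1)), int(m.group(2))
    prefix, suffix = addr[: m.start()], addr[m.end():]
    return [f"{prefix}{i}{suffix}@{net}" for i in range(lo, hi + 1)]


def parse_routes(routes: str) -> dict[str, list[str]]:
    """Map each remote network to its list of gateway NIDs.

    Assumes ';'-separated entries of the form '<net> <gateway> [<gateway> ...]';
    hop counts are not supported by this sketch.
    """
    table: dict[str, list[str]] = {}
    for entry in routes.split(";"):
        fields = entry.split()
        if len(fields) < 2:
            continue  # skip empty or malformed entries
        net, gateways = fields[0], fields[1:]
        for gw in gateways:
            table.setdefault(net, []).extend(expand_nid_range(gw))
    return table


# ilc6's routes option from /etc/modprobe.conf
print(parse_routes("elan0 172.16.3.[4-6]@tcp0"))
```

For ilc6 this yields a single elan0 entry whose gateways are 172.16.3.4@tcp0, 172.16.3.5@tcp0 and 172.16.3.6@tcp0.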
> 
> The router rejected the ping request because it believed that
> ilc6 could reach it via another NID (172.16.3.4@tcp0), which was
> closer to ilc6 than its elan NID.
> 
> You should see a message in the router's dmesg that reads:
> 172.16.101.6@tcp, src 172.16.101.6@tcp: Bad dest nid 4@elan ......
> 
> I don't think it's an lnet bug.
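Isaac's explanation can be captured in a toy model: a message that arrives at a router addressed to one of the router's own NIDs is accepted only if it arrived on that NID's network, while a message for some other node is forwarded. The Python sketch below models only that one rule as stated on the list; it is not LNet source code, and `Nid`, `parse_nid` and `router_decision` are hypothetical names. It also assumes that ilc6's traffic reaches adev4 over tcp.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Nid:
    addr: str
    net: str  # network name, e.g. "tcp" or "elan"


def parse_nid(text: str) -> Nid:
    """Parse 'addr@net'. Assumption: 'tcp0' is normalised to 'tcp' (network number 0)."""
    addr, net = text.split("@", 1)
    m = re.fullmatch(r"([a-z]+)0", net)
    return Nid(addr, m.group(1) if m else net)


def router_decision(local_nids: list[Nid], arrived_on: str, dest: Nid) -> str:
    """Hypothetical model of the router-side destination check.

    A message for one of the router's own NIDs is delivered only if it
    arrived on that NID's network; otherwise the sender had a "closer"
    NID for this router and the message is rejected ("Bad dest nid").
    Messages for other nodes are forwarded (routing itself is not modelled).
    """
    if dest in local_nids:
        return "deliver" if dest.net == arrived_on else "reject"
    return "forward"


adev4 = [parse_nid("172.16.3.4@tcp"), parse_nid("4@elan")]

# The three pings issued from ilc6, which reach adev4 over the tcp network.
for target in ["172.16.3.4@tcp0", "3@elan", "4@elan"]:
    print(target, router_decision(adev4, "tcp", parse_nid(target)))
```

Run from ilc6's point of view, the model gives deliver, forward and reject for the three pings, matching the results quoted above. Called as `router_decision(adev4, "elan", parse_nid("172.16.3.4@tcp"))`, i.e. adev3 pinging adev4's tcp NID over elan, it also returns `"reject"`, which matches problem 2.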

Correct, that is how it works now; however, I believe we have different ideas
of what lctl ping should mean. In my opinion it should not matter whether you can reach
a node via another nid that is "closer." I feel that lctl ping should
show lnet connectivity regardless of whether there is a "closer" network.
Maybe it's just me... =)

-Herb

> 
> Isaac


