[Lustre-discuss] How do you make an MGS/OSS listen on 2 NICs?

Isaac Huang He.Huang at Sun.COM
Fri Jan 18 07:23:27 PST 2008


On Thu, Jan 17, 2008 at 01:59:46PM -0800, Herb Wartens wrote:
> ...... 
> Here is an example below of what I was referring to:
> 
> Node1:
> ilc6 is a Lustre server that has two separate Ethernet devices, eth2 and eth3
> 
> # ilc6 /root > cat /etc/modprobe.conf
> options lnet networks="tcp0(eth2,eth3)" \
>         routes="elan0 172.16.3.[4-6]@tcp0"
> 
> # ilc6 /root > lctl list_nids
> 172.16.101.6@tcp
> 
> Node2:
> adev4 is a Lustre router that has two separate Ethernet devices and an elan device
> 
> # adev4 /root > cat /etc/modprobe.conf
> options lnet networks="tcp0(eth0,eth1),elan0" \
>              forwarding="enabled"
> 
> # adev4 /root > lctl list_nids
> 172.16.3.4@tcp
> 4@elan
> 
> Node3:
> adev3 is a lustre client with only an elan device
> 
> # adev3 /root > lctl list_nids
> 3@elan
> 
> 
> Now the actual problem here is that
> 1) ilc6 can only successfully issue an lctl ping to the tcp nid even though it knows
>    how to get to the elan0 network.
> 2) adev3 can only successfully issue an lctl ping to the elan nid even though it knows
>    how to get to the tcp0 network.
> 
> FROM Node1:
> # ilc6 /root > lctl ping 172.16.3.4@tcp0
> 12345-0@lo
> 12345-172.16.3.4@tcp
> 12345-4@elan
> 
> # ilc6 /root > lctl ping 3@elan
> 12345-0@lo
> 12345-3@elan
> 
> ERROR:
> # ilc6 /root > lctl ping 4@elan
> failed to ping 4@elan: Input/output error
> 

The router rejected the ping request because it believed that ilc6
could reach it via another NID (172.16.3.4@tcp0), one that is closer
to ilc6 than the router's elan NID.

You should see a message in the router's dmesg output that reads:
172.16.101.6@tcp, src 172.16.101.6@tcp: Bad dest nid 4@elan ......
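To make the router's reasoning concrete, here is a toy Python model of the check described above. It is a simplified sketch, not LNet's actual code path: the `Nid` class, the `router_check()` function, and the rule that `tcp0` and `tcp` name the same network are illustrative assumptions. The model treats a message whose destination is one of the router's own NIDs, but on a different network from the one the message arrived on, as a bad destination, because the sender could have addressed the router directly.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Nid:
    """Hypothetical model of an LNet NID such as 172.16.3.4@tcp or 4@elan."""
    addr: str
    net: str  # normalized network name, e.g. "tcp" (tcp0 and tcp are treated as equal)

    @classmethod
    def parse(cls, text: str) -> "Nid":
        addr, _, net = text.partition("@")
        m = re.fullmatch(r"([a-z0-9]*[a-z])(\d*)", net)
        if not addr or not m:
            raise ValueError(f"not a NID: {text!r}")
        name, num = m.group(1), m.group(2)
        # Assumption: network number 0 is the default, so "tcp0" becomes "tcp".
        norm = name if num in ("", "0") else f"{name}{int(num)}"
        return cls(addr, norm)

    def __str__(self) -> str:
        return f"{self.addr}@{self.net}"


def router_check(router_nids: set, arrival_net: str, dest: Nid) -> str:
    """Classify a message that arrived at the router on arrival_net (toy model)."""
    if dest in router_nids:
        if dest.net == arrival_net:
            return "local"  # addressed to our NID on the network it arrived on
        # One of our own NIDs, but on another network: the sender could have
        # used our NID on arrival_net instead, so refuse the message.
        return f"reject: Bad dest nid {dest}"
    if dest.net in {n.net for n in router_nids}:
        return "forward"  # another node's NID on a network we are attached to
    return "reject: no route"


# adev4 from the thread: one NID on tcp, one on elan.
adev4 = {Nid.parse("172.16.3.4@tcp"), Nid.parse("4@elan")}

# ilc6's pings all reach adev4 over tcp.
for target in ["172.16.3.4@tcp0", "3@elan", "4@elan"]:
    print(target, "->", router_check(adev4, "tcp", Nid.parse(target)))
```

In this model the same rule also covers the second symptom: adev3 reaches adev4 over elan, so `router_check(adev4, "elan", Nid.parse("172.16.3.4@tcp"))` is rejected in the same way as ilc6's ping to 4@elan.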

I don't think it's an LNet bug.

Isaac


