[Lustre-discuss] How to configure Redundant NICs over separate switches and subnets.

D. Marc Stearman marc at llnl.gov
Tue Jan 22 10:19:10 PST 2008


Functionally they will work the same, but not performance-wise.

tcp0(eth1:0),tcp1(eth0:0) will create two LNET networks, and LNET will
use whichever of the two has the shorter path.  If both are the same
number of network hops from client to server, it will use the first
one, and only the first one.  This setup would create two NIDs on the
servers, so you could use either fstab entry discussed before.
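As a rough illustration of the selection rule described above (and of the failover behaviour mentioned in the quoted message below), here is a toy Python model. It is not LNET code: the hop counts, the `up` flag and the names `LnetNet` and `pick_lnet_network` are hypothetical, and real LNET routing involves more than this.

```python
from typing import NamedTuple, Optional


class LnetNet(NamedTuple):
    """Toy description of one LNET network as seen from a client (hypothetical)."""
    name: str       # e.g. "tcp0"
    hops: int       # network hops from client to server
    up: bool = True  # False once the NIC or switch for this network fails


def pick_lnet_network(nets: list[LnetNet]) -> Optional[str]:
    """Return the network a client would use under the rule in this thread:
    prefer the fewest hops; on a tie, the first network listed wins;
    networks that are down are skipped. Returns None if none is reachable."""
    candidates = [n for n in nets if n.up]
    if not candidates:
        return None
    # min() returns the first minimal element, so list order breaks ties.
    return min(candidates, key=lambda n: n.hops).name


if __name__ == "__main__":
    # Equal hop counts: only the first network (tcp0) carries traffic.
    nets = [LnetNet("tcp0", 1), LnetNet("tcp1", 1)]
    print(pick_lnet_network(nets))  # tcp0
    # tcp0 fails: traffic moves to tcp1.
    nets[0] = nets[0]._replace(up=False)
    print(pick_lnet_network(nets))  # tcp1
```

The point of the model is that the second network sits idle until the first one is marked down, which is why this layout gives redundancy but no extra bandwidth.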

tcp0(eth1:0,eth0:0) will create one LNET network and use all
interfaces between clients and servers.  You would have roughly double
the bandwidth.  This setup would create only one NID on the servers,
and you would use the NID associated with eth1:0 in your fstab
entries.  If the NIC (or network) for that NID failed on the MGS/MDS,
you would not be able to mount new clients, but your filesystem should
still work, as LNET will mark that route down and use the other
interface.
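The two layouts differ only in how the interfaces are grouped in the LNET `networks` module option. The sketch below builds that option line from a mapping of LNET network name to interfaces. It assumes the option is set as an `options lnet networks=...` line in the modprobe configuration (for example `/etc/modprobe.conf` on systems of this era); the helper name `lnet_networks_option` is hypothetical, and the interface names are the ones used in this thread.

```python
def lnet_networks_option(nets: dict[str, list[str]]) -> str:
    """Build an 'options lnet networks=...' line (hypothetical helper).

    Each key is an LNET network name (e.g. "tcp0"); its value lists the
    interfaces that belong to that network. Dict insertion order is kept,
    and it matters: with equal-length paths LNET uses the first network listed.
    """
    if not nets or any(not ifaces for ifaces in nets.values()):
        raise ValueError("every LNET network needs at least one interface")
    parts = [f"{net}({','.join(ifaces)})" for net, ifaces in nets.items()]
    return f"options lnet networks={','.join(parts)}"


if __name__ == "__main__":
    # Two LNET networks, one NIC each: two NIDs per server.
    print(lnet_networks_option({"tcp0": ["eth1:0"], "tcp1": ["eth0:0"]}))
    # options lnet networks=tcp0(eth1:0),tcp1(eth0:0)

    # One LNET network spanning both NICs: one NID per server.
    print(lnet_networks_option({"tcp0": ["eth1:0", "eth0:0"]}))
    # options lnet networks=tcp0(eth1:0,eth0:0)
```

Keeping the grouping as data makes the trade-off explicit: one key per NIC gives separate networks (failover only), while one key with both NICs gives a single network that can use both links.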

-Marc

----
D. Marc Stearman
LC Lustre Systems Administrator
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641


On Jan 22, 2008, at 10:12 AM, Lundgren, Andrew wrote:

>>
>> As far as I know, LNET will use the shortest path on the
>> network, so if you have two equivalent tcp networks, tcp0 and
>> tcp1, LNET will just use the first one.  If it fails, it
>> should use the second one.
>> If both NICs are in the same tcp network, LNET should use both.
>> Whether you decide on one or two LNET networks is up to you.
>
> So are tcp0(eth1:0),tcp1(eth0:0) and tcp0(eth1:0,eth0:0) functionally
> equivalent for what I am doing?
>
>> Regardless, your fstab entry is not correct.  You should only
>> list one server as the host:
>>
>>> 192.168.136.81@tcp0:/stage     /stage     lustre     defaults,_netdev 0 0
>>
>> or
>>
>>> 192.168.135.80@tcp1:/stage     /stage     lustre     defaults,_netdev 0 0
>>
>> If one NIC fails while a client is not mounted, you would
>> have to change the fstab entry before mounting.  If Lustre is
>> already mounted, it should just use the other LNET network.
>
> Then as long as the box does not reboot while the network is down,  
> the mount should still function, just over the secondary path?
>
> Thank you for the clarification.
>
>
> --
> Andrew
>
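The fstab advice in the quoted message (name a single server NID per entry, and switch the entry to the other NID if that network is down before the client mounts) can be sketched as a small Python helper. The function names `lustre_fstab_entry` and `choose_fstab_nid` and the per-NID reachability flag are hypothetical; the NIDs, the `stage` filesystem and the mount options come from the entries quoted above.

```python
def lustre_fstab_entry(mgs_nid: str, fsname: str, mountpoint: str) -> str:
    """Return one fstab line for a Lustre client mount (hypothetical helper).

    mgs_nid is a single NID of the form "<address>@<lnet network>",
    e.g. "192.168.136.81@tcp0". Only one server NID is listed per entry.
    """
    if mgs_nid.count("@") != 1:
        raise ValueError(f"not a NID: {mgs_nid!r}")
    return f"{mgs_nid}:/{fsname} {mountpoint} lustre defaults,_netdev 0 0"


def choose_fstab_nid(nids: list[tuple[str, bool]]) -> str:
    """Pick the first NID whose network is believed reachable.

    Models the manual step from the thread: if the primary NID's NIC is
    down while the client is not yet mounted, use the other NID instead.
    """
    for nid, reachable in nids:
        if reachable:
            return nid
    raise RuntimeError("no reachable server NID")


if __name__ == "__main__":
    nids = [("192.168.136.81@tcp0", True), ("192.168.135.80@tcp1", True)]
    print(lustre_fstab_entry(choose_fstab_nid(nids), "stage", "/stage"))
    # 192.168.136.81@tcp0:/stage /stage lustre defaults,_netdev 0 0

    nids[0] = (nids[0][0], False)  # tcp0 NIC down before the client mounts
    print(lustre_fstab_entry(choose_fstab_nid(nids), "stage", "/stage"))
    # 192.168.135.80@tcp1:/stage /stage lustre defaults,_netdev 0 0
```

This only affects new mounts; as the thread notes, a client that is already mounted should carry on over the other LNET network without any fstab change.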


