[Lustre-discuss] networking problem between OSS and MGS

Sebastian Reitenbach sebastia@l00-bugdead-prods.de
Tue Mar 16 03:12:45 PDT 2010


Hi,

I am trying to set up the network between my Lustre servers. Actually, I am not
sure whether what I am trying will work at all, so here it goes.
I use SLES 11 x86_64 and Lustre 1.8.2 on all hosts.
As a start, I have a separate MGS (192.168.0.150) in one network, a separate
MDS (192.168.1.216) in a second network, and two OSS nodes in a third network.
The reason for putting them in different networks is to have a firewall in
between that can restrict access to the hosts.

The MGS and MDS hosts are each connected with one Gigabit Ethernet interface;
the OSS nodes are each connected with two Gigabit interfaces to the switch in
the same network:
OSS node 1: 192.168.2.21 and 192.168.2.121
OSS node 2: 192.168.2.22 and 192.168.2.122
The MDT filesystem was already created on the MDS.
The filesystems on the OSS nodes were first created the following way:
On OSS1:
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp /dev/mapper/WBOSS1_part1
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp /dev/mapper/WBOSS1_part2
On OSS2:
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp /dev/mapper/WBOSS2_part1
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp /dev/mapper/WBOSS2_part2

and on the OSS hosts, I put the following into /etc/modprobe.conf.local:
options lnet networks=tcp0(eth0,eth1)

However, then only eth0 was used, the interface on which the default route is
configured.
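
For what it is worth, lctl list_nids on OSS1 then also showed only a single NID
for the whole tcp0 network (output reproduced from memory, so take it with a
grain of salt):

oss1:~ # lctl list_nids
192.168.2.21@tcp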

Then I tried to do the following:
On OSS1:
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp0 /dev/mapper/WBOSS1_part1
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp1 /dev/mapper/WBOSS1_part2
On OSS2:
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp0 /dev/mapper/WBOSS2_part1
mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp1 /dev/mapper/WBOSS2_part2
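
(As an aside: if I read the mkfs.lustre man page correctly, several NIDs for
the same MGS can also be given in a single --mgsnode option, comma-separated,
e.g.

mkfs.lustre --fsname=WB01 --ost --mgsnode=192.168.0.150@tcp0,192.168.0.150@tcp1 /dev/mapper/WBOSS1_part1

but I guess that only helps once the MGS really answers on both networks.)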

and configured this in /etc/modprobe.conf.local:
options lnet networks=tcp0(eth1),tcp1(eth0)
Then /dev/mapper/WBOSS1_part1 was mountable, but when I tried to mount
/dev/mapper/WBOSS1_part2, I got this error message:
mount.lustre: mount /dev/mapper/WBOSS1_part2 at /lustre/WBOSS2-1 failed: 
Cannot send after transport endpoint shutdown
and on the MGS in dmesg I see:
LustreError: 120-3: Refusing connection from 192.168.2.21 for 192.168.0.150@tcp1: No matching NI
where 192.168.2.21 is the IP of eth0 (tcp1).
When I switch the default routes, the problem persists.
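
I read the "No matching NI" message as: the MGS itself has no NI on tcp1, so it
refuses the connection. If that is right, the MGS would presumably need both
LNet networks configured on its single interface, something like the following
in its /etc/modprobe.conf.local (untested, and I am not even sure LNet accepts
the same interface in two tcp networks):

options lnet networks=tcp0(eth0),tcp1(eth0)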

I also tried to configure /etc/modprobe.conf.local the other way around:
options lnet networks=tcp0(eth0),tcp1(eth1)
Then I was again able to mount /dev/mapper/WBOSS1_part1, but not
/dev/mapper/WBOSS1_part2, with the same error; it just uses the other
Ethernet interface.

Also, lctl ping only works for one of the two networks:
oss1:~ # lctl ping 192.168.0.150@tcp0
12345-0@lo
12345-192.168.0.150@tcp
oss1:~ # lctl ping 192.168.0.150@tcp1
failed to ping 192.168.0.150@tcp1: Input/output error
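
I would expect lctl list_nids on the MGS to confirm this, since only tcp0 is
defined there, i.e. just:

mgs:~ # lctl list_nids
192.168.0.150@tcp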

Looking further into the documentation, I saw that some kind of load balancing
can be set up with the ip2nets parameter in /etc/modprobe.conf.local.
But as far as I can see, I cannot specify two interfaces to be used to
communicate with the MGS?
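
Just to show what I mean, this is roughly the ip2nets rule from the manual,
adapted to my addresses (the first rule would match the OSS nodes, the second
the MGS/MDS; untested):

options lnet 'ip2nets="tcp0(eth0,eth1) 192.168.2.*; tcp0 192.168.[0-1].*"'

But that still puts both OSS interfaces into the single tcp0 network, which is
exactly the case where only eth0 gets used.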

So, is the setup above possible at all? I hope I was clear enough with my
explanation ;)

When I now try to set up bonding on the OSS hosts, which bonding mode would be
the preferred one to choose?
The manual only talks about the recommended xmit_hash_policy, but does not
recommend a bonding mode. The switch is capable of doing 802.3ad (LACP) trunking.
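
To make the question concrete, this is what I would try on SLES 11, assuming
802.3ad is the right mode (the interface name bond0 and the exact options are
just my guess, untested):

/etc/sysconfig/network/ifcfg-bond0:
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='192.168.2.21/24'
BONDING_MASTER='yes'
BONDING_SLAVE0='eth0'
BONDING_SLAVE1='eth1'
BONDING_MODULE_OPTS='mode=802.3ad miimon=100 xmit_hash_policy=layer3+4'

and then in /etc/modprobe.conf.local simply:

options lnet networks=tcp0(bond0)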

And a more general question: what is recommended anyway, bonding the two
available interfaces on the OSS servers, or getting Lustre and the routing
configured to use two separate network interfaces?

regards,
Sebastian



