[Lustre-discuss] Lustre over o2ib issue

Wed Mar 16 11:23:30 PDT 2011

Howdy,

Lustre 1.8.5 using the EL5 provided RPMs on both clients and servers
lustre-client-modules-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5
lustre-client-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5

The servers and clients are all running CentOS 5.5 x86_64 with kernel 2.6.18-194.17.1.el5 (the servers run the Lustre-patched kernel).

We have two InfiniBand networks, o2ib0 and o2ib1, as well as Ethernet. Here's the lnet modprobe line used on the MDS and OSSes:

options lnet networks="o2ib0(ib0),o2ib1(ib1),tcp0(eth0)"

The compute nodes that mount via tcp0 don't have any problems.
The compute nodes that mount via o2ib1 do not have any problems.

The compute nodes attached to o2ib0 fail to mount the Lustre file system at boot (output of dmesg is at the end).

The compute nodes are Dell M610 blades. There are 3 Dell M1000e chassis switches (Mellanox InfiniScale IV M3601Q 32 port 40Gb/s switches), each attached to a QLogic 12300 36 port QDR switch via 8 cables. The compute nodes are directly attached to the M3601Q switches internally (blades). The Lustre servers are attached directly to the QLogic 12300 switch.

All of our InfiniBand tests have checked out and the switches do not report any errors.

Here's the sequence that lets me mount the Lustre file system after the compute node has booted:
1. ssh to the node

2. Check ibstat to confirm that the card reports the port as active: success

3. Run ibswitches as a test to ensure it can see the switches, success

4. Ping another IPoIB address using regular ping
# ping 192.168.2.20
PING 192.168.2.20 (192.168.2.20) 56(84) bytes of data.
64 bytes from 192.168.2.20: icmp_seq=1 ttl=64 time=1.93 ms

5. Try to ping the MDS using lctl ping
# lctl ping 192.168.2.20@o2ib
failed to ping 192.168.2.20@o2ib: Input/output error

6. Try it again (this step isn't actually necessary; after the single failed ping, I can mount)
# lctl ping 192.168.2.20@o2ib
12345-0@lo
12345-192.168.2.20@o2ib
12345-192.168.3.20@o2ib1
12345-172.20.0.20@tcp

7. Now mount lustre
# mount /lustre
# mount | grep lustre
192.168.2.20@o2ib:/lustre on /lustre type lustre (rw,_netdev)

Instead of doing the "lctl ping" I can also do a mount, which will fail, followed by another which will succeed.
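
The manual workaround above could be scripted roughly as follows. This is only a minimal sketch that I have not run on these nodes: `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary. It papers over the first failure rather than explaining it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): run a command until it succeeds
# or the attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$max" ]; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Only sleep if another attempt is coming.
        if [ "$n" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example, using the MDS NID and mount point from the steps above:
#   retry 5 2 lctl ping 192.168.2.20@o2ib && mount /lustre
```

Since a failed mount followed by a second mount also works, `retry 2 2 mount /lustre` would be the equivalent without lctl.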

Here are the messages logged during boot. Does anyone have any suggestions? Thanks, Mike

Lustre: Listener bound to ib0:192.168.2.229:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 192.168.2.229@o2ib [8/64/0/180]
Lustre: Lustre Client File System; http://www.lustre.org/
Lustre: 4989:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363390903615489 sent from MGC192.168.2.20@o2ib to NID 192.168.2.20@o2ib 5s ago has timed out (5s prior to deadline).
  req@ffff81062b455c00 x1363390903615489/t0 o250->MGS@MGC192.168.2.20@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1300230903 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 5076:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff81062b455000 x1363390903615491/t0 o501->MGS@MGC192.168.2.20@o2ib_0:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 15c-8: MGC192.168.2.20@o2ib: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 5076:0:(llite_lib.c:1079:ll_fill_super()) Unable to process log: -108
Lustre: client ffff81062dddf400 umount complete
LustreError: 5076:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount  (-108)
ib_srp: ASYNC event= 11 on device= mlx4_0
ib_srp: ASYNC event= 17 on device= mlx4_0
ib_srp: ASYNC event= 9 on device= mlx4_0
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
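
In the log above, the "ib0: link becomes ready" line appears after the failed mount, so one thing that might be worth ruling out is the mount running before the IPoIB link is up. That is only a guess on my part, and it would not by itself explain why the first lctl ping fails after boot. A minimal sketch of a wait loop, assuming the kernel exposes /sys/class/net/ib0/operstate and that it reads "up" once the link is ready (`wait_for_link` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: poll a link-state file until it reads "up" or the
# timeout (in seconds) expires.
# Usage: wait_for_link STATE_FILE TIMEOUT_SECONDS
wait_for_link() {
    state_file=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # A missing or unreadable file is treated the same as "not up yet".
        if [ "$(cat "$state_file" 2>/dev/null)" = "up" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example (the sysfs path is an assumption, see above):
#   wait_for_link /sys/class/net/ib0/operstate 60 && mount /lustre
```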