[Lustre-discuss] multi-homed lustre with both IB and TCP

John Lalande john.lalande at ssec.wisc.edu
Fri Mar 21 12:56:10 PDT 2014


Hi-

I am trying to set up a robinhood policy engine server that will watch 
several different Lustre file systems -- one of which will have a direct 
Infiniband connection, one via TCP without an intermediate Lustre router 
and several other Lustre file systems via TCP through Lustre routers.

I can mount filesystems via IB and direct TCP, but not the routed ones. 
(I am able to mount the routed ones if I take out the o2ib0(ib0) entry.)

My modprobe.conf looks like this:
options lnet networks="o2ib0(ib0),tcp0(em1.497)" routes="o2ib1 ROUTER1_IP@tcp0; o2ib1 ROUTER2_IP@tcp0; o2ib1 ROUTER3_IP@tcp0"

where ROUTER1_IP, ROUTER2_IP, etc. are actual IP addresses on our 
University's subnet that I don't want to publish here.

/etc/fstab looks like this:

172.17.1.5@o2ib0:/ib_filesystem   /ib_filesystem   lustre   defaults,_netdev,user_xattr   0 0
172.16.24.5@o2ib1:/routedfs1      /fs1             lustre   defaults,_netdev,user_xattr   0 0
172.16.23.14@o2ib1:/routedfs2     /fs2             lustre   defaults,_netdev,user_xattr   0 0
172.16.25.189@o2ib1:/routedfs3    /fs3             lustre   defaults,_netdev,user_xattr   0 0
172.16.25.241@o2ib1:/routedfs4    /fs4             lustre   defaults,_netdev,user_xattr   0 0
128.104.X.X@tcp:/tcpfs1           /tcpfs1          lustre   defaults,_netdev              0 0
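
As a rough cross-check of the two config fragments above, here is a minimal Python sketch that classifies each fstab NID as local, routed, or unroutable according to the networks= and routes= strings. It is not part of Lustre: the function names (normalize, local_nets, route_table, classify) are made up for illustration, the parsing handles only the simple "net gateway" route form used here, and ROUTERn_IP are the placeholders from the post. It relies on LNet treating a bare "tcp" as "tcp0".

```python
import re

# Values copied from the modprobe.conf line above; ROUTERn_IP are the
# poster's placeholders, not real addresses.
NETWORKS = "o2ib0(ib0),tcp0(em1.497)"
ROUTES = "o2ib1 ROUTER1_IP@tcp0; o2ib1 ROUTER2_IP@tcp0; o2ib1 ROUTER3_IP@tcp0"


def normalize(net):
    """Treat a bare network type such as 'tcp' as network 0 ('tcp0')."""
    return net if net[-1].isdigit() else net + "0"


def local_nets(networks):
    """Names of the locally configured networks, without '(interface)'."""
    pairs = re.findall(r"([a-z0-9]+)(?:\(([^)]*)\))?", networks)
    return {normalize(name) for name, _iface in pairs}


def route_table(routes):
    """Map each remote network to the list of gateway NIDs declared for it."""
    table = {}
    for entry in routes.split(";"):
        fields = entry.split()
        if len(fields) >= 2:
            # First field is the remote net, last field is the gateway NID.
            table.setdefault(normalize(fields[0]), []).append(fields[-1])
    return table


def classify(nid, networks=NETWORKS, routes=ROUTES):
    """Say how the client config reaches the network named in a NID."""
    net = normalize(nid.rsplit("@", 1)[1])
    if net in local_nets(networks):
        return "local"
    gateways = route_table(routes).get(net)
    return "routed via " + ", ".join(gateways) if gateways else "no route"


if __name__ == "__main__":
    for nid in ("172.17.1.5@o2ib0", "172.16.24.5@o2ib1", "128.104.X.X@tcp"):
        print(nid, "->", classify(nid))
```

This only inspects the strings as written on the client; it does not contact any node, so it cannot tell whether the routers themselves are reachable or configured correctly.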

In dmesg, I see:
Lustre: 6923:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395431267/real 1395431267]  req@ffff880c2aa04800 x1463215106031860/t0(0) o250->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 400/544 e 0 to 1 dl 1395431272 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 7239:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880c2aa04000 x1463215106031864/t0(0) o101->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 7230:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff88182b1fac00 x1463215106031872/t0(0) o101->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 7230:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff88182a1ab000 x1463215106031876/t0(0) o101->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Lustre: 6923:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1395431292/real 1395431292]  req@ffff88182a2a3400 x1463215106031976/t0(0) o250->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 400/544 e 0 to 1 dl 1395431302 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 7239:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880c2aa04000 x1463215106031868/t0(0) o101->MGC172.16.24.5@o2ib1@172.16.24.5@o2ib1:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1

So ... is what we're trying to do here possible, and I'm just mangling 
the config, or can a client not combine a direct IB mount with mounts 
that reach an IB network through Lustre routers?

Thanks for any help!

John

-- 
John Lalande
Space Science & Engineering Center
University of Wisconsin - Madison
john.lalande at ssec.wisc.edu / 608-263-2268

