[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

Charles Taylor taylor at hpc.ufl.edu
Fri Mar 7 02:40:23 PST 2008



Run "lctl list_nids" on your MDS and OSSs. The output should look
something like this.

[root@hpcmds ~]# lctl list_nids
10.13.24.40@o2ib
10.13.16.40@tcp

Then your clients should have a nid on one or the other.
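
As a rough sketch of that check: lctl prints each NID as address@network, so the network names can be pulled out and compared against what you expect. The helper names nid_networks and has_networks are made up for illustration and are not part of Lustre; names are compared literally.

```shell
# Print the LNET network part of each NID read from stdin,
# e.g. "10.13.24.40@o2ib" -> "o2ib".
nid_networks() {
    awk -F'@' 'NF == 2 { print $2 }'
}

# Succeed only if every network named as an argument appears in the
# NID list read from stdin (hypothetical helper, literal comparison).
has_networks() {
    nets=$(nid_networks)
    for want in "$@"; do
        printf '%s\n' "$nets" | grep -qx "$want" || return 1
    done
}

# On a real server (needs Lustre's lctl; not run here):
#   lctl list_nids | has_networks o2ib tcp || echo "missing a network"
```

A server that is really multihomed should pass with both names; a GigE-only client would only be expected to show tcp.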

Check your dmesg output after loading lnet. The complaints it logs are
pretty useful. Your modprobe.conf line looks correct, although we
found we did not need all the quoting, so you should check that as
well. Ours looks like:

options lnet networks=o2ib(ib0),tcp(eth0)

My guess is that it either cannot find or does not like your ko2iblnd  
module.
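
One way to test that guess, as a minimal sketch: bring LNET up, then see which network drivers actually got loaded (ko2iblnd serves o2ib, ksocklnd serves tcp). The loaded_lnds helper is a made-up name for illustration; the commented commands assume root on a node with the Lustre modules installed.

```shell
# Read `lsmod` output on stdin and print which LNET network drivers
# (LNDs) are loaded, one per line, sorted.
loaded_lnds() {
    awk '$1 == "ko2iblnd" || $1 == "ksocklnd" { print $1 }' | sort
}

# On a real node (needs root and the Lustre modules; not run here):
#   modprobe lnet
#   lctl network up
#   lsmod | loaded_lnds     # with o2ib and tcp configured, expect both
#   dmesg | grep -i -e lnet -e o2ib | tail -n 20
```

If only ksocklnd shows up, the dmesg lines should say why the o2ib network could not be started.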

ct

On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:

> Most everything is over IB, but I have a few systems I'd like to mount
> the Lustre fs over GigE.
>
> I think I've followed the Multihomed instructions correctly, in:
>
> http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>
> My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
> Ethernet and IB) includes:
>
> options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>
> I make and mount the mdt with (which has both IB and Ethernet, subnet
> 36.122.x.x is IB, 36.121.x.x is Ethernet):
>
> # mkfs.lustre --mdt --mgs
> --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
> # mount -t lustre /dev/md0  /lfs/mdtb
>
> But, at this point, the ksocklnd module is loaded rather than the
> ko2iblnd module!
>
> On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
> mount it, it correctly uses the IB interface, but can't contact the
> MDS:
>
> LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> for MGC36.122.255.201@o2ib_0
> LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> find peer MGC36.122.255.201@o2ib_0!
> LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> initial connection
> LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection())
> NULL connection
> LustreError: 27520:0:(obd_config.c:325:class_setup()) setup
> MGC36.122.255.201@o2ib failed (-2)
> LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple())
> MGC36.122.255.201@o2ib setup error -2
> LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd  
> ddnlfs-OSTffff
> LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount())
> ddnlfs-OSTffff not registered
>
> It too has loaded the ksocklnd module, and not the ko2iblnd module.  I
> guess that both modules should be loaded in a multihomed case?
>
> What am I doing wrong?
>
> Thanks,
>
> Chris