[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

Craig Prescott prescott at hpc.ufl.edu
Fri Mar 7 08:39:57 PST 2008


I think your client modprobe.conf lnet option
should be this:

options lnet networks=o2ib(ib0)

(not 'o2ib0').

Another thing to try if that doesn't work: lctl
ping your MDS/MGS/OSS NIDs, like so:

lctl ping 36.122.255.201@o2ib
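
If there are several server NIDs to check, the ping step can be scripted. The following is only a rough sketch, not something taken from the Lustre docs: the function names nid_net, reachable_nids and ping_servers are made up for illustration, and it assumes that lctl ping exits non-zero on failure. NIDs are written in their real address@network form (the list archive renders "@" as " at "). It skips server NIDs on networks the local node has no NID on, which is the situation behind a "No NID found for ...@tcp" message on an IB-only client.

```shell
#!/bin/sh
# Sketch: ping every server NID that sits on a network this node has a NID on.

# nid_net NID: print the LNET network part of a NID (text after the last '@').
nid_net() {
    printf '%s\n' "${1##*@}"
}

# reachable_nids LOCAL_NIDS SERVER_NIDS
# Both arguments are whitespace-separated NID lists. Prints each server NID
# whose network name matches the network of at least one local NID.
reachable_nids() {
    for server in $2; do
        for local_nid in $1; do
            if [ "$(nid_net "$server")" = "$(nid_net "$local_nid")" ]; then
                printf '%s\n' "$server"
                break
            fi
        done
    done
}

# ping_servers SERVER_NIDS: run "lctl ping" against each server NID that
# shares a network with this node, taking the local NIDs from "lctl list_nids".
# Assumption: lctl ping returns a non-zero exit status when the ping fails.
ping_servers() {
    local_nids=$(lctl list_nids)
    for nid in $(reachable_nids "$local_nids" "$1"); do
        lctl ping "$nid" || echo "lctl ping failed for $nid" >&2
    done
}
```

On an IB client you would call it as: ping_servers "36.122.255.201@o2ib 36.121.255.201@tcp". Only the o2ib NID would be pinged there, since the client has no tcp NID.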

Cheers,
Craig


Chris Worley wrote:
> More issues.  Now, on the clients.
> 
> The MDT/MGS/OST's are all up and mounted, showing:
> 
> # lctl list_nids
> 36.122.255.201@o2ib
> 36.121.255.201@tcp
> 
> Now, when I go to mount on the IB-based clients, I get:
> 
> # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
> mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No
> such file or directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
> 
> The modprobe.conf contains:
> 
> options lnet networks=o2ib0(ib0)
> 
> And lctl looks good:
> 
> # lctl list_nids
> 36.122.255.1@o2ib
> 
> But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
> (36.12[12].255.201 is the MGS/MDS server):
> 
> LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> for 36.121.255.201@tcp
> LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> find peer 36.121.255.201@tcp!
> LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> initial connection
> LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> LustreError: 10001:0:(obd_config.c:325:class_setup()) setup
> ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
> LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler())
> Err -2 on cfg command:
> Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID
> 2:36.121.255.201@tcp
> LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log
> 'ddnlfs-client' failed (-2). This may be the result of communication
> errors between this node and the MGS, a bad configuration, or other
> errors. See the syslog for more information.
> LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to
> process log: -2
> LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> Lustre: client 0000010430913c00 umount complete
> LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> mount  (-2)
> 
> Note that this setup works fine in the non-multihomed setup, so I
> don't think ko2iblnd is to blame (the setup on the clients hasn't
> changed at all).
> 
> What am I doing wrong?
> 
> Thanks,
> 
> Chris
> On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
>> I changed my modprobe.conf to look exactly as yours, and it worked.  I
>>   hadn't been using all the quotes until the doc said to... but they may
>>   have indeed been the problem.
>>
>>   Thanks!
>>
>>   Chris
>>
>>  On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>>   >
>>   >
>>   >  Do "lctl list_nids" on your mds and oss's.   They should look
>>   >  something like this.
>>   >
>>   >  [root@hpcmds ~]# lctl list_nids
>>   >  10.13.24.40@o2ib
>>   >  10.13.16.40@tcp
>>   >
>>   >  Then your clients should have a nid on one or the other.
>>   >
>>   >  Check your dmesg output after loading lnet.   The complaints are
>>   >  pretty useful.  Your modprobe.conf line looks correct although we
>>   >  found we did not need all the quoting so you should check that as
>>   >  well.   Ours looks like...
>>   >
>>   >  options lnet networks=o2ib(ib0),tcp(eth0)
>>   >
>>   >  My guess is that it either cannot find or does not like your ko2iblnd
>>   >  module.
>>   >
>>   >  ct
>>   >
>>   >
>>   >
>>   >  On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>>   >
>>   >  > Most everything is over IB, but I have a few systems I'd like to mount
>>   >  > the Lustre fs over GigE.
>>   >  >
>>   >  > I think I've followed the Multihomed instructions correctly, in:
>>   >  >
>>   >  > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>>   >  >
>>   >  > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>>   >  > Ethernet and IB) includes:
>>   >  >
>>   >  > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>>   >  >
>>   >  > I make and mount the mdt with (which has both IB and Ethernet, subnet
>>   >  > 36.122.x.x is IB, 36.121.x.x is Ethernet):
>>   >  >
>>   >  > # mkfs.lustre --mdt --mgs
>>   >  > --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
>>   >  > # mount -t lustre /dev/md0  /lfs/mdtb
>>   >  >
>>   >  > But, at this point, the ksocklnd module is loaded rather than the
>>   >  > ko2iblnd module!
>>   >  >
>>   >  > On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
>>   >  > mount it, it correctly uses the IB interface, but can't contact the
>>   >  > MDS:
>>   >  >
>>   >  > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>>   >  > for MGC36.122.255.201@o2ib_0
>>   >  > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>>   >  > find peer MGC36.122.255.201@o2ib_0!
>>   >  > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>>   >  > initial connection
>>   >  > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection())
>>   >  > NULL connection
>>   >  > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup
>>   >  > MGC36.122.255.201@o2ib failed (-2)
>>   >  > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple())
>>   >  > MGC36.122.255.201@o2ib setup error -2
>>   >  > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd
>>   >  > ddnlfs-OSTffff
>>   >  > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount())
>>   >  > ddnlfs-OSTffff not registered
>>   >  >
>>   >  > It too has loaded the ksocklnd module, and not the ko2iblnd module.  I
>>   >  > guess that both modules should be loaded in a multihomed case?
>>   >  >
>>   >  > What am I doing wrong?
>>   >  >
>>   >  > Thanks,
>>   >  >
>>   >  > Chris
>>   >  > _______________________________________________
>>   >  > Lustre-discuss mailing list
>>   >  > Lustre-discuss at lists.lustre.org
>>   >  > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>   >
>>   >
>>



