[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

Chris Worley worleys at gmail.com
Fri Mar 7 09:03:17 PST 2008


On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott <prescott at hpc.ufl.edu> wrote:
>
>  I think your client modprobe.conf lnet option
>  should be this:
>
>
>  options lnet networks=o2ib(ib0)
>
>  (not 'o2ib0').

It still seems to want the TCP connection:

Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Lustre Client File System; info at clusterfs.com
LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
for 36.121.255.201@tcp
LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
find peer 36.121.255.201@tcp!
LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add
initial connection
LustreError: 11043:0:(obd_config.c:325:class_setup()) setup
ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler())
Err -2 on cfg command:
LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID
2:36.121.255.201@tcp
LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log
'ddnlfs-client' failed (-2). This may be the result of communication
errors between this node and the MGS, a bad configuration, or other
errors. See the syslog for more information.
LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to
process log: -2
LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
Lustre: client 0000010430934400 umount complete
LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
mount  (-2)

>
>  Another thing to try, if that doesn't work lctl
>  ping your MDS/MGS/OSS nids, like so:
>
>  lctl ping 36.122.255.201@o2ib

Before and after the change it looks the same:

# lctl ping 36.122.255.201@o2ib
12345-0@lo
12345-36.122.255.201@o2ib
12345-36.121.255.201@tcp
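
A minimal sketch for reading that ping output, assuming the NIDs appear in the usual address@network form (the list archive renders "@" as " at "). The function name ping_networks is hypothetical, not part of lctl; it only post-processes text and prints which LNET networks the pinged node advertises:

```shell
# Hypothetical helper: list the LNET networks named in `lctl ping` output.
# Reads lines such as "12345-36.122.255.201@o2ib" on stdin.
ping_networks() {
    # Keep only the part after "@", drop the loopback entry, de-duplicate.
    sed -n 's/^.*@//p' | grep -v '^lo$' | sort -u
}

# Possible usage on a client (not run here):
#   lctl ping 36.122.255.201@o2ib | ping_networks
```

For the output shown above this would print o2ib and tcp, which matches the server being multihomed; it says nothing about which network the client will actually use.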

If I change my modprobe.conf to look as on the MDS/OSS's:

options lnet networks=o2ib0(ib0),tcp0(eth0)

Then, mount just specifying o2ib:

# mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs

It works, but both ko2iblnd and ksocklnd are loaded.

The dmesg output is:

Lustre: OBD class driver, info at clusterfs.com
        Lustre Version: 1.6.4.2
        Build Version:
1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Added LNI 36.121.255.1@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info at clusterfs.com
Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter stripesize=2M
Lustre: Client ddnlfs-client has started
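
A small sketch for pulling the "Added LNI" lines out of dmesg, assuming the kernel log uses the message format shown above with NIDs written as address@network. The function name added_lnis is hypothetical. It only reports which local NIDs LNET brought up, not which network a given mount ends up using:

```shell
# Hypothetical helper: print the NID from each "Lustre: Added LNI" line.
# Reads dmesg-style text on stdin; all other lines are ignored.
added_lnis() {
    sed -n 's/.*Lustre: Added LNI \([^ ]*\) .*/\1/p'
}

# Possible usage on a client (not run here):
#   dmesg | added_lnis
```

With the dmesg output above, this would list one NID on o2ib and one on tcp, consistent with both ko2iblnd and ksocklnd being loaded.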

Can I be certain it'll use IB for LFS on this client?

Thanks,

Chris
>
>  Cheers,
>  Craig
>
>
>
>
>  Chris Worley wrote:
>  > More issues.  Now, on the clients.
>  >
>  > The MDT/MGS/OST's are all up and mounted, showing:
>  >
>  > # lctl list_nids
>  > 36.122.255.201@o2ib
>  > 36.121.255.201@tcp
>  >
>  > Now, when I go to mount on the IB-based clients, I get:
>  >
>  > # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
>  > mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No
>  > such file or directory
>  > Is the MGS specification correct?
>  > Is the filesystem name correct?
>  > If upgrading, is the copied client log valid? (see upgrade docs)
>  >
>  > The modprobe.conf contains:
>  >
>  > options lnet networks=o2ib0(ib0)
>  >
>  > And lctl looks good:
>  >
>  > # lctl list_nids
>  > 36.122.255.1@o2ib
>  >
>  > But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
>  > (36.12[12].255.201 is the MGS/MDS server):
>  >
>  > LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>  > for 36.121.255.201@tcp
>  > LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>  > find peer 36.121.255.201@tcp!
>  > LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>  > initial connection
>  > LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>  > LustreError: 10001:0:(obd_config.c:325:class_setup()) setup
>  > ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
>  > LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler())
>  > Err -2 on cfg command:
>  > Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID
>  > 2:36.121.255.201@tcp
>  > LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log
>  > 'ddnlfs-client' failed (-2). This may be the result of communication
>  > errors between this node and the MGS, a bad configuration, or other
>  > errors. See the syslog for more information.
>  > LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to
>  > process log: -2
>  > LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
>  > Lustre: client 0000010430913c00 umount complete
>  > LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
>  > mount  (-2)
>  >
>  > Note that this setup works fine in the non-multihomed setup, so I
>  > don't think ko2iblnd is to blame (the setup on the clients hasn't
>  > changed at all).
>  >
>  > What am I doing wrong?
>  >
>  > Thanks,
>  >
>  > Chris
>  > On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
>  >> I changed my modprobe.conf to look exactly as yours, and it worked.  I
>  >>   hadn't been using all the quotes until the doc said to... but they may
>  >>   have indeed been the problem.
>  >>
>  >>   Thanks!
>  >>
>  >>   Chris
>  >>
>  >>  On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>  >>   >
>  >>   >
>  >>   >  Do "lctl list_nids" on your mds and oss's.   They should look
>  >>   >  something like this.
>  >>   >
>  >>   >  [root@hpcmds ~]# lctl list_nids
>  >>   >  10.13.24.40@o2ib
>  >>   >  10.13.16.40@tcp
>  >>   >
>  >>   >  Then your clients should have a nid on one or the other.
>  >>   >
>  >>   >  Check your dmesg output after loading lnet.   The complaints are
>  >>   >  pretty useful.  Your modprobe.conf line looks correct although we
>  >>   >  found we did not need all the quoting so you should check that as
>  >>   >  well.   Ours looks like...
>  >>   >
>  >>   >  options lnet networks=o2ib(ib0),tcp(eth0)
>  >>   >
>  >>   >  My guess is that it either cannot find or does not like your ko2iblnd
>  >>   >  module.
>  >>   >
>  >>   >  ct
>  >>   >
>  >>   >
>  >>   >
>  >>   >  On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>  >>   >
>  >>   >  > Most everything is over IB, but I have a few systems I'd like to mount
>  >>   >  > the Lustre fs over GigE.
>  >>   >  >
>  >>   >  > I think I've followed the Multihomed instructions correctly, in:
>  >>   >  >
>  >>   >  > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>  >>   >  >
>  >>   >  > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>  >>   >  > Ethernet and IB) includes:
>  >>   >  >
>  >>   >  > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>  >>   >  >
>  >>   >  > I make and mount the mdt with (which has both IB and Ethernet, subnet
>  >>   >  > 36.122.x.x is IB, 36.121.x.x is Ethernet):
>  >>   >  >
>  >>   >  > # mkfs.lustre --mdt --mgs
>  >>   >  > --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
>  >>   >  > # mount -t lustre /dev/md0  /lfs/mdtb
>  >>   >  >
>  >>   >  > But, at this point, the ksocklnd module is loaded rather than the
>  >>   >  > ko2iblnd module!
>  >>   >  >
>  >>   >  > On the OSS, I make the fs w/ the same  "mgsnode", but, when I try to
>  >>   >  > mount it, it correctly uses the IB interface, but can't contact the
>  >>   >  > MDS:
>  >>   >  >
>  >>   >  > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>  >>   >  > for MGC36.122.255.201@o2ib_0
>  >>   >  > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>  >>   >  > find peer MGC36.122.255.201@o2ib_0!
>  >>   >  > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>  >>   >  > initial connection
>  >>   >  > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection())
>  >>   >  > NULL connection
>  >>   >  > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup
>  >>   >  > MGC36.122.255.201@o2ib failed (-2)
>  >>   >  > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple())
>  >>   >  > MGC36.122.255.201@o2ib setup error -2
>  >>   >  > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd
>  >>   >  > ddnlfs-OSTffff
>  >>   >  > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount())
>  >>   >  > ddnlfs-OSTffff not registered
>  >>   >  >
>  >>   >  > It too has loaded the ksocklnd module, and not the ko2iblnd module.  I
>  >>   >  > guess that both modules should be loaded in a multihomed case?
>  >>   >  >
>  >>   >  > What am I doing wrong?
>  >>   >  >
>  >>   >  > Thanks,
>  >>   >  >
>  >>   >  > Chris
>  >>   >  > _______________________________________________
>  >>   >  > Lustre-discuss mailing list
>  >>   >  > Lustre-discuss at lists.lustre.org
>  >>   >  > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>  >>   >
>  >>   >
>  >>
>
>


