[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
Chris Worley
worleys at gmail.com
Fri Mar 7 09:03:17 PST 2008
On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott <prescott at hpc.ufl.edu> wrote:
>
> I think your client modprobe.conf lnet option
> should be this:
>
>
> options lnet networks=o2ib(ib0)
>
> (not 'o2ib0').
It still seems to want the TCP connection:
Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Lustre Client File System; info at clusterfs.com
LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 11043:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
Lustre: client 0000010430934400 umount complete
LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
>
> Another thing to try, if that doesn't work: lctl
> ping your MDS/MGS/OSS NIDs, like so:
>
> lctl ping 36.122.255.201@o2ib
Before and after the change it looks the same:
# lctl ping 36.122.255.201@o2ib
12345-0@lo
12345-36.122.255.201@o2ib
12345-36.121.255.201@tcp
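
The ping reply lists every NID the server advertises, so it can be reduced to a list of LNET networks. The following is only a rough sketch, not anything shipped with Lustre: the function names nid_networks and has_network are made up for illustration, and it assumes NIDs are in their real addr@net form, one per line, as lctl prints them.

```shell
#!/bin/sh
# nid_networks (hypothetical helper): read NIDs on stdin, one per line,
# as printed by `lctl list_nids` or `lctl ping`, and print the distinct
# network names (the part after the last '@').
nid_networks() {
    # Lines without an '@' are ignored; the rest are cut down to the network.
    sed -n 's/^.*@//p' | sort -u
}

# has_network NAME (hypothetical helper): succeed if NAME is one of the
# networks found in the NIDs read from stdin.
has_network() {
    nid_networks | grep -qx "$1"
}
```

For example, "lctl ping 36.122.255.201@o2ib | has_network o2ib && echo ok" would confirm that the server answers with an o2ib NID. It says nothing about which network later traffic actually takes.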
If I change my modprobe.conf to match the one on the MDS/OSSs:
options lnet networks=o2ib0(ib0),tcp0(eth0)
and then mount specifying only o2ib:
# mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
it works, but both ko2iblnd and ksocklnd are loaded.
The dmesg output is:
Lustre: OBD class driver, info at clusterfs.com
Lustre Version: 1.6.4.2
Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Added LNI 36.121.255.1@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info at clusterfs.com
Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter stripesize=2M
Lustre: Client ddnlfs-client has started
Can I be certain it'll use IB for LFS on this client?
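
For what it's worth, here is a rough sketch of how the "both LNDs are loaded" observation can be checked from a script. The function name loaded_lnds is invented for illustration; it only assumes lsmod-style output with the module name in the first column. Knowing that a module is loaded does not by itself show which network carries the filesystem traffic.

```shell
#!/bin/sh
# loaded_lnds (hypothetical helper): read `lsmod`-style output on stdin
# and print which of the two LNDs discussed in this thread are loaded.
loaded_lnds() {
    # Match on the module-name column only, so the "Used by" column
    # of other modules (e.g. lnet) does not produce false hits.
    awk '$1 == "ko2iblnd" || $1 == "ksocklnd" { print $1 }' | sort
}
```

Usage would be "lsmod | loaded_lnds"; on the client described above it should print both ko2iblnd and ksocklnd.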
Thanks,
Chris
>
> Cheers,
> Craig
>
>
>
>
> Chris Worley wrote:
> > More issues. Now, on the clients.
> >
> > The MDT/MGS/OST's are all up and mounted, showing:
> >
> > # lctl list_nids
> > 36.122.255.201@o2ib
> > 36.121.255.201@tcp
> >
> > Now, when I go to mount on the IB-based clients, I get:
> >
> > # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
> > mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No such file or directory
> > Is the MGS specification correct?
> > Is the filesystem name correct?
> > If upgrading, is the copied client log valid? (see upgrade docs)
> >
> > The modprobe.conf contains:
> >
> > options lnet networks=o2ib0(ib0)
> >
> > And lctl looks good:
> >
> > # lctl list_nids
> > 36.122.255.1@o2ib
> >
> > But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
> > (36.12[12].255.201 is the MGS/MDS server):
> >
> > LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
> > LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
> > LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> > LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> > LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
> > LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> > Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
> > LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> > LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
> > LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> > Lustre: client 0000010430913c00 umount complete
> > LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
> >
> > Note that this setup works fine in the non-multihomed setup, so I
> > don't think ko2iblnd is to blame (the setup on the clients hasn't
> > changed at all).
> >
> > What am I doing wrong?
> >
> > Thanks,
> >
> > Chris
> > On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
> >> I changed my modprobe.conf to look exactly as yours, and it worked. I
> >> hadn't been using all the quotes until the doc said to... but they may
> >> have indeed been the problem.
> >>
> >> Thanks!
> >>
> >> Chris
> >>
> >> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
> >> >
> >> >
> >> > Do "lctl list_nids" on your mds and oss's. They should look
> >> > something like this.
> >> >
> >> > [root@hpcmds ~]# lctl list_nids
> >> > 10.13.24.40@o2ib
> >> > 10.13.16.40@tcp
> >> >
> >> > Then your clients should have a nid on one or the other.
> >> >
> >> > Check your dmesg output after loading lnet. The complaints are
> >> > pretty useful. Your modprobe.conf line looks correct although we
> >> > found we did not need all the quoting so you should check that as
> >> > well. Ours looks like...
> >> >
> >> > options lnet networks=o2ib(ib0),tcp(eth0)
> >> >
> >> > My guess is that it either cannot find or does not like your ko2iblnd
> >> > module.
> >> >
> >> > ct
> >> >
> >> >
> >> >
> >> > On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
> >> >
> >> > > Most everything is over IB, but I have a few systems I'd like to mount
> >> > > the Lustre fs over GigE.
> >> > >
> >> > > I think I've followed the Multihomed instructions correctly, in:
> >> > >
> >> > > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
> >> > >
> >> > > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
> >> > > Ethernet and IB) includes:
> >> > >
> >> > > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
> >> > >
> >> > > I make and mount the mdt with (which has both IB and Ethernet, subnet
> >> > > 36.122.x.x is IB, 36.121.x.x is Ethernet):
> >> > >
> >> > > # mkfs.lustre --mdt --mgs
> >> > > --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
> >> > > # mount -t lustre /dev/md0 /lfs/mdtb
> >> > >
> >> > > But, at this point, the ksocklnd module is loaded rather than the
> >> > > ko2iblnd module!
> >> > >
> >> > > On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
> >> > > mount it, it correctly uses the IB interface, but can't contact the
> >> > > MDS:
> >> > >
> >> > > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for MGC36.122.255.201@o2ib_0
> >> > > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer MGC36.122.255.201@o2ib_0!
> >> > > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> >> > > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> >> > > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup MGC36.122.255.201@o2ib failed (-2)
> >> > > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) MGC36.122.255.201@o2ib setup error -2
> >> > > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
> >> > > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
> >> > >
> >> > > It too has loaded the ksocklnd module, and not the ko2iblnd module. I
> >> > > guess that both modules should be loaded in a multihomed case?
> >> > >
> >> > > What am I doing wrong?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Chris
> >> > > _______________________________________________
> >> > > Lustre-discuss mailing list
> >> > > Lustre-discuss at lists.lustre.org
> >> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >> >
> >> >
> >>
>
>