[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
Chris Worley
worleys at gmail.com
Fri Mar 7 08:34:43 PST 2008
More issues. Now, on the clients.
The MDT/MGS/OSTs are all up and mounted, showing:
# lctl list_nids
36.122.255.201@o2ib
36.121.255.201@tcp
Now, when I go to mount on the IB-based clients, I get:
# mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
The modprobe.conf contains:
options lnet networks=o2ib0(ib0)
And lctl looks good:
# lctl list_nids
36.122.255.1@o2ib
But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
(36.12[12].255.201 is the MGS/MDS server):
LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
Lustre: client 0000010430913c00 umount complete
LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
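The log shows the client being told, through the 'ddnlfs-client' configuration log, that the MDT is at 36.121.255.201@tcp, while the client's only NID is on o2ib. A minimal Python sketch of that comparison follows. It assumes NIDs in the address@network form printed by `lctl list_nids`, and assumes a network number of 0 may be written either way (o2ib vs o2ib0, as in the mkfs.lustre and list_nids output in this thread). The helper names are hypothetical, not part of any Lustre tool, and a shared network is only a necessary condition: the sketch ignores LNET routers and does not test the link.

```python
from __future__ import annotations

import re

# Hypothetical helper, not part of any Lustre tool. A NID is written
# "address@network", e.g. "36.122.255.1@o2ib". Assumption: network number
# 0 may be written with or without the trailing 0 ("o2ib" == "o2ib0").
_NET_RE = re.compile(r"([a-z0-9]*[a-z])(\d*)")


def lnet_network(nid: str) -> str:
    """Return the network part of a NID in a normalized form."""
    address, sep, net = nid.strip().partition("@")
    match = _NET_RE.fullmatch(net)
    if not address or not sep or match is None:
        raise ValueError(f"not a NID: {nid!r}")
    net_type, number = match.group(1), int(match.group(2) or "0")
    # Drop an explicit network number of 0 so "o2ib0" compares equal to "o2ib".
    return net_type if number == 0 else f"{net_type}{number}"


def shares_network(peer_nid: str, local_nids: list[str]) -> bool:
    """True if peer_nid is on a network that one of local_nids is also on.

    Only a necessary condition for reaching the peer: LNET routing is
    ignored and the link itself is not tested.
    """
    local = {lnet_network(nid) for nid in local_nids}
    return lnet_network(peer_nid) in local


if __name__ == "__main__":
    client_nids = ["36.122.255.1@o2ib"]  # `lctl list_nids` on the IB client
    # MDT NID taken from the failed cfg command in the log above.
    print(shares_network("36.121.255.201@tcp", client_nids))    # False
    print(shares_network("36.122.255.201@o2ib0", client_nids))  # True
```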
Note that this works fine in the non-multihomed setup, so I
don't think ko2iblnd is to blame (the client configuration hasn't
changed at all).
What am I doing wrong?
Thanks,
Chris
On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
>
> I changed my modprobe.conf to look exactly like yours, and it worked. I
> hadn't been using all the quotes until the doc said to... but they may
> indeed have been the problem.
>
> Thanks!
>
> Chris
>
> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
> >
> >
> > Do "lctl list_nids" on your MDS and OSSs. They should look
> > something like this.
> >
> > [root@hpcmds ~]# lctl list_nids
> > 10.13.24.40@o2ib
> > 10.13.16.40@tcp
> >
> > Then your clients should have a nid on one or the other.
> >
> > Check your dmesg output after loading lnet. The complaints are
> > pretty useful. Your modprobe.conf line looks correct, although we
> > found we did not need all the quoting, so you should check that as
> > well. Ours looks like...
> >
> > options lnet networks=o2ib(ib0),tcp(eth0)
> >
> > My guess is that it either cannot find or does not like your ko2iblnd
> > module.
> >
> > ct
> >
> >
> >
> > On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
> >
> > > Most everything is over IB, but I have a few systems I'd like to mount
> > > the Lustre fs over GigE.
> > >
> > > I think I've followed the Multihomed instructions correctly, in:
> > >
> > > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
> > >
> > > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
> > > Ethernet and IB) includes:
> > >
> > > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
> > >
> > > I make and mount the MDT (which has both IB and Ethernet; subnet
> > > 36.122.x.x is IB, 36.121.x.x is Ethernet) with:
> > >
> > > # mkfs.lustre --mdt --mgs --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
> > > # mount -t lustre /dev/md0 /lfs/mdtb
> > >
> > > But, at this point, the ksocklnd module is loaded rather than the
> > > ko2iblnd module!
> > >
> > > On the OSS, I make the fs with the same "mgsnode", but when I try to
> > > mount it, it correctly uses the IB interface yet can't contact the
> > > MDS:
> > >
> > > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for MGC36.122.255.201@o2ib_0
> > > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer MGC36.122.255.201@o2ib_0!
> > > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> > > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> > > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup MGC36.122.255.201@o2ib failed (-2)
> > > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) MGC36.122.255.201@o2ib setup error -2
> > > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
> > > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
> > >
> > > It too has loaded the ksocklnd module, and not the ko2iblnd module. I
> > > guess that both modules should be loaded in a multihomed case?
> > >
> > > What am I doing wrong?
> > >
> > > Thanks,
> > >
> > > Chris
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >
> >
>
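Both replies turn on the lnet `networks=` option. A server configured with o2ib(ib0),tcp(eth0) needs both LNDs (ko2iblnd for o2ib, ksocklnd for tcp), while an IB-only client with o2ib0(ib0) needs only ko2iblnd. The Python sketch below maps an option value to the LND modules it implies. It is a simplification under stated assumptions: one interface per network, only the o2ib and tcp types from this thread, and it rejects values that still contain quote characters. That last rule belongs to the sketch; it is not a description of how modprobe itself parses the line. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed mapping, limited to the two network types discussed in this
# thread: each LNET network type is served by one LND kernel module.
LND_MODULES = {"o2ib": "ko2iblnd", "tcp": "ksocklnd"}

# One entry such as "o2ib0(ib0)": network type, optional number, interface.
# Simplification: exactly one interface per network.
_ENTRY_RE = re.compile(r"\s*([a-z0-9]*[a-z])(\d*)\((\w+)\)\s*")


def parse_networks(value: str) -> list[tuple[str, int, str]]:
    """Split a networks= value into (type, number, interface) tuples."""
    # Rule of this sketch only: any quoting should already have been
    # removed before the value is interpreted, so leftover quotes are errors.
    if "'" in value or '"' in value:
        raise ValueError(f"quote characters left in value: {value!r}")
    entries = []
    for entry in value.split(","):
        match = _ENTRY_RE.fullmatch(entry)
        if match is None:
            raise ValueError(f"unrecognized network entry: {entry!r}")
        net_type, number, iface = match.groups()
        entries.append((net_type, int(number or "0"), iface))
    return entries


def required_lnds(value: str) -> list[str]:
    """Return the sorted LND module names the networks= value implies."""
    modules = set()
    for net_type, _number, _iface in parse_networks(value):
        if net_type not in LND_MODULES:
            raise ValueError(f"no LND known to this sketch for {net_type!r}")
        modules.add(LND_MODULES[net_type])
    return sorted(modules)


if __name__ == "__main__":
    print(required_lnds("o2ib(ib0),tcp(eth0)"))  # ['ko2iblnd', 'ksocklnd']
    print(required_lnds("o2ib0(ib0)"))           # ['ko2iblnd']
```

Keeping the parse step separate from the module lookup makes it easy to see whether a failure comes from a malformed value or from a network type the sketch does not know.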