[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused
Chris Worley
worleys at gmail.com
Tue Apr 22 13:44:49 PDT 2008
On Tue, Apr 22, 2008 at 2:21 PM, Cliff White <Cliff.White at sun.com> wrote:
> Chris Worley wrote:
>
> > Does anybody have any clues, or do I need to rebuild the entire FS from
> scratch?
> >
>
> First, what is in your client modprobe.conf? Should only be 'tcp' for
> tcp-only clients.
It is/was:
options lnet networks=tcp0(eth0)
... and this worked fine before I added the new OSS.
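For anyone checking the same thing on their own nodes, here is a minimal sketch of pulling the LNET network names out of an "options lnet" line, so the client and OSS settings can be compared at a glance. The helper name lnet_networks is made up for illustration, and it assumes the simple unquoted networks=name(iface),... form used in this thread.

```shell
# Hypothetical helper (not part of Lustre): print one LNET network name
# per line from a modprobe.conf "options lnet" line.
lnet_networks() {
    # $1: a line such as "options lnet networks=o2ib0(ib0),tcp0(eth0)"
    printf '%s\n' "$1" |
        sed -n 's/.*networks=\([^[:space:]]*\).*/\1/p' |  # keep the value after networks=
        tr ',' '\n' |                                      # one network per line
        sed 's/(.*//'                                      # drop the "(interface)" part
}

# The Ethernet-only client line from this thread:
lnet_networks 'options lnet networks=tcp0(eth0)'
# A dual-homed server line of the kind used on the OSSes:
lnet_networks 'options lnet networks=o2ib0(ib0),tcp0(eth0)'
```

A server meant to serve Ethernet-only clients should list a tcp network here in addition to o2ib.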
> Second, I don't think you can use an ipoib address as a tcp connection.
> If it's ipoib, LNET is going to use o2ib.
I don't quite follow.
The specific client doesn't have IB.
The IPoIB addresses in the network are 36.102.x.x.
The Ethernet addresses in the network are 36.101.x.x.
Both use 16-bit netmasks.
The only place I use IPoIB addresses is in the file system creation
on the OSSes, as in:
for i in b c d e f g h i j k l; do mkfs.lustre --ost \
--mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param \
sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
... and that has worked well, up until I added another OSS. Did I do
something wrong?
The only thing I know I did wrong was that, when I first mounted the
newly created file systems, I had my new OSS's modprobe.conf set for IB
only:
options lnet networks=o2ib(ib0)
I changed that to be the same as my existing OSSes:
options lnet networks=o2ib0(ib0),tcp0(eth0)
...after I realized my Ethernet-only clients weren't working, and
reloaded everything from scratch. (At this point, I have unmounted all
clients, unmounted all Lustre OST/MDT file systems on the servers,
removed all Lustre modules from all clients and servers, rebooted the
Ethernet client, then remounted all the file systems everywhere, but
still no joy on the Ethernet-only clients.)
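To make the symptom concrete, here is a small sketch of the matching problem as it appears in the error log quoted further down: the Ethernet-only client is handed 36.102.29.4@o2ib for lfs-OST0026 and has no o2ib network to reach it on. NIDs are written in the usual address@network form. The helper names norm_net and usable_nids are made up for illustration; this only models the comparison and is not how LNET itself is implemented.

```shell
# Hypothetical helper: "o2ib" and "o2ib0" name the same LNET network
# (the number defaults to 0), so append the 0 when it is missing.
norm_net() {
    case "$1" in
        *[0-9]) printf '%s\n' "$1" ;;
        *)      printf '%s0\n' "$1" ;;
    esac
}

# Hypothetical helper: print the NIDs (arguments 2 onward) that sit on one
# of the client's networks (argument 1, comma-separated).
usable_nids() {
    client_nets=$(printf '%s' "$1" | tr ',' ' ')
    shift
    for nid in "$@"; do
        nid_net=$(norm_net "${nid#*@}")   # the part after the "@"
        for net in $client_nets; do
            if [ "$(norm_net "$net")" = "$nid_net" ]; then
                printf '%s\n' "$nid"
            fi
        done
    done
}

# Only the o2ib NID is on offer, as in the log: nothing is printed.
usable_nids tcp0 36.102.29.4@o2ib
# With the tcp NID also on offer, the client has something to connect to.
usable_nids tcp0 36.102.29.4@o2ib 36.101.29.4@tcp
```

The second call prints 36.101.29.4@tcp, which is the NID the Ethernet-only client would need to be given for the new OSS.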
At this point I'm guessing that when I made the file systems on the
new OSS, even though I had properly set:
--mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0"
...in the mkfs, the incorrectly set modprobe.conf screwed this mkfs up
irrevocably, and since the file system has been in use from IB clients
after adding the new OSS, my only recourse is to 1) back up the file
system, and 2) rebuild everything (all OSTs and the MDT) from scratch
(mkfs) on all OSSes and the MDS.
Is that correct?
Thanks,
Chris
>
> cliffw
>
> > On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
> >
> > > On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
> > > > The only configuration error on my OSS was: I initially only had
> > > > "o2ib0(ib0)" in my modprobe.conf. After unmounting all the OSTs and
> > > > getting the modprobe.conf right:
> > > >
> > > > options lnet networks=o2ib0(ib0),tcp0(eth0)
> > > >
> > > > ...and remounting from scratch, both ksocklnd and ko2iblnd are now
> > > > loaded properly.
> > > >
> > > > But I still can't mount the partition on the Ethernet-only client
> > > > nodes.
> > > >
> > > > They get the error:
> > > >
> > > > LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> > > > for 36.102.29.4@o2ib
> > > > LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> > > > find peer 36.102.29.4@o2ib!
> > > > LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> > > > initial connection
> > > > LustreError: 8439:0:(obd_config.c:325:class_setup()) setup
> > > > lfs-OST0026-osc-0000010753919000 failed (-2)
> > > > LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler())
> > > > Err -2 on cfg command:
> > > > Lustre: cmd=cf003 0:lfs-OST0026-osc 1:lfs-OST0026_UUID
> > > > 2:36.102.29.4@o2ib
> > > > LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
> > > > 'lfs-client' failed (-2).
> > > >
> > > > The 36.102.29.4 is the IPoIB address of the added OSS. It shouldn't
> > > > want it "@o2ib".
> > > >
> > > > I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
> > > > modules and remounted. Still no joy.
> > > >
> > >
> > > From this point forward, every time I say "OST" I mean "OSS"...
> > >
> > >
> > >
> > > > The file systems were created on the new OST, just as on all the
> > > > others:
> > > >
> > > > for i in b c d e f g h i j k l; do mkfs.lustre --ost \
> > > > --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param \
> > > > sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
> > > >
> > > > The client has the right modprobe.conf, which worked before the
> > > > additional OST:
> > > >
> > > > options lnet networks=tcp0(eth0)
> > > >
> > > > ... and I'm using the same mount command that worked previously:
> > > >
> > > > mount -t lustre 36.101.29.1@tcp:/lfs /lfs
> > > >
> > > > From the OST I can ping the client:
> > > >
> > > > # lctl list_nids
> > > > 36.102.29.4@o2ib
> > > > 36.101.29.4@tcp
> > > > # lctl ping 36.101.255.10@tcp
> > > > 12345-0@lo
> > > > 12345-36.101.255.10@tcp
> > > >
> > > > From the client, I can ping the OST and MDS/MGS:
> > > >
> > > > # lctl list_nids
> > > > 36.101.255.10@tcp
> > > > # lctl ping 36.101.29.4@tcp
> > > > 12345-0@lo
> > > > 12345-36.102.29.4@o2ib
> > > > 12345-36.101.29.4@tcp
> > > > # lctl ping 36.101.29.1@tcp
> > > > 12345-0@lo
> > > > 12345-36.102.29.1@o2ib
> > > > 12345-36.101.29.1@tcp
> > > >
> > > > So, somehow, not having the right modprobe.conf the first time I
> > > > mounted the partitions on the new OST has made it permanently not want
> > > > to mount properly on Ethernet clients (it mounts fine on IB clients).
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > >
> > >