[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused
Chris Worley
worleys at gmail.com
Tue Apr 22 10:36:46 PDT 2008
Does anybody have any clues, or do I need to rebuild the entire FS from scratch?
On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
>
> On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
> > The only configuration error on my OSS was: I initially only had
> > "o2ib0(ib0)" in my modprobe.conf. After unmounting all the OSTs, and
> > getting the modprobe.conf right:
> >
> > options lnet networks=o2ib0(ib0),tcp0(eth0)
> >
> > ...and remounting from scratch, both ksocklnd and ko2iblnd are now
> > loaded properly.
> >
> > But, I still can't mount the partition on the ethernet-only client nodes.
> >
> > They get the error:
> >
> > LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> > for 36.102.29.4@o2ib
> > LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> > find peer 36.102.29.4@o2ib!
> > LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> > initial connection
> > LustreError: 8439:0:(obd_config.c:325:class_setup()) setup
> > lfs-OST0026-osc-0000010753919000 failed (-2)
> > LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler())
> > Err -2 on cfg command:
> > Lustre: cmd=cf003 0:lfs-OST0026-osc 1:lfs-OST0026_UUID 2:36.102.29.4@o2ib
> > LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
> > 'lfs-client' failed (-2).
> >
> > The 36.102.29.4 is the IPoIB address of the added OSS. It shouldn't
> > want it "@o2ib".
> >
> > I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
> > modules and remounted. Still no joy.
> >
>
> From this point forward, every time I say "OST" I mean "OSS"...
>
>
>
> > The file systems were created on the new OST, just as on all the others:
> >
> > for i in b c d e f g h i j k l; do mkfs.lustre --ost
> > --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param
> > sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
> >
> > The client has the right modprobe.conf, which worked before the additional OST:
> >
> > options lnet networks=tcp0(eth0)
> >
> > ... and I'm using the same mount command that worked previously:
> >
> > mount -t lustre 36.101.29.1@tcp:/lfs /lfs
> >
> > From the OST I can ping the client:
> >
> > # lctl list_nids
> > 36.102.29.4@o2ib
> > 36.101.29.4@tcp
> > # lctl ping 36.101.255.10@tcp
> > 12345-0@lo
> > 12345-36.101.255.10@tcp
> >
> > From the client, I can ping the OST and MDS/MGS:
> >
> > # lctl list_nids
> > 36.101.255.10@tcp
> > # lctl ping 36.101.29.4@tcp
> > 12345-0@lo
> > 12345-36.102.29.4@o2ib
> > 12345-36.101.29.4@tcp
> > # lctl ping 36.101.29.1@tcp
> > 12345-0@lo
> > 12345-36.102.29.1@o2ib
> > 12345-36.101.29.1@tcp
> >
> > So, somehow, not having the right modprobe.conf the first time I
> > mounted the partitions on the new OST has made it permanently not want
> > to mount properly on Ethernet clients (it mounts fine on IB clients).
> >
> > Any ideas?
> >
> > Thanks,
> >
> > Chris
> >
>
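
The mismatch described above can be sketched in a few lines of POSIX shell: the client has only a tcp NID, while the configuration log hands it an o2ib NID for the new OSS. This is only an illustration. The helper names nid_net and client_can_reach are hypothetical and not part of Lustre, the NIDs are copied from the output quoted above rather than read from a live lctl, and network names are compared as plain strings (so "tcp" and "tcp0" would be treated as different).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Lustre.

# Print the network part of a NID, e.g. "36.102.29.4@o2ib" -> "o2ib".
nid_net() {
    printf '%s\n' "${1#*@}"
}

# Succeed if any local NID (one per line on stdin) is on the same
# network as the target NID given as $1.
client_can_reach() {
    target_net=$(nid_net "$1")
    while IFS= read -r local_nid; do
        [ "$(nid_net "$local_nid")" = "$target_net" ] && return 0
    done
    return 1
}

# Client NIDs as shown by "lctl list_nids" on the Ethernet-only client.
client_nids='36.101.255.10@tcp'

# The first target is the NID from the failing cfg command; the second
# is the tcp NID of the same OSS.
for target in 36.102.29.4@o2ib 36.101.29.4@tcp; do
    if printf '%s\n' "$client_nids" | client_can_reach "$target"; then
        echo "$target: client has a NID on this network"
    else
        echo "$target: no local NID on this network"
    fi
done
```

With the values above, the o2ib NID is the one reported as having no matching local network, which lines up with the "No NID found" error on the tcp-only client.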