[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Chris Worley worleys at gmail.com
Tue Apr 22 10:36:46 PDT 2008


Does anybody have any clues, or do I need to rebuild the entire FS from scratch?

On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
>
> On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
>  > The only configuration error on my OSS was: I initially only had
>  >  "o2ib0(ib0)" in my modprobe.conf.  After unmounting all the OSTs, and
>  >  getting the modprobe.conf right:
>  >
>  >    options lnet networks=o2ib0(ib0),tcp0(eth0)
>  >
>  >  ...and remounting from scratch, both ksocklnd and ko2iblnd are now
>  >  loaded properly.
>  >
>  >  But, I still can't mount the partition on the ethernet-only client nodes.
>  >
>  >  They get the error:
>  >
>  >  LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>  >  for 36.102.29.4@o2ib
>  >  LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>  >  find peer 36.102.29.4@o2ib!
>  >  LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>  >  initial connection
>  >  LustreError: 8439:0:(obd_config.c:325:class_setup()) setup
>  >  lfs-OST0026-osc-0000010753919000 failed (-2)
>  >  LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler())
>  >  Err -2 on cfg command:
>  >  Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
>  >  LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
>  >  'lfs-client' failed (-2).
>  >
>  >  The 36.102.29.4 is the IPoIB address of the added OSS.  It shouldn't
>  >  want it "@o2ib".
>  >
>  >  I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
>  >  modules and remounted.  Still no joy.
>  >
>
>  From this point forward, every time I say "OST" I mean "OSS"...
>
>
>
>  >  The file systems were created on the new OST, just as on all the others:
>  >
>  >  for i in b c d e f g h i j k l; do mkfs.lustre --ost
>  >  --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param
>  >  sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
>  >
>  >  The client has the right modprobe.conf, which worked before the additional OST:
>  >
>  >   options lnet networks=tcp0(eth0)
>  >
>  >  ... and I'm using the same mount command that worked previously:
>  >
>  >   mount -t lustre 36.101.29.1@tcp:/lfs /lfs
>  >
>  >  From the OST I can ping the client:
>  >
>  >  # lctl list_nids
>  >  36.102.29.4@o2ib
>  >  36.101.29.4@tcp
>  >  # lctl ping 36.101.255.10@tcp
>  >  12345-0@lo
>  >  12345-36.101.255.10@tcp
>  >
>  >  From the client, I can ping the OST and MDS/MGS:
>  >
>  >  # lctl list_nids
>  >  36.101.255.10@tcp
>  >  # lctl ping 36.101.29.4@tcp
>  >  12345-0@lo
>  >  12345-36.102.29.4@o2ib
>  >  12345-36.101.29.4@tcp
>  >  # lctl ping 36.101.29.1@tcp
>  >  12345-0@lo
>  >  12345-36.102.29.1@o2ib
>  >  12345-36.101.29.1@tcp
>  >
>  >  So, somehow, not having the right modprobe.conf the first time I
>  >  mounted the partitions on the new OST has made it permanently not want
>  >  to mount properly on Ethernet clients (it mounts fine on IB clients).
>  >
>  >  Any ideas?
>  >
>  >  Thanks,
>  >
>  >  Chris
>  >
>
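
The "No NID found" errors quoted above come down to a network mismatch: the client only has a tcp NID, while the configuration log hands it the new OSS's o2ib NID. A minimal sketch of that comparison follows, assuming the usual LNet convention that a NID is written address@network and that a bare network name such as tcp is shorthand for tcp0. The helper names nid_net and nid_reachable are hypothetical and are not part of lctl; the script only compares network names and does not talk to Lustre.

```shell
#!/bin/sh
# Hypothetical helper: print the LNet network part of a NID (the text
# after "@"), treating a missing network number as 0 ("tcp" -> "tcp0").
nid_net() {
    net=${1##*@}
    case $net in
        *[0-9]) printf '%s\n' "$net" ;;
        *)      printf '%s0\n' "$net" ;;
    esac
}

# Hypothetical helper: succeed if the target NID (first argument) is on a
# network that also appears among the local NIDs (remaining arguments).
nid_reachable() {
    target_net=$(nid_net "$1")
    shift
    for local_nid in "$@"; do
        [ "$(nid_net "$local_nid")" = "$target_net" ] && return 0
    done
    return 1
}

# Example with the NIDs from this thread: an Ethernet-only client that
# is told to contact the new OSS by its InfiniBand NID.
if ! nid_reachable 36.102.29.4@o2ib 36.101.255.10@tcp; then
    echo "no local NID on network $(nid_net 36.102.29.4@o2ib)"
fi
```

On a live node the local NIDs could be taken from lctl list_nids, for example: nid_reachable 36.102.29.4@o2ib $(lctl list_nids)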


