[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Chris Worley worleys at gmail.com
Tue Apr 22 13:44:49 PDT 2008


On Tue, Apr 22, 2008 at 2:21 PM, Cliff White <Cliff.White at sun.com> wrote:
 > Chris Worley wrote:
 >
 > > Does anybody have any clues, or do I need to rebuild the entire FS from
 > > scratch?
 > >
 >
 >  First, what is in your client modprobe.conf? Should only be 'tcp' for
 > tcp-only clients.
 It is/was:


  options lnet networks=tcp0(eth0)

 ... and this worked fine before I added the new OSS.


 >  Second, I don't think you can use an ipoib address as a tcp connection.
 >  If it's ipoib, LNET is going to use o2ib.

 I don't quite follow.

 The specific client doesn't have IB.

 The IPoIB addresses in the network are 36.102.x.x.

 The Ethernet addresses in the network are: 36.101.x.x.

 Both use 16-bit netmasks (255.255.0.0).
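
 To make the address split concrete, here is a minimal Python sketch that
 labels an address by which of the two /16 networks it falls in. It is
 only an illustration, not anything Lustre itself runs; the helper name
 fabric_for and the labels it returns are hypothetical.

```python
import ipaddress

# The two /16 networks described in this message.
IPOIB_NET = ipaddress.ip_network("36.102.0.0/16")
ETHERNET_NET = ipaddress.ip_network("36.101.0.0/16")


def fabric_for(address: str) -> str:
    """Return "ipoib", "ethernet", or "unknown" for a dotted-quad address."""
    ip = ipaddress.ip_address(address)
    if ip in IPOIB_NET:
        return "ipoib"
    if ip in ETHERNET_NET:
        return "ethernet"
    return "unknown"


if __name__ == "__main__":
    # Addresses taken from this thread: the new OSS (both interfaces)
    # and the Ethernet-only client.
    for addr in ("36.102.29.4", "36.101.29.4", "36.101.255.10"):
        print(addr, fabric_for(addr))
```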

 The only place I use IPoIB addresses is in the file system creation
 on the OSSes, as in:


 for i in b c d e f g h i j k l; do
   mkfs.lustre --ost \
     --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
     --param sys.timeout=40 --param lov.stripesize=2M /dev/sd$i &
 done

 ... and that has worked well, up until I added another OSS.  Did I do
 something wrong?

 The only thing I know I did wrong was that, when I first mounted the
 newly created file systems, I had my new OSS's modprobe.conf set for IB
 only:

   options lnet networks=o2ib(ib0)

 I changed that to be the same as my existing OSSes:


   options lnet networks=o2ib0(ib0),tcp0(eth0)

 ...after I realized my Ethernet-only clients weren't working, and
 reloaded everything from scratch. At this point, I have unmounted all
 clients, unmounted all Lustre OST/MDT file systems on the servers,
 removed all Lustre modules from all clients and servers, rebooted the
 Ethernet client, then remounted all the file systems everywhere, but
 still no joy on the Ethernet-only clients.
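
 The difference between the two "networks=" settings can be sketched in
 Python. This only illustrates how the option string breaks down into
 network/interface pairs; parse_networks and has_tcp are hypothetical
 helpers, not part of LNET, and they handle only the simple net(iface)
 form used in this thread.

```python
import re


def parse_networks(option: str) -> dict:
    """Split an lnet "networks=" value such as "o2ib0(ib0),tcp0(eth0)"
    into {network: interface}. Only the simple net(iface) form is handled."""
    result = {}
    for item in option.split(","):
        match = re.fullmatch(r"\s*([a-z0-9]+)\((\w+)\)\s*", item)
        if match is None:
            raise ValueError(f"unrecognized networks entry: {item!r}")
        result[match.group(1)] = match.group(2)
    return result


def has_tcp(option: str) -> bool:
    """True if any configured network name starts with "tcp"."""
    return any(net.startswith("tcp") for net in parse_networks(option))


if __name__ == "__main__":
    # The IB-only setting first used on the new OSS, then the corrected one.
    for value in ("o2ib(ib0)", "o2ib0(ib0),tcp0(eth0)"):
        print(value, parse_networks(value), has_tcp(value))
```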

 At this point I'm guessing that when I made the file systems on the
 new OSS, even though I had properly set:


  --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0"

 ...in the mkfs, the incorrectly set modprobe.conf screwed this mkfs up
 irrevocably, and since the file system has been in use from IB clients
 after adding the new OSS, my only recourse is to 1) back up the file
 system, and 2) rebuild everything (all OSTs and the MDT) from scratch
 (mkfs) on all OSSes and the MDS.
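
 The failure I suspect can be sketched as follows: a client can only use
 a server NID on a network it has configured itself. If the only NID
 recorded for the new OSS is its o2ib one, as the client error quoted
 further down suggests, a tcp-only client is left with nothing usable.
 This is an assumption about the matching, written out for illustration,
 not LNET's actual code; normalize and usable_nids are hypothetical
 names. It treats "tcp" and "tcp0" as the same network, which matches
 the lctl list_nids output in this thread.

```python
import re


def normalize(network: str) -> tuple:
    """Map "tcp" or "tcp0" to ("tcp", 0), and "o2ib" to ("o2ib", 0)."""
    match = re.fullmatch(r"([a-z0-9]*[a-z])(\d*)", network)
    if match is None:
        raise ValueError(f"unrecognized network name: {network!r}")
    return match.group(1), int(match.group(2) or 0)


def usable_nids(server_nids, client_networks):
    """Return the server NIDs that sit on a network the client has."""
    wanted = {normalize(net) for net in client_networks}
    usable = []
    for nid in server_nids:
        _address, _, network = nid.partition("@")
        if normalize(network) in wanted:
            usable.append(nid)
    return usable


if __name__ == "__main__":
    client = ["tcp0"]
    # Both NIDs known for the OSS: the tcp one is usable.
    print(usable_nids(["36.102.29.4@o2ib", "36.101.29.4@tcp"], client))
    # Only the o2ib NID known: nothing is usable from a tcp-only client.
    print(usable_nids(["36.102.29.4@o2ib"], client))
```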

 Is that correct?

 Thanks,

 Chris


 >
 >  cliffw
 >
 > >
 > > On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
 > >
 > > > On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
 > > >  > The only configuration error on my OSS was: I initially only had
 > > >  >  "o2ib0(ib0)" in my modprobe.conf.  After unmounting all the OSTs, and
 > > >  >  getting the modprobe.conf right:
 > > >  >
 > > >  >    options lnet networks=o2ib0(ib0),tcp0(eth0)
 > > >  >
 > > >  >  ...and remounting from scratch, both ksocklnd and ko2iblnd are now
 > > >  >  loaded properly.
 > > >  >
 > > >  >  But, I still can't mount the partition on the ethernet-only client nodes.
 > > >  >
 > > >  >  They get the error:
 > > >  >
 > > >  >  LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.102.29.4@o2ib
 > > >  >  LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.102.29.4@o2ib!
 > > >  >  LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
 > > >  >  LustreError: 8439:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-0000010753919000 failed (-2)
 > > >  >  LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
 > > >  >  Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
 > > >  >  LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log 'lfs-client' failed (-2).
 > > >  >
 > > >  >  The 36.102.29.4 is the IPoIB address of the added OSS.  It shouldn't
 > > >  >  want it "@o2ib".
 > > >  >
 > > >  >  I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
 > > >  >  modules and remounted.  Still no joy.
 > > >  >
 > > >
 > > >  From this point forward, every time I say "OST" I mean "OSS"...
 > > >
 > > >
 > > >
 > > >  >  The file systems were created on the new OST, just as on all the others:
 > > >  >
 > > >  >  for i  in b c d e f g h i j k l; do mkfs.lustre --ost
 > > >  >  --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param
 > > >  >  sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
 > > >  >
 > > >  >  The client has the right modprobe.conf, which worked before the additional OST:
 > > >  >
 > > >  >   options lnet networks=tcp0(eth0)
 > > >  >
 > > >  >  ... and I'm using the same mount command that worked previously:
 > > >  >
 > > >  >   mount -t lustre 36.101.29.1@tcp:/lfs /lfs
 > > >  >
 > > >  >  From the OST I can ping the client:
 > > >  >
 > > >  >  # lctl list_nids
 > > >  >  36.102.29.4@o2ib
 > > >  >  36.101.29.4@tcp
 > > >  >  # lctl ping 36.101.255.10@tcp
 > > >  >  12345-0@lo
 > > >  >  12345-36.101.255.10@tcp
 > > >  >
 > > >  >  From the client, I can ping the OST and MDS/MGS:
 > > >  >
 > > >  >  # lctl list_nids
 > > >  >  36.101.255.10@tcp
 > > >  >  # lctl ping 36.101.29.4@tcp
 > > >  >  12345-0@lo
 > > >  >  12345-36.102.29.4@o2ib
 > > >  >  12345-36.101.29.4@tcp
 > > >  >  # lctl ping 36.101.29.1@tcp
 > > >  >  12345-0@lo
 > > >  >  12345-36.102.29.1@o2ib
 > > >  >  12345-36.101.29.1@tcp
 > > >  >
 > > >  >  So, somehow, not having the right modprobe.conf the first time I
 > > >  >  mounted the partitions on the new OST has made it permanently not want
 > > >  >  to mount properly on Ethernet clients (it mounts fine on IB clients).
 > > >  >
 > > >  >  Any ideas?
 > > >  >
 > > >  >  Thanks,
 > > >  >
 > > >  >  Chris
 > > >  >
 > > >
 > > >


