[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Cliff White Cliff.White at Sun.COM
Tue Apr 22 13:21:59 PDT 2008


Chris Worley wrote:
> Does anybody have any clues, or do I need to rebuild the entire FS from scratch?

First, what is in your client modprobe.conf? It should list only 'tcp'
for TCP-only clients.
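A quick way to answer the first question on each client is a small shell function like the one below. It is a hypothetical helper, not a Lustre tool. It reads modprobe configuration on stdin and prints the networks= value of each "options lnet" line. It assumes the value contains no spaces and may be wrapped in double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Lustre): print the networks= value of
# every "options lnet" line read on stdin, minus surrounding double quotes.
lnet_networks() {
  local line value
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Skip anything that is not an "options lnet ..." line (comments included).
    if [[ ! $line =~ ^[[:space:]]*options[[:space:]]+lnet[[:space:]] ]]; then
      continue
    fi
    # Capture the token after "networks=" up to the next whitespace.
    if [[ $line =~ networks=([^[:space:]]+) ]]; then
      value=${BASH_REMATCH[1]}
      value=${value#\"}
      value=${value%\"}
      printf '%s\n' "$value"
    fi
  done
}

# Usage: lnet_networks < /etc/modprobe.conf
```

On a TCP-only client configured as in this thread, it should print just tcp0(eth0).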
Second, I don't think you can use an IPoIB address for a TCP connection.
If it's IPoIB, LNET is going to use o2ib.
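The NID in the failing config command, 36.102.29.4@o2ib, names the o2ib network. A client configured with networks=tcp0(eth0) has no o2ib network to reach it on. The check can be sketched as a hypothetical bash function; this is not code that LNET itself runs. The sketch assumes one interface per network entry and LNET's convention that an unnumbered network such as tcp means tcp0.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LNET): succeed if the network of a peer NID
# (the part after "@") appears in a networks= value such as "tcp0(eth0)".

# Drop any "(iface)" suffix and add the implicit network number 0, so that
# "tcp" and "tcp0" (or "o2ib" and "o2ib0") compare equal.
normalize_net() {
  local net=${1%%\(*}
  if [[ ! $net =~ [0-9]$ ]]; then
    net="${net}0"
  fi
  printf '%s\n' "$net"
}

nid_reachable() {
  local networks=$1 nid=$2 entry peer_net
  local -a entries
  peer_net=$(normalize_net "${nid##*@}")
  # Assumes one interface per network, so commas only separate networks.
  IFS=',' read -ra entries <<< "$networks"
  for entry in "${entries[@]}"; do
    if [[ $(normalize_net "$entry") == "$peer_net" ]]; then
      return 0
    fi
  done
  return 1
}
```

With the client's setting, `nid_reachable 'tcp0(eth0)' 36.102.29.4@o2ib` fails and `nid_reachable 'tcp0(eth0)' 36.101.29.4@tcp` succeeds. This matches the "No NID found" error in the quoted log.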

cliffw

> 
> On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
>>  > The only configuration error on my OSS was that I initially had only
>>  >  "o2ib0(ib0)" in my modprobe.conf.  After unmounting all the OSTs and
>>  >  correcting the modprobe.conf to:
>>  >
>>  >    options lnet networks=o2ib0(ib0),tcp0(eth0)
>>  >
>>  >  ...and remounting from scratch, both ksocklnd and ko2iblnd are now
>>  >  loaded properly.
>>  >
>>  >  But, I still can't mount the partition on the ethernet-only client nodes.
>>  >
>>  >  They get the error:
>>  >
>>  >  LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.102.29.4@o2ib
>>  >  LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.102.29.4@o2ib!
>>  >  LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>>  >  LustreError: 8439:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-0000010753919000 failed (-2)
>>  >  LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
>>  >  Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
>>  >  LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log 'lfs-client' failed (-2).
>>  >
>>  >  36.102.29.4 is the IPoIB address of the added OSS.  The Ethernet
>>  >  clients shouldn't be trying to reach it "@o2ib".
>>  >
>>  >  I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
>>  >  modules and remounted.  Still no joy.
>>  >
>>
>>  From this point forward, every time I say "OST" I mean "OSS"...
>>
>>
>>
>>  >  The file systems were created on the new OST, just as on all the others:
>>  >
>>  >  for i in b c d e f g h i j k l; do
>>  >    mkfs.lustre --ost --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" \
>>  >      --fsname=lfs --param sys.timeout=40 --param lov.stripesize=2M /dev/sd$i &
>>  >  done
>>  >
>>  >  The client has the right modprobe.conf, which worked before I added the new OST:
>>  >
>>  >   options lnet networks=tcp0(eth0)
>>  >
>>  >  ... and I'm using the same mount command that worked previously:
>>  >
>>  >   mount -t lustre 36.101.29.1@tcp:/lfs /lfs
>>  >
>>  >  From the OST I can ping the client:
>>  >
>>  >  # lctl list_nids
>>  >  36.102.29.4@o2ib
>>  >  36.101.29.4@tcp
>>  >  # lctl ping 36.101.255.10@tcp
>>  >  12345-0@lo
>>  >  12345-36.101.255.10@tcp
>>  >
>>  >  From the client, I can ping the OST and MDS/MGS:
>>  >
>>  >  # lctl list_nids
>>  >  36.101.255.10@tcp
>>  >  # lctl ping 36.101.29.4@tcp
>>  >  12345-0@lo
>>  >  12345-36.102.29.4@o2ib
>>  >  12345-36.101.29.4@tcp
>>  >  # lctl ping 36.101.29.1@tcp
>>  >  12345-0@lo
>>  >  12345-36.102.29.1@o2ib
>>  >  12345-36.101.29.1@tcp
>>  >
>>  >  So, somehow, not having the right modprobe.conf the first time I
>>  >  mounted the partitions on the new OST has permanently kept it from
>>  >  mounting on Ethernet clients (it mounts fine on IB clients).
>>  >
>>  >  Any ideas?
>>  >
>>  >  Thanks,
>>  >
>>  >  Chris
>>  >
>>
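The lctl ping replies quoted above list one NID per line after a numeric prefix. A hypothetical helper like the one below extracts those NIDs and skips 0@lo. It is a sketch, not part of lctl. The extracted NIDs can then be compared with the NID a client's configuration asks for. For the new OSS, it yields 36.102.29.4@o2ib and 36.101.29.4@tcp.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of lctl): read "lctl ping" output on stdin,
# e.g. lines like "12345-36.101.29.4@tcp", and print each NID the peer
# reported, skipping blank lines and the loopback NID 0@lo.
peer_nids() {
  local line nid
  while IFS= read -r line || [[ -n $line ]]; do
    nid=${line#*-}          # drop the numeric "12345-" prefix
    if [[ -z $nid || $nid == *@lo ]]; then
      continue
    fi
    printf '%s\n' "$nid"
  done
}

# Usage: lctl ping 36.101.29.4@tcp | peer_nids | grep '@tcp$'
```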
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
