[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Chris Worley worleys at gmail.com
Mon Apr 21 20:22:30 PDT 2008


The only configuration error on my OSS was that I initially had only
"o2ib0(ib0)" in my modprobe.conf.  After unmounting all the OSTs and
correcting modprobe.conf to:

   options lnet networks=o2ib0(ib0),tcp0(eth0)

...and remounting from scratch, both ksocklnd and ko2iblnd are now
loaded properly.
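
For anyone checking their own file, the networks named on that options line can be listed with a small filter. This is only a rough sketch: lnet_networks is a made-up helper name, not a Lustre tool, and it assumes the networks value contains no whitespace.

```shell
#!/bin/sh
# lnet_networks (hypothetical helper, not part of Lustre): read
# modprobe.conf-style text on stdin and print one LNET network name per
# line, taken from the "options lnet networks=..." entry.
# Assumption: the networks value contains no whitespace.
# Steps: pick out the value, drop "(ib0)"-style interface lists and any
# quotes, then split the remaining comma-separated names.
lnet_networks() {
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' |
        sed 's/([^)]*)//g; s/"//g' |
        tr ',' '\n'
}

# Example with the line from this message; prints o2ib0 and tcp0,
# one per line.
echo 'options lnet networks=o2ib0(ib0),tcp0(eth0)' | lnet_networks
```

On a real node this would be run as "lnet_networks < /etc/modprobe.conf", assuming that is where the options line lives.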

But I still can't mount the partition on the Ethernet-only client nodes.

They get the error:

LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.102.29.4@o2ib
LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.102.29.4@o2ib!
LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 8439:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-0000010753919000 failed (-2)
LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log 'lfs-client' failed (-2).

36.102.29.4 is the IPoIB address of the added OSS.  An Ethernet-only
client shouldn't be trying to reach it via "@o2ib".
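
To make the mismatch concrete, here is a rough sketch of the check I have in mind: does the network part of a NID match any network configured on the client? The helper names (nid_net, norm_net, nid_reachable) are made up for illustration and are not Lustre commands. It assumes no LNET routers are configured, and it treats a name without a trailing number as network 0, so "tcp" and "tcp0" compare equal.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; none of these are Lustre tools.

# nid_net NID: print the network part of a NID (the text after the last '@').
nid_net() {
    printf '%s\n' "${1##*@}"
}

# norm_net NET: append "0" when the name has no trailing number,
# so that "tcp" and "tcp0" compare equal.
norm_net() {
    case "$1" in
        *[0-9]) printf '%s\n' "$1" ;;
        *)      printf '%s0\n' "$1" ;;
    esac
}

# nid_reachable NID NET...: succeed if the NID's network matches one of the
# listed local networks. Assumption: no LNET routing, so only a direct
# match counts.
nid_reachable() {
    nid_network=$(norm_net "$(nid_net "$1")")
    shift
    for local_net in "$@"; do
        [ "$(norm_net "$local_net")" = "$nid_network" ] && return 0
    done
    return 1
}

# Example: the NID from the error, checked against a tcp0-only client.
if nid_reachable 36.102.29.4@o2ib tcp0; then
    echo "36.102.29.4@o2ib is on a local network"
else
    echo "36.102.29.4@o2ib is not on any local network"
fi
```

With the client's only network being tcp0, the check fails for the o2ib NID and would pass for 36.101.29.4@tcp, which is the NID I would expect the client to be handed.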

I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
modules and remounted.  Still no joy.

The OSTs on the new OSS were formatted just like all the others:

for i in b c d e f g h i j k l; do
  mkfs.lustre --ost \
    --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" \
    --fsname=lfs --param sys.timeout=40 --param lov.stripesize=2M \
    /dev/sd$i &
done

The client has the right modprobe.conf, which worked before the OSS was added:

  options lnet networks=tcp0(eth0)

... and I'm using the same mount command that worked previously:

  mount -t lustre 36.101.29.1@tcp:/lfs /lfs

From the new OSS I can ping the client:

# lctl list_nids
36.102.29.4@o2ib
36.101.29.4@tcp
# lctl ping 36.101.255.10@tcp
12345-0@lo
12345-36.101.255.10@tcp

From the client, I can ping the new OSS and the MDS/MGS:

# lctl list_nids
36.101.255.10@tcp
# lctl ping 36.101.29.4@tcp
12345-0@lo
12345-36.102.29.4@o2ib
12345-36.101.29.4@tcp
# lctl ping 36.101.29.1@tcp
12345-0@lo
12345-36.102.29.1@o2ib
12345-36.101.29.1@tcp
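
Each server answers the ping with all of its NIDs, so it can help to filter the reply down to the ones on a given network. A minimal sketch, assuming the reply format shown above (a numeric prefix, a dash, then NID@network); peer_nids_on is a made-up helper name, not an lctl subcommand.

```shell
#!/bin/sh
# peer_nids_on NET (hypothetical helper, not part of lctl): read
# "lctl ping" output on stdin and print the peer NIDs whose network is
# exactly NET, with the leading numeric prefix removed.
peer_nids_on() {
    awk -F@ -v net="$1" '$2 == net { sub(/^[0-9]+-/, ""); print }'
}

# Example using the reply from the new OSS; prints 36.101.29.4@tcp
printf '%s\n' '12345-0@lo' '12345-36.102.29.4@o2ib' '12345-36.101.29.4@tcp' |
    peer_nids_on tcp
```

Against a live server this would be "lctl ping 36.101.29.4@tcp | peer_nids_on tcp". The match is on the network name exactly as lctl prints it, so "tcp" rather than "tcp0".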

So, somehow, not having the right modprobe.conf the first time I
mounted the OSTs on the new OSS seems to have permanently broken
mounting on Ethernet clients (it still mounts fine on IB clients).

Any ideas?

Thanks,

Chris


