[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused
Cliff White
Cliff.White at Sun.COM
Tue Apr 22 13:21:59 PDT 2008
Chris Worley wrote:
> Does anybody have any clues, or do I need to rebuild the entire FS from scratch?
First, what is in your client modprobe.conf? The networks option should
list only 'tcp' for tcp-only clients.
Second, I don't think you can use an ipoib address as a tcp connection.
If it's ipoib, LNET is going to use o2ib.
cliffw
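
To make the point about network types concrete: a client whose networks option lists only tcp0 has no way to use a NID on o2ib, which matches the "No NID found" error for the o2ib NID in the quoted message. Below is a minimal shell sketch of that check. The helper names nid_net, norm_net and client_can_reach are hypothetical, written for illustration only, and are not part of lctl or any Lustre tool. The sketch assumes a missing network number means 0, so "tcp" and "tcp0" compare equal.

```shell
#!/bin/sh
# Sketch only: nid_net, norm_net and client_can_reach are hypothetical
# helpers, not part of lctl or any other Lustre utility.

# Print the LNET network part of a NID, i.e. the text after the '@'.
nid_net() {
    printf '%s\n' "${1##*@}"
}

# Normalize a network name so that "tcp" and "tcp0" compare equal
# (assumption: a missing network number means 0).
norm_net() {
    case "$1" in
        *[0-9]) printf '%s\n' "$1" ;;
        *)      printf '%s0\n' "$1" ;;
    esac
}

# Succeed if the NID's network appears in the client's networks list.
# $1: networks string as in modprobe.conf, e.g. "tcp0(eth0)"
# $2: NID, e.g. "36.101.29.4@tcp"
client_can_reach() {
    want=$(norm_net "$(nid_net "$2")")
    old_ifs=$IFS
    IFS=,
    for entry in $1; do
        # Strip the "(interface)" suffix, then normalize the name.
        have=$(norm_net "${entry%%(*}")
        if [ "$have" = "$want" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Values taken from the thread: the tcp-only client and the new OSS's NIDs.
client_can_reach "tcp0(eth0)" "36.101.29.4@tcp"  && echo "tcp NID: usable"
client_can_reach "tcp0(eth0)" "36.102.29.4@o2ib" || echo "o2ib NID: not usable"
```

This only compares network names. It says nothing about why the client's configuration log names the o2ib NID for the new targets in the first place.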
>
> On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <worleys at gmail.com> wrote:
>> > The only configuration error on my OSS was: I initially only had
>> > "o2ib0(ib0)" in my modprobe.conf. After unmounting all the OSTs, and
>> > getting the modprobe.conf right:
>> >
>> > options lnet networks=o2ib0(ib0),tcp0(eth0)
>> >
>> > ...and remounting from scratch, both ksocklnd and ko2iblnd are now
>> > loaded properly.
>> >
>> > But, I still can't mount the partition on the ethernet-only client nodes.
>> >
>> > They get the error:
>> >
>> > LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>> > for 36.102.29.4@o2ib
>> > LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>> > find peer 36.102.29.4@o2ib!
>> > LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>> > initial connection
>> > LustreError: 8439:0:(obd_config.c:325:class_setup()) setup
>> > lfs-OST0026-osc-0000010753919000 failed (-2)
>> > LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler())
>> > Err -2 on cfg command:
>> > Lustre: cmd=cf003 0:lfs-OST0026-osc 1:lfs-OST0026_UUID 2:36.102.29.4@o2ib
>> > LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
>> > 'lfs-client' failed (-2).
>> >
>> > The 36.102.29.4 is the IPoIB address of the added OSS. It shouldn't
>> > want it "@o2ib".
>> >
>> > I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
>> > modules and remounted. Still no joy.
>> >
>>
>> From this point forward, every time I say "OST" I mean "OSS"...
>>
>>
>>
>> > The file systems were created on the new OST, just as on all the others:
>> >
>> > for i in b c d e f g h i j k l; do mkfs.lustre --ost
>> > --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs --param
>> > sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
>> >
>> > The client has the right modprobe.conf, which worked before the additional OST:
>> >
>> > options lnet networks=tcp0(eth0)
>> >
>> > ... and I'm using the same mount command that worked previously:
>> >
>> > mount -t lustre 36.101.29.1@tcp:/lfs /lfs
>> >
>> > From the OST I can ping the client:
>> >
>> > # lctl list_nids
>> > 36.102.29.4@o2ib
>> > 36.101.29.4@tcp
>> > # lctl ping 36.101.255.10@tcp
>> > 12345-0@lo
>> > 12345-36.101.255.10@tcp
>> >
>> > From the client, I can ping the OST and MDS/MGS:
>> >
>> > # lctl list_nids
>> > 36.101.255.10@tcp
>> > # lctl ping 36.101.29.4@tcp
>> > 12345-0@lo
>> > 12345-36.102.29.4@o2ib
>> > 12345-36.101.29.4@tcp
>> > # lctl ping 36.101.29.1@tcp
>> > 12345-0@lo
>> > 12345-36.102.29.1@o2ib
>> > 12345-36.101.29.1@tcp
>> >
>> > So, somehow, not having the right modprobe.conf the first time I
>> > mounted the partitions on the new OST has made it permanently not want
>> > to mount properly on Ethernet clients (it mounts fine on IB clients).
>> >
>> > Any ideas?
>> >
>> > Thanks,
>> >
>> > Chris
>> >
>>