[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
Craig Prescott
prescott at hpc.ufl.edu
Fri Mar 7 08:39:57 PST 2008
I think your client modprobe.conf lnet option
should be this:
options lnet networks=o2ib(ib0)
(not 'o2ib0').
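To see which networks value a node is really configured with, a small
helper along these lines might be handy. It is only a sketch:
lnet_networks is a made-up name (not a Lustre tool), and it assumes the
setting sits on a single "options lnet" line in a modprobe-style file.

```shell
#!/bin/sh
# lnet_networks FILE
# Hypothetical helper: print the value of the lnet "networks" option
# from a modprobe-style config file, with any quote characters removed.
# FILE defaults to /etc/modprobe.conf (the path used in this thread).
lnet_networks() {
    conf="${1:-/etc/modprobe.conf}"
    # Capture everything after "networks=" up to the next whitespace,
    # then delete single and double quotes from the captured value.
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$conf" |
        tr -d "\"'"
}
```

On a client that should only use IB, running lnet_networks with no
argument would be expected to print just the o2ib entry.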
Another thing to try, if that doesn't work: lctl
ping your MDS/MGS/OSS NIDs, like so:
lctl ping 36.122.255.201@o2ib
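If there are several server NIDs to check, the ping can be wrapped in a
loop. This is a rough sketch, not anything shipped with Lustre:
ping_nids and the LCTL override variable are names invented here, and
the only real command it relies on is "lctl ping <nid>".

```shell
#!/bin/sh
# ping_nids NID...
# Hypothetical helper: run "lctl ping" against each NID and print one
# "ok" or "FAIL" line per NID. Returns non-zero if any ping failed.
# LCTL is an assumed override for the lctl binary (useful for dry runs).
ping_nids() {
    rc=0
    for nid in "$@"; do
        if "${LCTL:-lctl}" ping "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, from an IB client: ping_nids 36.122.255.201@o2ib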
Cheers,
Craig
Chris Worley wrote:
> More issues. Now, on the clients.
>
> The MDT/MGS/OST's are all up and mounted, showing:
>
> # lctl list_nids
> 36.122.255.201@o2ib
> 36.121.255.201@tcp
>
> Now, when I go to mount on the IB-based clients, I get:
>
> # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
> mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No
> such file or directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
>
> The modprobe.conf contains:
>
> options lnet networks=o2ib0(ib0)
>
> And lctl looks good:
>
> # lctl list_nids
> 36.122.255.1@o2ib
>
> But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
> (36.12[12].255.201 is the MGS/MDS server):
>
> LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> for 36.121.255.201@tcp
> LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> find peer 36.121.255.201@tcp!
> LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> initial connection
> LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> LustreError: 10001:0:(obd_config.c:325:class_setup()) setup
> ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
> LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler())
> Err -2 on cfg command:
> Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID
> 2:36.121.255.201@tcp
> LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log
> 'ddnlfs-client' failed (-2). This may be the result of communication
> errors between this node and the MGS, a bad configuration, or other
> errors. See the syslog for more information.
> LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to
> process log: -2
> LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> Lustre: client 0000010430913c00 umount complete
> LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> mount (-2)
>
> Note that this setup works fine in the non-multihomed setup, so I
> don't think ko2iblnd is to blame (the setup on the clients hasn't
> changed at all).
>
> What am I doing wrong?
>
> Thanks,
>
> Chris
> On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
>> I changed my modprobe.conf to look exactly as yours, and it worked. I
>> hadn't been using all the quotes until the doc said to... but they may
>> have indeed been the problem.
>>
>> Thanks!
>>
>> Chris
>>
>> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>> >
>> >
>> > Do "lctl list_nids" on your mds and oss's. They should look
>> > something like this.
>> >
>> > [root@hpcmds ~]# lctl list_nids
>> > 10.13.24.40@o2ib
>> > 10.13.16.40@tcp
>> >
>> > Then your clients should have a nid on one or the other.
>> >
>> > Check your dmesg output after loading lnet. The complaints are
>> > pretty useful. Your modprobe.conf line looks correct although we
>> > found we did not need all the quoting so you should check that as
>> > well. Ours looks like...
>> >
>> > options lnet networks=o2ib(ib0),tcp(eth0)
>> >
>> > My guess is that it either cannot find or does not like your ko2iblnd
>> > module.
>> >
>> > ct
>> >
>> >
>> >
>> > On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>> >
>> > > Most everything is over IB, but I have a few systems I'd like to mount
>> > > the Lustre fs over GigE.
>> > >
>> > > I think I've followed the Multihomed instructions correctly, in:
>> > >
>> > > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>> > >
>> > > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>> > > Ethernet and IB) includes:
>> > >
>> > > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>> > >
>> > > I make and mount the mdt with (which has both IB and Ethernet, subnet
>> > > 36.122.x.x is IB, 36.121.x.x is Ethernet):
>> > >
>> > > # mkfs.lustre --mdt --mgs
>> > > --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
>> > > # mount -t lustre /dev/md0 /lfs/mdtb
>> > >
>> > > But, at this point, the ksocklnd module is loaded rather than the
>> > > ko2iblnd module!
>> > >
>> > > On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
>> > > mount it, it correctly uses the IB interface, but can't contact the
>> > > MDS:
>> > >
>> > > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>> > > for MGC36.122.255.201@o2ib_0
>> > > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>> > > find peer MGC36.122.255.201@o2ib_0!
>> > > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>> > > initial connection
>> > > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection())
>> > > NULL connection
>> > > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup
>> > > MGC36.122.255.201@o2ib failed (-2)
>> > > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple())
>> > > MGC36.122.255.201@o2ib setup error -2
>> > > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd
>> > > ddnlfs-OSTffff
>> > > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount())
>> > > ddnlfs-OSTffff not registered
>> > >
>> > > It too has loaded the ksocklnd module, and not the ko2iblnd module. I
>> > > guess that both modules should be loaded in a multihomed case?
>> > >
>> > > What am I doing wrong?
>> > >
>> > > Thanks,
>> > >
>> > > Chris
>> > > _______________________________________________
>> > > Lustre-discuss mailing list
>> > > Lustre-discuss at lists.lustre.org
>> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> >
>> >
>>