[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

Charles Taylor taylor at hpc.ufl.edu
Fri Mar 7 09:39:59 PST 2008


Make sure the client can lctl ping the MDS and OSS o2ib NIDs. Then
make sure the same holds between the OSSs and the MDS/MGS. If all that
seems fine, I would start to wonder whether I made a mistake in specifying
the NIDs when formatting the OSTs.
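
The first two checks can be scripted. Below is a minimal Python sketch. It assumes `lctl` is on the PATH and that a failed `lctl ping` shows up as a non-zero exit status, which is an assumption and is not confirmed in this thread. The NID list is a hypothetical example. Run it on a client against the server NIDs, then on each OSS against the MDS/MGS NIDs.

```python
import subprocess
from typing import Callable, Iterable, List


def lctl_ping(nid: str) -> bool:
    """Ping one NID with `lctl ping`.

    Assumption: a non-zero exit status (or lctl being unavailable)
    means the NID did not answer.
    """
    try:
        result = subprocess.run(["lctl", "ping", nid],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def unreachable(nids: Iterable[str],
                ping: Callable[[str], bool] = lctl_ping) -> List[str]:
    """Return the NIDs that did not answer, in the order given."""
    return [nid for nid in nids if not ping(nid)]


if __name__ == "__main__":
    # Hypothetical list: fill in the o2ib NIDs reported by
    # `lctl list_nids` on the MDS/MGS and on each OSS.
    server_nids = ["36.122.255.201@o2ib"]
    for nid in unreachable(server_nids):
        print(f"cannot lctl ping {nid}")
```

The `ping` parameter can be swapped for a stub, so the selection logic can be tried without a Lustre node.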

ct


On Mar 7, 2008, at 12:17 PM, Canon, Richard Shane wrote:

>
> Chris,
>
> Perhaps you need to run some write_conf-like command. I'm not
> sure whether this is needed in 1.6.
>
> Shane
>
>
>
> ----- Original Message -----
> From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
> To: lustre-discuss <lustre-discuss at lists.lustre.org>
> Sent: Fri Mar 07 12:03:17 2008
> Subject: Re: [Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
>
> On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott  
> <prescott at hpc.ufl.edu> wrote:
>>
>>  I think your client modprobe.conf lnet option
>>  should be this:
>>
>>
>>  options lnet networks=o2ib(ib0)
>>
>>  (not 'o2ib0').
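
The `networks=` value is a comma-separated list of `<network>(<interface>)` entries. The lines quoted in this thread differ mainly in network numbering and quoting. The Python sketch below extracts those pairs from an `options lnet` line so the client and server settings can be compared side by side. It assumes shell-style quoting as in the thread, skips entries without an interface, and is not LNet's own parser.

```python
import re
import shlex
from typing import List, Tuple


def parse_lnet_networks(line: str) -> List[Tuple[str, List[str]]]:
    """Extract (network, interfaces) pairs from an `options lnet networks=...` line."""
    # shlex removes outer shell-style quotes, e.g. 'networks="..."'.
    for token in shlex.split(line):
        if token.startswith("networks="):
            # Drop any remaining double quotes around the value.
            value = token[len("networks="):].strip('"')
            break
    else:
        raise ValueError("no networks= option found")
    pairs = []
    # Each entry looks like name(if1[,if2...]); bare entries are skipped.
    for net, ifaces in re.findall(r"(\w+)\(([^)]*)\)", value):
        pairs.append((net, [i.strip() for i in ifaces.split(",") if i.strip()]))
    return pairs
```

For example, the server line `options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'` yields `[("tcp0", ["eth0"]), ("o2ib0", ["ib0"])]`.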
>
> It still seems to want the TCP connection:
>
> Lustre: Added LNI 36.122.255.1@o2ib [8/64]
> Lustre: Lustre Client File System; info at clusterfs.com
> LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
> LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
> LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> LustreError: 11043:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
> LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
> LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
> LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> Lustre: client 0000010430934400 umount complete
> LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-2)
>
>>
>>  Another thing to try, if that doesn't work: lctl
>>  ping your MDS/MGS/OSS NIDs, like so:
>>
>>  lctl ping 36.122.255.201@o2ib
>
> Before and after the change it looks the same:
>
> # lctl ping 36.122.255.201@o2ib
> 12345-0@lo
> 12345-36.122.255.201@o2ib
> 12345-36.121.255.201@tcp
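
In that reply, each line is a NID behind a numeric prefix. The MDS/MGS therefore answered and advertised `lo`, `o2ib` and `tcp` NIDs. Below is a Python sketch that pulls the NIDs and their network names out of such output. It assumes the `<number>-<nid>` line shape shown above.

```python
from typing import List


def nids_from_ping(output: str) -> List[str]:
    """Return the NIDs listed in `lctl ping` output, dropping the numeric prefix."""
    nids = []
    for line in output.splitlines():
        line = line.strip()
        prefix, sep, nid = line.partition("-")
        # Keep only lines shaped like "<digits>-<nid>".
        if sep and prefix.isdigit() and nid:
            nids.append(nid)
    return nids


def networks_of(nids: List[str]) -> List[str]:
    """Network part of each NID, e.g. 'o2ib' for '36.122.255.201@o2ib'."""
    return [nid.rpartition("@")[2] for nid in nids]
```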
>
> If I change my modprobe.conf to match the one on the MDS/OSSs:
>
> options lnet networks=o2ib0(ib0),tcp0(eth0)
>
> Then mount, specifying only o2ib:
>
> # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
>
> It works, but both ko2iblnd and ksocklnd are loaded.
>
> The dmesg output is:
>
> Lustre: OBD class driver, info at clusterfs.com
>         Lustre Version: 1.6.4.2
>         Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
> Lustre: Added LNI 36.122.255.1@o2ib [8/64]
> Lustre: Added LNI 36.121.255.1@tcp [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; info at clusterfs.com
> Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter stripesize=2M
> Lustre: Client ddnlfs-client has started
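
One way to confirm which local networks LNet brought up is to look for the `Added LNI` lines in dmesg. The Python sketch below extracts them. The regular expression assumes the message format shown above and does not interpret the bracketed numbers. The result lists the configured local NIDs only. It does not by itself show which network the client picks for a given server.

```python
import re
from typing import List

# Assumes the kernel message format shown above: "Added LNI <nid> [...]".
LNI_RE = re.compile(r"Added LNI (\S+@\S+)")


def configured_nids(dmesg_text: str) -> List[str]:
    """Local NIDs announced by 'Added LNI' messages, in log order."""
    return LNI_RE.findall(dmesg_text)
```

On the client, pass it the text printed by `dmesg`.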
>
> Can I be certain it'll use IB for LFS on this client?
>
> Thanks,
>
> Chris
>>
>>  Cheers,
>>  Craig
>>
>>
>>
>>
>>  Chris Worley wrote:
>>> More issues.  Now, on the clients.
>>>
>>> The MDT/MGS/OST's are all up and mounted, showing:
>>>
>>> # lctl list_nids
>>> 36.122.255.201@o2ib
>>> 36.121.255.201@tcp
>>>
>>> Now, when I go to mount on the IB-based clients, I get:
>>>
>>> # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
>>> mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No such file or directory
>>> Is the MGS specification correct?
>>> Is the filesystem name correct?
>>> If upgrading, is the copied client log valid? (see upgrade docs)
>>>
>>> The modprobe.conf contains:
>>>
>>> options lnet networks=o2ib0(ib0)
>>>
>>> And lctl looks good:
>>>
>>> # lctl list_nids
>>> 36.122.255.1@o2ib
>>>
>>> But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
>>> (36.12[12].255.201 is the MGS/MDS server):
>>>
>>> LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
>>> LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
>>> LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>>> LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>>> LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
>>> LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
>>> Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
>>> LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
>>> LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
>>> LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
>>> Lustre: client 0000010430913c00 umount complete
>>> LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-2)
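
The failing configuration record (`cmd=cf003 ... 2:36.121.255.201@tcp`) names the server's `tcp` NID, while this client has only an `o2ib` NID. Below is a simplified Python model of that mismatch. It assumes a peer NID is usable only when the client has a local NID on the same network name. LNet routers are ignored, and network names are compared as plain strings.

```python
from typing import List


def network(nid: str) -> str:
    """Network name of a NID: '36.121.255.201@tcp' -> 'tcp'."""
    return nid.rpartition("@")[2]


def usable_peer_nids(local_nids: List[str], peer_nids: List[str]) -> List[str]:
    """Peer NIDs that share a network name with at least one local NID.

    Simplified model: no LNet routing, exact string comparison of names.
    """
    local_nets = {network(n) for n in local_nids}
    return [p for p in peer_nids if network(p) in local_nets]
```

If `36.122.255.1@o2ib` is the only local NID, the recorded `36.121.255.201@tcp` is filtered out. This is consistent with the `No NID found` error above.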
>>>
>>> Note that this configuration works fine in a non-multihomed setup, so I
>>> don't think ko2iblnd is to blame (the setup on the clients hasn't
>>> changed at all).
>>>
>>> What am I doing wrong?
>>>
>>> Thanks,
>>>
>>> Chris
>>> On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com>  
>>> wrote:
>>>> I changed my modprobe.conf to look exactly like yours, and it worked.
>>>> I hadn't been using all the quotes until the doc said to... but they
>>>> may indeed have been the problem.
>>>>
>>>>   Thanks!
>>>>
>>>>   Chris
>>>>
>>>>  On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor  
>>>> <taylor at hpc.ufl.edu> wrote:
>>>>>
>>>>>
>>>>>  Do "lctl list_nids" on your MDS and OSSs. They should look
>>>>>  something like this:
>>>>>
>>>>>  [root@hpcmds ~]# lctl list_nids
>>>>>  10.13.24.40@o2ib
>>>>>  10.13.16.40@tcp
>>>>>
>>>>>  Then each client should have a NID on one network or the other.
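
Following that advice, a server's `lctl list_nids` output can be checked for the networks it is expected to serve. Below is a Python sketch. The set of required networks is an input you choose, here `o2ib` and `tcp` as in this thread.

```python
from typing import Iterable, Set


def missing_networks(list_nids_output: str, required: Iterable[str]) -> Set[str]:
    """Required network names absent from `lctl list_nids` output."""
    present = {line.strip().rpartition("@")[2]
               for line in list_nids_output.splitlines() if "@" in line}
    return set(required) - present
```

An empty set means every required network has a NID on that server.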
>>>>>
>>>>>  Check your dmesg output after loading lnet. The complaints are
>>>>>  pretty useful. Your modprobe.conf line looks correct, although we
>>>>>  found we did not need all the quoting, so you should check that as
>>>>>  well. Ours looks like...
>>>>>
>>>>>  options lnet networks=o2ib(ib0),tcp(eth0)
>>>>>
>>>>>  My guess is that it either cannot find or does not like your
>>>>>  ko2iblnd module.
>>>>>
>>>>>  ct
>>>>>
>>>>>
>>>>>
>>>>>  On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>>>>>
>>>>>> Almost everything is on IB, but I have a few systems that I'd like
>>>>>> to mount the Lustre fs over GigE.
>>>>>>
>>>>>> I think I've followed the Multihomed instructions correctly, in:
>>>>>>
>>>>>> http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>>>>>>
>>>>>> My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>>>>>> Ethernet and IB) includes:
>>>>>>
>>>>>> options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>>>>>>
>>>>>> I make and mount the MDT (on the node that has both IB and Ethernet;
>>>>>> subnet 36.122.x.x is IB, 36.121.x.x is Ethernet) with:
>>>>>>
>>>>>> # mkfs.lustre --mdt --mgs --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
>>>>>> # mount -t lustre /dev/md0  /lfs/mdtb
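
In the `--mgsnode` value above, the NIDs of the single MGS node are joined with commas. Below is a hypothetical Python helper that assembles that string and rejects anything not shaped like `address@network`. It covers only the one-node case shown here. The validation pattern is a loose assumption and not LNet's full NID grammar.

```python
import re
from typing import List

# Loose, assumed NID shape: "<address>@<network>", e.g. "36.122.255.201@o2ib0".
NID_RE = re.compile(r"^[^@\s]+@[a-z][a-z0-9]*$")


def mgsnode_value(nids: List[str]) -> str:
    """Comma-join the NIDs of a single MGS node for --mgsnode."""
    for nid in nids:
        if not NID_RE.match(nid):
            raise ValueError(f"not a NID: {nid!r}")
    return ",".join(nids)
```

For example, `mgsnode_value(["36.122.255.201@o2ib0", "36.121.255.201@tcp0"])` reproduces the value used in the mkfs.lustre command above.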
>>>>>>
>>>>>> But at this point, the ksocklnd module is loaded rather than the
>>>>>> ko2iblnd module!
>>>>>>
>>>>>> On the OSS, I make the fs with the same "mgsnode", but when I try to
>>>>>> mount it, it correctly uses the IB interface yet can't contact the
>>>>>> MDS:
>>>>>>
>>>>>> LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for MGC36.122.255.201@o2ib_0
>>>>>> LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer MGC36.122.255.201@o2ib_0!
>>>>>> LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>>>>>> LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>>>>>> LustreError: 27520:0:(obd_config.c:325:class_setup()) setup MGC36.122.255.201@o2ib failed (-2)
>>>>>> LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) MGC36.122.255.201@o2ib setup error -2
>>>>>> LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
>>>>>> LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
>>>>>>
>>>>>> It too has loaded the ksocklnd module, not the ko2iblnd module. I
>>>>>> guess both modules should be loaded in the multihomed case?
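
Judging by this thread, `tcp` networks are served by `ksocklnd` and `o2ib` networks by `ko2iblnd`, so a node configured for both should end up with both modules loaded. The Python sketch below compares the modules expected from a `networks=` value against the text of `/proc/modules`. The mapping table covers only the two LNDs mentioned here and is an assumption, not a complete list.

```python
import re
from typing import Set

# Assumed mapping, limited to the two LNDs that appear in this thread.
LND_FOR_NET = {"tcp": "ksocklnd", "o2ib": "ko2iblnd"}


def expected_lnds(networks_value: str) -> Set[str]:
    """LND modules implied by a value such as 'tcp0(eth0),o2ib0(ib0)'."""
    # Take each '<net>(' entry and drop its trailing network number.
    nets = [n.rstrip("0123456789")
            for n in re.findall(r"([a-z0-9]+)\(", networks_value)]
    return {LND_FOR_NET[n] for n in nets if n in LND_FOR_NET}


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names: the first field of each /proc/modules line."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def missing_lnds(networks_value: str, proc_modules_text: str) -> Set[str]:
    """Expected LND modules that are not currently loaded."""
    return expected_lnds(networks_value) - loaded_modules(proc_modules_text)
```

On a live node, the second argument would be `open("/proc/modules").read()`.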
>>>>>>
>>>>>> What am I doing wrong?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Chris
>>>>>> _______________________________________________
>>>>>> Lustre-discuss mailing list
>>>>>> Lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>
>>>>>
>>>>
>>
>>



