[Lustre-discuss] lnet infiniband config

Adam adam at sharcnet.ca
Wed Jun 23 11:55:58 PDT 2010


Hi Thomas,

Here's one thing to check (if you're trying to replace a tcp network 
with an IB one on an existing Lustre filesystem):

With the lustre mounts unmounted, run:
  tunefs.lustre --dryrun <DEV_PATH> | grep Parameters

Check that parameters like 'mgsnode=IP' end in @o2ib and not @tcp. 
If any still end in @tcp, erase and rewrite them.
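
If you have many targets, the check can be scripted. Below is a minimal
sketch, not a tested procedure: the function name params_use_tcp and the
sample strings are made up for illustration, and the sample lines only
stand in for what 'tunefs.lustre --dryrun <DEV_PATH> | grep Parameters'
would print on a real target.

```shell
#!/bin/sh
# Sketch only: report whether a tunefs.lustre "Parameters:" line still
# contains a NID on a tcp network. params_use_tcp is a hypothetical helper.
params_use_tcp() {
    # $1: one "Parameters:" line; exit status 0 means an @tcp NID was found
    printf '%s\n' "$1" | grep -q '@tcp'
}

# Made-up sample lines standing in for real --dryrun output.
old='Parameters: mgsnode=192.168.0.1@tcp'
new='Parameters: mgsnode=192.168.0.1@o2ib'

for line in "$old" "$new"; do
    if params_use_tcp "$line"; then
        echo "needs rewrite: $line"
    else
        echo "ok: $line"
    fi
done
```

For the rewrite itself, tunefs.lustre has --erase-params, --mgsnode and
--writeconf options, so something along the lines of
  tunefs.lustre --erase-params --mgsnode=<IP>@o2ib --writeconf <DEV_PATH>
is probably what you want. Note that --erase-params drops all stored
parameters, so anything else listed in the dry-run output (failover nodes,
etc.) has to be given again. Check the man page for your version first.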

Cheers,
Adam

Erik Froese wrote:
> Thomas,
>
> If you see an ib0 device and it has a valid IP, lnet should pick it up with
> options lnet networks="o2ib0(ib0)"
>
> What errors are you seeing?
>
> Erik
>
> On Tue, Jun 22, 2010 at 1:14 PM, Thomas Roth <t.roth at gsi.de> wrote:
>   
>> Hello Erik,
>>
>> thanks for your advice, esp. on routing - I'll study that carefully once
>> I get that far.
>> For now, I was just trying the minimal first steps to get lnet via IB:
>> - It's all happening on the MGS/MDS, but neither mgs nor mdt yet
>> mounted, just 'modprobe lnet; lctl network up; lctl list_nids'
>> - I tried to use IB exclusively.
>> - options lnet networks="o2ib0(ib0)"  doesn't work either (nor
>> variations thereof)
>>
>> Regards,
>> Thomas
>>
>> On 22.06.2010 18:40, Erik Froese wrote:
>>     
>>> Hey Thomas,
>>>
>>> Are you trying to connect to Lustre via IB and ethernet? If so, your
>>> modprobe config should look like this:
>>> options lnet networks="o2ib0(ib0),tcp0(eth0)"
>>>
>>> If you're IB only, use:
>>> options lnet networks="o2ib0(ib0)"
>>>
>>> If your MDS and OSS servers are on separate networks you'll need to
>>> do something different.
>>> Let's say the MDS and OSSs are on o2ib0/tcp0 and the clients are on
>>> o2ib1/tcp1. You'll need a router server with separate addresses on
>>> o2ib0 and o2ib1.
>>>
>>> Also, it's important to note that o2ib0 and o2ib1 should be in
>>> different IP address spaces.
>>>
>>> On the clients:
>>> # I live on o2ib1
>>> options lnet networks="o2ib1(ib0),tcp1(eth0)"
>>> # To get to o2ib0 go through IP.ADD.OF.ROUTER@o2ib1
>>> options lnet routes="o2ib0 IP.ADD.OF.ROUTER@o2ib1"
>>>
>>> On the servers:
>>> # I live on o2ib0
>>> options lnet networks="o2ib0(ib0),tcp0(eth0)"
>>> # To get to o2ib1 go through IP.ADD.OF.ROUTER@o2ib0
>>> options lnet routes="o2ib1 IP.ADD.OF.ROUTER@o2ib0"
>>>
>>> IP.ADD.OF.ROUTER@o2ib0 and IP.ADD.OF.ROUTER@o2ib1 are different IPs
>>> on distinct networks.
>>>
>>> lctl list_nids will show you the lustre nids of the node you're logged
>>> into only.
>>> lctl route_list will show you the lustre routers and the networks that
>>> they bridge.
>>>
>>> I hope this was helpful.
>>>
>>> Erik
>>>
>>> On Tue, Jun 22, 2010 at 10:19 AM, Thomas Roth <t.roth at gsi.de> wrote:
>>>       
>>>> Hi all,
>>>>
>>>> I'm getting my feet wet in the infiniband lake and of course I run into
>>>> some problems.
>>>> It would seem I got the compilation part of sles11 kernel 2.6.27 +
>>>> Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
>>>> infiniband fabric, and because ko2iblnd loads without any complaints.
>>>>
>>>> In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
>>>> modprobe-configs), I have
>>>>         
>>>>> options ip2nets="o2ib0 192.168.0.[1-5]"
>>>>>           
>>>> I load lnet and do 'lctl network up', but then 'lctl list_nids' will
>>>> invariably give me only
>>>>         
>>>>> 192.168.0.1@tcp
>>>>>           
>>>> no matter how I twist the modprobe-config (ip2nets="o2ib",
>>>> network="o2ib", network="o2ib(ib0)", etc.)
>>>>
>>>> This is true as long as I have ib0 configured with the IP 192.168.0.1
>>>> Once I unconfigure it, I get, quite expectedly,
>>>> LNET configure error 100: Network is down
>>>>
>>>> So I can either configure ipoib and bring up the network, but using tcp,
>>>> or I don't configure ib0 and then cannot start the network -? ;-{}  I
>>>> think I'm rather missing something here.
>>>> Any clues?
>>>>
>>>> Cheers,
>>>> Thomas
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>>>         
>> --
>> --------------------------------------------------------------------
>> Thomas Roth
>> Department: Informationstechnologie
>> Location: SB3 1.262
>> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>> Planckstraße 1
>> 64291 Darmstadt
>> www.gsi.de
>>
>> Gesellschaft mit beschränkter Haftung
>> Sitz der Gesellschaft: Darmstadt
>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>
>> Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
>> Christiane Neumann, Dr. Hartmut Eickhoff
>>
>> Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>>
>>
>>     


-- 
Adam Munro
System Administrator  | SHARCNET | http://www.sharcnet.ca
Compute Canada | http://www.computecanada.org
519-888-4567 x36453




