[Lustre-discuss] lnet infiniband config
Adam
adam at sharcnet.ca
Wed Jun 23 11:55:58 PDT 2010
Hi Thomas,
Here's one thing to check (if you're trying to replace a tcp network
with an IB one on an existing Lustre filesystem):
With the lustre mounts unmounted, run:
tunefs.lustre --dryrun <DEV_PATH> | grep Parameters
Check that parameters like 'mgsnode=IP' end in @o2ib and not @tcp.
If any of them still end in @tcp, erase and rewrite them.
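A rough sketch of that check, in case it helps. The function name find_tcp_params and the sample Parameters line are made up for illustration. The commented-out rewrite command assumes the --erase-params, --mgsnode and --writeconf options of tunefs.lustre behave as described in the man page for your Lustre version, so please verify against that before running anything on a real target.

```shell
#!/bin/sh
# Hypothetical helper: print every parameter in a "Parameters:" line
# that still points at a tcp NID. $1 is the line as printed by
#   tunefs.lustre --dryrun <DEV_PATH> | grep Parameters
find_tcp_params() {
    # Split the line on spaces, then keep only entries containing @tcp.
    printf '%s\n' "$1" | tr ' ' '\n' | grep '@tcp'
}

# Made-up sample line, for illustration only.
find_tcp_params 'Parameters: mgsnode=192.168.0.1@tcp'

# If anything is printed, the stored parameters need rewriting. With the
# target unmounted, something along these lines (an assumption, check the
# tunefs.lustre man page first):
#   tunefs.lustre --erase-params --mgsnode=192.168.0.1@o2ib --writeconf <DEV_PATH>
# Note that --erase-params drops all stored parameters, so any others you
# still need have to be given again on the same command line.
```

No output from the function means none of the stored parameters reference @tcp.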
Cheers,
Adam
Erik Froese wrote:
> Thomas,
>
> If you see an ib0 device and it has a valid IP, lnet should pick it up with
> options lnet networks="o2ib0(ib0)"
>
> What errors are you seeing?
>
> Erik
>
> On Tue, Jun 22, 2010 at 1:14 PM, Thomas Roth <t.roth at gsi.de> wrote:
>
>> Hello Erik,
>>
>> thanks for your advice, esp. on routing - I'll study that carefully once
>> I get that far.
>> For now, I was just trying the minimal first steps to get lnet via IB:
>> - It's all happening on the MGS/MDS, but neither mgs nor mdt yet
>> mounted, just 'modprobe lnet; lctl network up; lctl list_nids'
>> - I tried to use IB exclusively.
>> - options lnet networks="o2ib0(ib0)" doesn't work either (nor
>> variations thereof)
>>
>> Regards,
>> Thomas
>>
>> On 22.06.2010 18:40, Erik Froese wrote:
>>
>>> Hey Thomas,
>>>
>>> Are you trying to connect to Lustre via IB and ethernet? If so, your
>>> modprobe config should look like this:
>>> options lnet networks="o2ib0(ib0),tcp0(eth0)"
>>>
>>> If you're IB-only, use:
>>> options lnet networks="o2ib0(ib0)"
>>>
>>> If your MDS and OSS servers are on a separate network from your clients,
>>> you'll need LNET routing between the two.
>>> Let's say the MDS and OSSs are on o2ib0/tcp0 and the clients are on
>>> o2ib1/tcp1. You'll need a router server with separate addresses on
>>> o2ib0 and o2ib1.
>>>
>>> Also, it's important to note that o2ib0 and o2ib1 should be in different IP
>>> address spaces.
>>>
>>> On the clients.
>>> # I live on o2ib1
>>> options lnet networks="o2ib1(ib0),tcp1(eth0)"
>>> # To get to o2ib0 go through IP.ADD.OF.ROUTER@o2ib1
>>> options lnet routes="o2ib0 IP.ADD.OF.ROUTER@o2ib1"
>>>
>>> On the servers
>>> # I live on o2ib0
>>> options lnet networks="o2ib0(ib0),tcp0(eth0)"
>>> # To get to o2ib1 go through IP.ADD.OF.ROUTER@o2ib0
>>> options lnet routes="o2ib1 IP.ADD.OF.ROUTER@o2ib0"
>>>
>>> IP.ADD.OF.ROUTER@o2ib0 and IP.ADD.OF.ROUTER@o2ib1 are different IPs
>>> on distinct networks.
>>>
>>> lctl list_nids will show you the lustre nids of the node you're logged
>>> into only.
>>> lctl route_list will show you the lustre routers and the networks that
>>> they bridge.
>>>
>>> I hope this was helpful.
>>>
>>> Erik
>>>
>>> On Tue, Jun 22, 2010 at 10:19 AM, Thomas Roth <t.roth at gsi.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm getting my feet wet in the infiniband lake and of course I run into
>>>> some problems.
>>>> It would seem I got the compilation part of sles11 kernel 2.6.27 +
>>>> Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
>>>> infiniband fabric, and because ko2iblnd loads without any complaints.
>>>>
>>>> In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
>>>> modprobe-configs), I have
>>>>
>>>>> options ip2nets="o2ib0 192.168.0.[1-5]"
>>>>>
>>>> I load lnet and do 'lctl network up', but then 'lctl list_nids' will
>>>> invariably give me only
>>>>
>>>>> 192.168.0.1@tcp
>>>>>
>>>> no matter how I twist the modprobe-config (ip2nets="o2ib",
>>>> network="o2ib", network="o2ib(ib0)", etc.)
>>>>
>>>> This is true as long as I have ib0 configured with the IP 192.168.0.1
>>>> Once I unconfigure it, I get, quite expectedly,
>>>> LNET configure error 100: Network is down
>>>>
>>>> So I can either configure ipoib and bring up the network, but using tcp,
>>>> or I don't configure ib0 and then cannot start the network -? ;-{} I
>>>> think I'm rather missing something here.
>>>> Any clues?
>>>>
>>>> Cheers,
>>>> Thomas
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>>>
>> --
>> --------------------------------------------------------------------
>> Thomas Roth
>> Department: Informationstechnologie
>> Location: SB3 1.262
>> Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>> Planckstraße 1
>> 64291 Darmstadt
>> www.gsi.de
>>
>> Gesellschaft mit beschränkter Haftung
>> Sitz der Gesellschaft: Darmstadt
>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>
>> Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
>> Christiane Neumann, Dr. Hartmut Eickhoff
>>
>> Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>>
>>
>>
--
Adam Munro
System Administrator | SHARCNET | http://www.sharcnet.ca
Compute Canada | http://www.computecanada.org
519-888-4567 x36453