[Lustre-discuss] Lustre over o2ib issue

Diego Moreno Diego.Moreno-Lazaro at bull.net
Tue Mar 22 06:26:09 PDT 2011


Hi,

We are having this problem right now with our Lustre 2.0 setup. We tried the 
proposed solutions, but they did not work for us.

We have 2 QDR IB cards on our 4 servers, and clients will only connect to the 
servers if we first run "lctl ping" from each server to every client. We 
don't have the ib_mthca module loaded because we have no DDR cards, and 
configuring ip2nets gave no result.

Our ip2nets configuration (the [7-10] addresses are on the servers, the 
others are on clients; a sketch of the corresponding modprobe line follows below):

   o2ib0(ib0) 10.50.0.[7-10] ;
   o2ib1(ib1) 10.50.1.[7-10] ;
   o2ib0(ib0) 10.50.*.* ;
   o2ib1(ib0) 10.50.*.*
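
A minimal sketch of how such ip2nets rules are typically handed to LNet via a 
modprobe options line; the file path is only an example, and the rules simply 
repeat the ones quoted above:

   # /etc/modprobe.d/lustre.conf  (example path; use whatever modprobe config file your distro reads)
   # ip2nets rules copied verbatim from the configuration above
   options lnet ip2nets="o2ib0(ib0) 10.50.0.[7-10]; o2ib1(ib1) 10.50.1.[7-10]; o2ib0(ib0) 10.50.*.*; o2ib1(ib0) 10.50.*.*"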

So the only way to get clients connected to the servers is to run something 
like this on every server:

   # CLIENT_IB_LIST holds the IB addresses of all clients
   for i in $CLIENT_IB_LIST ; do
       lctl ping $i@o2ib0
       lctl ping $i@o2ib1
   done

Before "lctl ping" we get messages like this one:

Lustre: 50389:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping 
message for 12345-10.50.1.7 at o2ib1: peer not alive

After "lctl ping' everything works right.

Maybe I'm missing something, or this is a known bug in Lustre 2.0...


On 16/03/2011 22:13, Andreas Dilger wrote:
> On 2011-03-16, at 3:04 PM, Mike Hanby wrote:
>> Thanks, I forgot to include the card info:
>>
>> The servers each have a single IB card: dual port MT26528 QDR
>> o2ib0(ib0) on each server is attached to the QLogic switch (with three attached M3601Q switches and 48 attached blades)
>> o2ib1(ib1) on each server is attached to a stack of two M3601Q switches with 24 attached blades
>>
>> The blades connected to o2ib0 each have an MT26428 QDR IB card
>> The blades connected to o2ib1 each have an MT25418 DDR IB card
>
> You may also want to check out the ip2nets option for specifying the Lustre networks.  It is made to handle configuration issues like this where the interface name is not constant across client/server nodes.
>
>>
>> -----Original Message-----
>> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Nirmal Seenu
>> Sent: Wednesday, March 16, 2011 2:10 PM
>> To: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] Lustre over o2ib issue
>>
>> If you are using DDR and QDR, or any two different cards in the same machine, there is no guarantee that the same IB cards get assigned to ib0 and ib1.
>>
>> To fix that problem you need to comment out the following 3 lines in /etc/init.d/openibd:
>>
>>      #for i in `grep "^driver: " /etc/sysconfig/hwconf | sed -e 's/driver: //' | grep -w "ib_mthca\\\|ib_ipath\\\|mlx4_core\\\|cxgb3\\\|iw_nes"`; do
>>      #    load_modules $i
>>      #done
>>
>> and include the following lines instead (we wanted the DDR card to be ib0 and the QDR card to be ib1):
>>      load_modules ib_mthca
>>      /bin/sleep 10
>>      load_modules mlx4_core
>>
>> and you will need to restart openibd once again (we included it in rc.local) to make sure that the same IB cards are assigned to the devices ib0 and ib1.
>>
>> Nirmal
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
>


