[Lustre-discuss] OFED 1.5.1 on Clients
Andreas Dilger
andreas.dilger at oracle.com
Fri Jun 18 12:45:01 PDT 2010
Since the event is unknown, it is hard to know in advance whether it
can be ignored. Some protocols encode in the message type whether
handling it is 'mandatory' or 'optional'. Others, like Lustre,
negotiate in advance which operations are understood and never send
unknown requests to peers. I have no idea whether IB does either.
In the absence of such information, the safest behaviour (if not the
most robust) is to fail since the unknown event may be critical to the
correct behaviour of the system.
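
To make the two schemes concrete, here is a minimal sketch in C. It is
not Lustre or OFED code: the flag bit, the event numbers and the
function names are all hypothetical, chosen only to illustrate the
idea. The first function treats an unknown event as fatal unless the
sender marked it optional; the second shows a sender checking a
feature mask agreed at connect time before it sends an operation.

```c
#include <stdint.h>

/* Hypothetical encoding: the top bit of the type marks an event that
 * a receiver may skip if it does not recognise it. */
#define EV_OPTIONAL 0x80000000u

enum ev_action { EV_HANDLED, EV_IGNORED, EV_FATAL };

/* Hypothetical event types known to this receiver. */
enum { EV_CONNECT = 1, EV_DISCONNECT = 2 };

/* Decide what to do with an incoming event type. */
static enum ev_action classify_event(uint32_t type)
{
    switch (type & ~EV_OPTIONAL) {
    case EV_CONNECT:
    case EV_DISCONNECT:
        return EV_HANDLED;
    default:
        /* Unknown event: skip it only if the sender said that is
         * safe, otherwise fail rather than guess. */
        return (type & EV_OPTIONAL) ? EV_IGNORED : EV_FATAL;
    }
}

/* Negotiation variant: the sender consults a bitmask of operations
 * both sides agreed on, so unknown requests are never sent. */
static int peer_understands(uint64_t negotiated_mask, unsigned op)
{
    return op < 64 && ((negotiated_mask >> op) & 1u) != 0;
}
```

With neither mechanism available, only the EV_FATAL branch of the
default case is left, which is what the LBUG in the patch amounts to.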
Cheers, Andreas
On 2010-06-18, at 12:48, Roger Spellman <Roger.Spellman at terascala.com> wrote:
> Jason (or anyone else),
>
> Patch 23498 ( https://bugzilla.lustre.org/attachment.cgi?id=23498 )
> says:
>
> Index: ./lnet/klnds/o2iblnd/o2iblnd_cb.c
> ===================================================================
> RCS file: /cvsroot/cfs/lnet/klnds/o2iblnd/o2iblnd_cb.c,v
> retrieving revision 1.12.6.1.2.5
> diff -u -p -u -p -r1.12.6.1.2.5 o2iblnd_cb.c
> --- ./lnet/klnds/o2iblnd/o2iblnd_cb.c 20 Nov 2008 09:29:34 -0000 1.12.6.1.2.5
> +++ ./lnet/klnds/o2iblnd/o2iblnd_cb.c 15 May 2009 12:26:07 -0000
> @@ -2654,6 +2654,8 @@ kiblnd_cm_callback(struct rdma_cm_id *cm
>
> switch (event->event) {
> default:
> + CERROR("Unexpected event: %d, status: %d\n",
> + event->event, event->status);
> LBUG();
>
> Why should we LBUG just for an unexpected event? Couldn't it just be
> ignored?
>
> -Roger
>
>> -----Original Message-----
>> From: Jason Rappleye [mailto:jason.rappleye at nasa.gov]
>> Sent: Friday, June 18, 2010 2:16 PM
>> To: Roger Spellman
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] OFED 1.5.1 on Clients
>>
>>
>> On Jun 18, 2010, at 7:49 AM, Roger Spellman wrote:
>>
>>> Jason,
>>> Thanks for this response. This brings up another question:
>>
>> np
>>
>>> The bug number you referred to mentions an LBUG in OFED 1.4.1. Are you
>>> saying that the same LBUG would occur with OFED 1.5.1 too without the
>>> patch?
>>
>> Yes. The patch handles new RDMA CM events that appear in OFED
>> 1.4(.1?). They are also in 1.5.1. Without the patch, receipt of one of
>> those events will result in an LBUG.
>>
>> Jason
>>
>>>
>>> -Roger
>>>
>>>> -----Original Message-----
>>>> From: Jason Rappleye [mailto:jason.rappleye at nasa.gov]
>>>> Sent: Thursday, June 17, 2010 5:02 PM
>>>> To: Roger Spellman
>>>> Cc: lustre-discuss at lists.lustre.org
>>>> Subject: Re: [Lustre-discuss] OFED 1.5.1 on Clients
>>>>
>>>>
>>>> On Jun 17, 2010, at 1:23 PM, Roger Spellman wrote:
>>>>
>>>>> Hi,
>>>>> Can anyone share their experiences using OFED 1.5.1 on Lustre
>>>>> Clients? This is needed because RHAT 5.5 does not support OFED 1.4.2.
>>>>
>>>> We're using it with ~9300 clients running Lustre 1.6.6 and haven't
>>>> identified any OFED 1.5.1-specific issues. If you're using 1.6.x and
>>>> haven't done so already, you'll want to apply attachment 23498 from
>>>> bug 19520.
>>>>
>>>> We just deployed 1.8.2 on a separate cluster with ~130 clients and
>>>> haven't seen any OFED-specific issues there, either. While we did see
>>>> some failures when running acc-sm with the stack of software we use
>>>> here, none of those had anything to do with the version of OFED we
>>>> were running.
>>>>
>>>
>>
>> --
>> Jason Rappleye
>> System Administrator
>> NASA Advanced Supercomputing Division
>> NASA Ames Research Center
>> Moffett Field, CA 94035