[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

Aaron Knister aaron at iges.org
Wed Mar 5 08:08:33 PST 2008


That's very strange. What interconnect is that site using?

My versions are -

Lustre  - 1.6.4.2
Kernel (servers) - 2.6.18-8.1.14.el5_lustre.1.6.4.2smp
Kernel (clients) - 2.6.18-53.1.13.el5



On Mar 5, 2008, at 11:03 AM, Frank Leers wrote:

> On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
>> On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
>>> I think I tried that before and it didn't help, but I will try it
>>> again. Thanks for the suggestion.
>>
>> Just so you guys know, 1000 seconds for the obd_timeout is very, very
>> large!  As you could probably guess, we have some very, very big Lustre
>> installations and to the best of my knowledge none of them are using
>> anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
>> experience to some of these very large clusters might correct me) the
>> largest value that the largest clusters are using is in the
>> neighbourhood of 300s.  There has to be some other problem at play here
>> that you need 1000s.
>
> I can confirm that at a recent large installation with several thousand
> clients, the default of 100 is in effect.
>
>>
>> Can you both please report your lustre and kernel versions?  I know you
>> said "latest" Aaron, but some version numbers might be more solid to go
>> on.
>>
>> b.
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org






