[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

Wed Mar 5 08:03:14 PST 2008

On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
> On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
> > I think I tried that before and it didn't help, but I will try it
> > again. Thanks for the suggestion.
> 
> Just so you guys know, 1000 seconds for the obd_timeout is very, very
> large!  As you could probably guess, we have some very, very big Lustre
> installations and to the best of my knowledge none of them are using
> anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
> experience to some of these very large clusters might correct me) the
> largest value that the largest clusters are using is in the
> neighbourhood of 300s.  There has to be some other problem at play here
> that you need 1000s.

I can confirm that at a recent large installation with several thousand
clients, the default obd_timeout of 100 seconds is in effect.
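
For anyone who wants to compare their own setting against the figures in
this thread, here is a rough sketch. It assumes the timeout is exposed at
/proc/sys/lustre/timeout on your nodes (check your own version; the path
is an assumption), and the function names and thresholds are purely
illustrative, taken from the numbers quoted above rather than from any
official guidance.

```python
from pathlib import Path

# Figures quoted in this thread, not official limits.
DEFAULT_TIMEOUT_S = 100      # default reported for a large installation
LARGE_SITE_TIMEOUT_S = 300   # rough maximum mentioned for the largest clusters


def classify_timeout(seconds: int) -> str:
    """Label an obd_timeout value relative to the figures in this thread."""
    if seconds <= DEFAULT_TIMEOUT_S:
        return "default-or-lower"
    if seconds <= LARGE_SITE_TIMEOUT_S:
        return "raised"
    return "suspicious"


def read_timeout(path: str = "/proc/sys/lustre/timeout") -> int:
    """Read the timeout in seconds; the default path is an assumption."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        value = read_timeout()
    except (OSError, ValueError) as exc:
        print(f"could not read timeout: {exc}")
    else:
        print(f"obd_timeout = {value}s ({classify_timeout(value)})")
```

A result of "suspicious" only means the value is above what was mentioned
here for the largest clusters, which suggests looking for an underlying
problem rather than raising the timeout further.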

> 
> Can you both please report your lustre and kernel versions?  I know you
> said "latest" Aaron, but some version numbers might be more solid to go
> on.
> 
> b.