[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

Tue Mar 4 13:04:32 PST 2008

On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
> I think I tried that before and it didn't help, but I will try it
> again. Thanks for the suggestion.

Just so you guys know, 1000 seconds for the obd_timeout is very, very
large!  As you could probably guess, we have some very, very big Lustre
installations and to the best of my knowledge none of them are using
anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
experience to some of these very large clusters might correct me) the
largest value that the largest clusters are using is in the
neighbourhood of 300s.  If you need 1000s, there has to be some other
problem at play here.

Can you both please report your Lustre and kernel versions?  I know you
said "latest", Aaron, but actual version numbers would give us something
more solid to go on.

b.