[Lustre-discuss] Another server question.

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Feb 3 09:28:28 PST 2009


On Tue, 2009-02-03 at 12:21 -0500, Charles Taylor wrote:
> 
> In our experience, despite what has been said and what we have read,  
> if we lose or take down a single OSS, our clients lose access (i/o  
> seems blocked) to the file system until that OSS is back up and has  
> completed recovery.

That is likely the "real world" result of taking down an OSS, indeed.
But it is most likely due simply to the "random distribution" of
files/stripes around your filesystem: it won't take long for every
active client to want something from that missing OSS.
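To put a rough number on "it won't take long", here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each file's stripes land on `stripe_count` distinct OSTs chosen uniformly at random and that files are placed independently. Lustre's real allocator is not necessarily uniform random, and the function name and the 32-OST / 4-OSTs-per-OSS figures are hypothetical.

```python
from math import comb


def p_touches_down_oss(total_osts, osts_on_down_oss, stripe_count, files_accessed):
    """Hypothetical helper: probability that a client which touches
    `files_accessed` files hits at least one OST on the downed OSS,
    ASSUMING uniform random, independent stripe placement."""
    # Chance that one file's stripe set avoids every OST on the downed OSS.
    p_file_avoids = comb(total_osts - osts_on_down_oss, stripe_count) / comb(
        total_osts, stripe_count
    )
    # Chance that at least one of the accessed files needs the downed OSS.
    return 1 - p_file_avoids ** files_accessed


# Hypothetical site: 8 OSSes x 4 OSTs = 32 OSTs, stripe count 1, one OSS down.
print(f"{p_touches_down_oss(32, 4, 1, 20):.2f}")  # prints 0.93
```

Under these assumptions a client that opens just 20 files has about a 93% chance of needing the missing OSS, so in practice nearly every busy client ends up blocked even though, in principle, only I/O to that OSS should stall.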

> Again, not in our experience.

Have you actually tested your theory in a controlled environment where
you could be sure that clients that got hung up have never tried to
access an OST on the missing OSS?  If so, and you are still finding that
clients that don't touch the downed OSS are getting hung up, please, by
all means, file a bug.

> We've been running three separate Lustre file systems for over a year  
> now and are *very* happy with it.

Glad to hear that, sincerely.

> We wish that when an  
> OSS went down, we only lost access to files/objects on *that* OSS but,  
> again, that has not been our experience.

It's certainly supposed to be.  As above, if you find otherwise, please
let us know.

> Still we've kissed a lot  
> of distributed/parallel file system frogs.   We'll take Lustre, hands  
> down.

Thanx for the vote of confidence.  It's always nice to hear about people
who are happy.

Cheers,
b.
