[Lustre-discuss] Another server question.

Charles Taylor taylor at hpc.ufl.edu
Tue Feb 3 11:51:10 PST 2009


On Feb 3, 2009, at 12:28 PM, Brian J. Murrell wrote:

> On Tue, 2009-02-03 at 12:21 -0500, Charles Taylor wrote:
>>
>> In our experience, despite what has been said and what we have read,
>> if we lose or take down a single OSS, our clients lose access (i/o
>> seems blocked) to the file system until that OSS is back up and has
>> completed recovery.
>
> That is likely the "real world" result of taking down an OSS, indeed.
> But that is more likely simply due to the "random distribution" of
> files/stripes around your filesystem, and it won't take long for all
> active clients to eventually want something from that missing OSS.

That could certainly be the case.
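
To put rough numbers on the "random distribution" argument, here is a
minimal sketch. It assumes each file's stripes land on OSTs chosen
uniformly at random (which is a simplification of how Lustre actually
allocates objects) and that a client's file accesses are independent.
The function names and the example figures (24 OSTs, 4 of them on the
downed OSS, stripe count 1, 50 files) are hypothetical, not taken from
our system.

```python
from math import comb


def p_file_hits_down_oss(total_osts, osts_on_down_oss, stripe_count):
    """Probability that a file has at least one stripe on the downed OSS.

    Assumes the file's OSTs are drawn uniformly at random without
    replacement from all OSTs (an illustrative simplification).
    """
    healthy = total_osts - osts_on_down_oss
    # comb() returns 0 when stripe_count > healthy, so the result is 1.0
    # when the file cannot fit entirely on healthy OSTs.
    return 1.0 - comb(healthy, stripe_count) / comb(total_osts, stripe_count)


def p_client_blocked(total_osts, osts_on_down_oss, stripe_count, files_accessed):
    """Probability that a client touches the downed OSS at least once
    after accessing `files_accessed` independently placed files."""
    p_miss = 1.0 - p_file_hits_down_oss(total_osts, osts_on_down_oss, stripe_count)
    return 1.0 - p_miss ** files_accessed


if __name__ == "__main__":
    # Hypothetical layout: 24 OSTs in total, 4 served by the downed OSS.
    for n_files in (1, 10, 50):
        print(n_files, p_client_blocked(24, 4, 1, n_files))
```

Under these assumptions a single file misses the downed OSS 5 times out
of 6, but a client that touches a few dozen files almost certainly hits
it, which would look exactly like the whole file system hanging.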

>
>> Again, not in our experience.
>
> Have you actually tested your theory in a controlled environment where
> you could be sure that clients that got hung up have never tried to
> access an OST on the missing OSS?

No, we've never set out to prove that it works or doesn't. We are
not complaining though - just saying that for us the "practical"
ramification of an OSS going down is that the file system will be
unusable until the OSS is back in service and recovery is complete.

> If so, and you are still finding that clients that don't touch the
> downed OSS are getting hung up, please, by all means, file a bug.

Will do. We'll be upgrading to 1.6.6 pretty soon and perhaps we'll
do some more extensive testing then.
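
For a controlled test of the kind described above, the bookkeeping could
be sketched roughly as follows: given which files each client touched,
which OSTs hold each file's stripes, and which OSS serves each OST, work
out which clients should never have needed the downed OSS. All names and
data here are hypothetical; the per-file OST indices would have to be
collected separately (for example from `lfs getstripe` output).

```python
def clients_expected_unaffected(client_files, file_osts, ost_to_oss, down_oss):
    """Return the set of clients whose accessed files have no stripe on
    any OST served by `down_oss`.

    client_files: dict of client name -> iterable of file paths accessed
    file_osts:    dict of file path -> iterable of OST indices for its stripes
    ost_to_oss:   dict of OST index -> name of the OSS serving it
    down_oss:     name of the OSS that was taken down
    """
    down_osts = {ost for ost, oss in ost_to_oss.items() if oss == down_oss}
    unaffected = set()
    for client, files in client_files.items():
        touched = set()
        for path in files:
            touched.update(file_osts[path])
        # A client is expected to keep working only if none of the OSTs
        # it touched live on the downed OSS.
        if touched.isdisjoint(down_osts):
            unaffected.add(client)
    return unaffected


if __name__ == "__main__":
    # Hypothetical example: 4 OSTs spread over 2 OSSes.
    ost_to_oss = {0: "oss1", 1: "oss1", 2: "oss2", 3: "oss2"}
    file_osts = {"/lfs/a": [0], "/lfs/b": [2, 3], "/lfs/c": [1, 2]}
    client_files = {"c1": ["/lfs/a"], "c2": ["/lfs/b"], "c3": ["/lfs/a", "/lfs/c"]}
    print(clients_expected_unaffected(client_files, file_osts, ost_to_oss, "oss2"))
```

Any client in the returned set that still hangs while the OSS is down
would be the case worth filing a bug about.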

Regards,

Charlie




