[Lustre-discuss] About MDS failover

Andreas Dilger adilger at sun.com
Sat Jan 17 16:46:45 PST 2009


On Jan 15, 2009  17:51 -0800, Jeffrey Alan Bennett wrote:
> > You can use HBA multi-pathing to avoid this problem, if your 
> > hardware supports it.  You can also use 
> > /proc/fs/lustre/health_check to check if the filesystems have 
> > encountered errors and are marked "unhealthy".
> 
> We use multipath in all our configurations. However, will Lustre
> be able to detect if connectivity to the storage has been
> totally lost (i.e. no available path) and report this in
> /proc/fs/lustre/health_check?

Yes, but currently it can only do this "reactively" rather than
"proactively".  If you are using MMP (multiple-mount protection),
which periodically writes to the device, it should detect the
I/O error and mark the filesystem read-only within a second or
so (depending on how quickly the SCSI layer returns the error vs.
retrying), which will in turn cause health_check to report "unhealthy".
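Since detection is reactive, a failover script would typically poll
health_check and act when it stops reporting a healthy state. A minimal
Python sketch, assuming the file reads exactly "healthy" when all devices
are fine and that any other content means a problem; the helper names
is_healthy() and check_node() are hypothetical, not part of Lustre:

```python
from pathlib import Path

# Path as given in this thread; newer Lustre releases may expose it elsewhere.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def is_healthy(text: str) -> bool:
    """Return True only if the health_check contents say 'healthy'.

    Assumption: the file contains just the word 'healthy' when everything
    is fine; any other content (e.g. a device reported as unhealthy) is
    treated as a failure.
    """
    return text.strip() == "healthy"


def check_node(path: Path = HEALTH_CHECK) -> bool:
    """Read the health file and report whether this server looks healthy."""
    try:
        return is_healthy(path.read_text())
    except OSError:
        # File missing (Lustre modules not loaded) or unreadable:
        # treat as unhealthy so the failover logic errs on the safe side.
        return False
```

A cluster monitor could call check_node() every few seconds and trigger
failover of the MDT after one or more consecutive False results.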

However, if other filesystem I/O is in progress, it will also hit
the I/O error, and that error will be returned to the client.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

