[Lustre-discuss] About MDS failover

Andreas Dilger adilger at sun.com
Sat Jan 17 16:46:45 PST 2009

On Jan 15, 2009  17:51 -0800, Jeffrey Alan Bennett wrote:
> > You can use HBA multi-pathing to avoid this problem, if your 
> > hardware supports it.  You can also use 
> > /proc/fs/lustre/health_check to check if the filesystems have 
> > encountered errors and are marked "unhealthy".
> We use multipath in all our configurations. However, will Lustre
> be able to detect if the connectivity to the storage has been
> totally lost (i.e. no available path) and display accordingly on
> /proc/fs/lustre/health_check?

Yes, but it can currently only do this "reactively" instead of
"proactively".  If you are using MMP then it should detect the
IO error and mark the filesystem read-only within a second or
so (depending on how quickly the SCSI layer returns the error vs.
retrying), which will in turn cause health_check to return "unhealthy".
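
As a rough illustration of the reactive check, a monitoring or failover
script could poll the file along the lines below.  This is only a sketch:
the function names are made up for the example, and it assumes that a
healthy node reports exactly the string "healthy", so it treats any other
content (or an unreadable file) as unhealthy.

```python
from pathlib import Path
import sys

# Path named in this thread; everything else here is illustrative.
HEALTH_FILE = Path("/proc/fs/lustre/health_check")


def parse_health(text):
    """Return True only if the status text is exactly 'healthy'.

    Assumption: any other content means at least one filesystem or
    device on this node has been marked unhealthy.
    """
    return text.strip() == "healthy"


def check_health(path=HEALTH_FILE):
    """Read the health file and report whether the node looks healthy."""
    try:
        return parse_health(Path(path).read_text())
    except OSError:
        # File missing or unreadable, e.g. the Lustre modules are not loaded.
        return False


if __name__ == "__main__":
    ok = check_health()
    print("healthy" if ok else "unhealthy")
    # A non-zero exit status lets a failover or monitoring tool react.
    sys.exit(0 if ok else 1)
```

Because the detection is reactive, a script like this only sees a failure
after some IO (for example the MMP update) has actually hit the error.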

However, if there is other filesystem IO going on at the time, that IO
will also fail, and the resulting IO error will be returned to the client.

Cheers, Andreas
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
