[Lustre-discuss] Question about failnode

Mustafa A. Hashmi mahashmi at gmail.com
Mon Oct 29 02:25:42 PDT 2007


On 10/29/07, Kazuki Ohara <ohara at rd.scei.sony.co.jp> wrote:
> Brian J. Murrell wrote:
> > On Fri, 2007-10-26 at 16:03 +0900, Kazuki Ohara wrote:
> >> Hi Brian,
> >> Thank you for your answer.
> >
> > NP.
> >
> >> By the way, I doubt the need of the --failnode directive.
> >
> > It's needed.  That is how the MGS and thusly all other nodes learn of an
> > OSS's failover partner.  This information is communicated via the
> > mkfs.lustre command to the MGS.
>
> uh...
> Thank you for your answer, but I can't see why the MGS and OSS
> need to learn of the failover partner. Using that information, does the
> MGS or OSS ask the partner not to access the shared volume, or make
> some other special request?

I am sure someone who understands Lustre internals can tackle this
question better; however, from my understanding:

The MGS keeps track of the configuration of each OST in question,
including which OSS is responsible for serving it. By creating a pair
of OSS systems, one effectively delegates responsibility for the
back-end storage to that pair, ensuring a seamless fail-over because
client requests can be redirected transparently to the surviving node.
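As Brian notes above, the failover partner is declared when the OST is
formatted, via mkfs.lustre's --failnode option. A minimal sketch of
what that might look like -- the filesystem name, hostnames/NIDs, and
device path here are examples only, not taken from this thread:

```shell
# On the primary OSS (oss1): format the OST and declare oss2 as the
# failover partner.  The MGS (and through it, the clients) learns the
# partner's NID from this command.
mkfs.lustre --ost --fsname=testfs \
    --mgsnode=mgs@tcp0 \
    --failnode=oss2@tcp0 \
    /dev/sdb

# Then mount the OST on the primary OSS as usual:
mount -t lustre /dev/sdb /mnt/ost0
```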

The second part of your question is "how do you tell the stand-by OSS
not to access the volume?". The MGS will not direct client requests
for the OST in question to the stand-by node, so the shared or
replicated OST need not be mounted, or even actively available, on the
stand-by member of the fail-over pair.

When a fail-over is required, the shared/replicated storage device is
mounted on the stand-by OSS and Lustre fail-over is requested via the
MGS.
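In practice that fail-over step amounts to mounting the OST device on
the stand-by node; the mount itself is what starts recovery, and
clients reconnect to the NID that was given via --failnode at format
time. A hedged sketch, with example device and mount-point paths:

```shell
# On the stand-by OSS (oss2), once the primary is confirmed down:
# mount the shared OST device.  Never mount it on both OSS nodes at
# once -- the ldiskfs backend is not a shared-access filesystem.
mount -t lustre /dev/sdb /mnt/ost0
```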

> Excuse my persistent question.

Hope the explanation helped -- I am sure CFS/Sun can clarify further.

-mustafa.



