[Lustre-discuss] OST redundancy between nodes?

Fri Jun 26 11:57:30 PDT 2009

Just to clarify one more point: failover is designed to handle
temporary issues with the server. It is NOT meant to handle problems
with either the storage or the network: Lustre assumes neither will
have problems.

More below.

On Jun 26, 2009, at 12:09 PM, "Brian J. Murrell"  
<Brian.Murrell at Sun.COM> wrote:

> On Fri, 2009-06-26 at 11:51 -0600, Kevin Van Maren wrote:
>> If an OST "fails", meaning that the underlying HW has failed (or the
>> connection to the storage has failed -- one reason to use multipath IO),
>> then Lustre will return IO errors to the application (although there is
>> an RFE to not do that).
>
> This is not entirely true.  It is only true when an OST is configured as
> "failout".  When an OST is configured as failover however (which is the
> typical case), the application just blocks until the OST can be put back
> into service again on any of the defined failover nodes for that OST and
> the client can reconnect.  At that time, pending operations are resumed
> and the application continues.

If the client connection to the server is lost, then yes. But I was  
referring to the storage returning an IO error to the server; when  
that happens, the server returns IO errors to the client, which are  
then passed to the application.

The request to not forward those errors is in bugzilla -- basically, it
would give heartbeat a chance to do a failover if the path to storage
is lost on the server.
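
As a rough illustration of the distinction discussed in this thread, the
following is a toy model only. None of these names come from Lustre; the
enums and the function are hypothetical and simply encode the behaviour
described here: a storage IO error seen by the server is passed through
to the application, while an unreachable server causes either blocking
(failover) or an error (failout).

```python
from enum import Enum


class OstMode(Enum):
    """How the OST is configured (hypothetical names, not a Lustre API)."""
    FAILOVER = "failover"  # the typical case
    FAILOUT = "failout"


class Fault(Enum):
    """What went wrong, as seen from the client/server pair."""
    SERVER_UNREACHABLE = "server_unreachable"  # OSS node died or is rebooting
    STORAGE_IO_ERROR = "storage_io_error"      # storage returned an error to the OSS


class Outcome(Enum):
    """What the application observes."""
    BLOCKS_UNTIL_RECOVERY = "blocks"
    EIO = "EIO"


def application_outcome(mode: OstMode, fault: Fault) -> Outcome:
    """Toy model of the behaviour described in this thread."""
    if fault is Fault.STORAGE_IO_ERROR:
        # The server is still up, so it forwards the storage error to the
        # client, which passes it on to the application.
        return Outcome.EIO
    if mode is OstMode.FAILOUT:
        # Failout: a lost server connection surfaces as an IO error.
        return Outcome.EIO
    # Failover: the application blocks until the OST is back in service
    # and the client can reconnect.
    return Outcome.BLOCKS_UNTIL_RECOVERY
```

In this model the only case where the application simply blocks is a
failover-configured OST whose server is unreachable; the RFE mentioned
above would change the storage-error case.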

>
>> Normally what happens is the OSS _node_ fails,
>> and the other node mounts the OST (typically done by using
>> Linux-HA/Heartbeat).
>
> Right.  And no applications see any errors while this happens.
>
> And it is worth noting that defining an OST for failover does not
> require that more than one OSS be defined for it.  You can provide
> "failover service" (i.e. no EIOs to clients) using a single OSS.  If it
> dies, then clients just block until it can be repaired.

Right, that lets you reboot the server semi-transparently (you still
have the delay/hang on the filesystem).  But it does not handle the
server getting IO errors from the storage.

Kevin