[Lustre-discuss] OST redundancy between nodes?

Fri Jun 26 11:52:27 PDT 2009

On Fri, 2009-06-26 at 13:32 -0500, Carlos Santana wrote:
> 
> Yeah, this was answered by Kevin at the beginning of this thread. My
> question was what message/error the client will be given.

If the media becomes corrupted, ldiskfs will typically detect that and
set the filesystem read-only, returning read-only errors to the client.
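
As a rough illustration of what an application might do with such an
error, here is a minimal Python sketch. It assumes the read-only
condition reaches the application as the standard EROFS errno, which is
an assumption rather than something confirmed above; the function name
is hypothetical and the failure is simulated instead of coming from a
real Lustre mount.

```python
import errno


def describe_write_error(exc: OSError) -> str:
    """Map an OSError from a failed write to a short diagnosis string."""
    if exc.errno == errno.EROFS:
        # Assumption: a target set read-only shows up as EROFS.
        return "target is read-only (backing filesystem may have hit an error)"
    # Fall back to the symbolic errno name where one is known.
    return f"other I/O error: {errno.errorcode.get(exc.errno, exc.errno)}"


# Simulated failure; on a real client this would be raised by open()/write().
simulated = OSError(errno.EROFS, "Read-only file system")
print(describe_write_error(simulated))
```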

If the disk dies outright and fails to respond at all to requests from
the OSS, IIRC, the OSS will just keep retrying and the client will
eventually time out and start looking for an OSS (i.e. among the
failover nodes) that will respond for that OST.
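
The timeout-then-search behaviour described above can be modelled
loosely as follows. This is a toy sketch, not Lustre client code: the
node names, the `try_request` callback, and the `max_rounds` limit are
all made up for illustration, and a real client's retry and timeout
logic is more involved.

```python
from typing import Callable, Optional, Sequence


def find_responding_oss(
    candidates: Sequence[str],
    try_request: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first OSS that answers, cycling through the candidates.

    try_request(oss) stands in for "send a request and wait for the
    timeout": it returns True on a reply and False on a timeout.
    """
    for _ in range(max_rounds):
        for oss in candidates:
            if try_request(oss):
                return oss
    return None  # nobody answered for this OST within max_rounds


# Hypothetical nodes: oss1 is the primary for the OST, oss2 its failover partner.
alive = {"oss1": False, "oss2": True}
print(find_responding_oss(["oss1", "oss2"], lambda oss: alive[oss]))
```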

> Also, I did not understand why the terms OSS failure and OST failure
> were used interchangeably.

They are not, really.  OST failure is so catastrophic that most people
go to great lengths to avoid it so it's not considered as often as OSS
failure.

> The 'failover' term seems appropriate when talking about servers and
> not targets.

No, it's very much about targets, but not in the sense of the disk
itself dying (see above about the lengths people go to to avoid that),
since Lustre can't do anything about that anyway.  Failover is
configured at the target level, not the server level.
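
One way to picture "configured at the target level" is as a mapping
from each target to its own list of servers, instead of pairing whole
servers with each other. The sketch below is only a mental model under
that assumption; the `Target` class, the OST names, and the node names
are hypothetical and do not correspond to any Lustre data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Target:
    """A storage target plus the servers allowed to serve it, primary first."""
    name: str
    servers: List[str] = field(default_factory=list)


def targets_by_server(targets: List[Target]) -> Dict[str, List[str]]:
    """Invert the per-target lists to see which targets each server may serve."""
    result: Dict[str, List[str]] = {}
    for target in targets:
        for server in target.servers:
            result.setdefault(server, []).append(target.name)
    return result


# Two targets share a primary (oss1) but have different failover partners.
targets = [
    Target("OST0000", ["oss1", "oss2"]),
    Target("OST0001", ["oss1", "oss3"]),
]
print(targets_by_server(targets))
```

Because the server list hangs off the target, two OSTs normally served
by the same OSS can fail over to different nodes.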

b.
