[Lustre-discuss] lustre client 1.6.5.1 hangs

Brian J. Murrell Brian.Murrell at Sun.COM
Fri Jul 11 10:15:07 PDT 2008


On Fri, 2008-07-11 at 08:24 +0200, Heiko Schroeter wrote:
> 
> Is 'failout' not ok ?

That's up to you.

Failout means that if an OST becomes unreachable (because it has failed,
been taken off the network, been unmounted or turned off, etc.) then any
I/O that needs objects from that OST will cause the client to get an EIO
(Input/Output error).
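
As a rough illustration of what that looks like from the application
side, here is a minimal Python sketch. Nothing in it is Lustre-specific:
the function name read_all and the opener parameter are made up for the
example, and it assumes the EIO reaches the program as an ordinary
OSError whose errno is EIO.

```python
import errno

def read_all(path, opener=open):
    """Read a whole file, treating EIO as fatal (hypothetical helper)."""
    try:
        with opener(path, "rb") as f:
            return f.read()
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Assumption: with failout, an unreachable OST surfaces here.
            raise SystemExit(f"{path}: I/O error (EIO), giving up")
        # Any other error is passed on unchanged.
        raise
```

The opener parameter is only there so the error path can be exercised
without a real failed OST.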

Failover means that a client that tries to do I/O to a failed OST will
continue to try (forever) until it gets an answer.  A userspace
application sees nothing strange, other than an I/O that takes,
potentially, a very long time to complete.
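
If an application wants to at least notice that an I/O is taking a very
long time, one possible approach is to run the read in a helper thread
and wait for it with a deadline. The sketch below is only that, a
sketch: read_with_deadline and read_fn are hypothetical names, and the
helper thread stays blocked in the background until the I/O eventually
completes. It does not cancel the I/O.

```python
import threading

def read_with_deadline(read_fn, timeout_s):
    """Run read_fn() in a daemon thread; return (finished, result)."""
    box = {}

    def worker():
        try:
            box["result"] = read_fn()
        except OSError as exc:
            box["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)          # wait at most timeout_s seconds
    if t.is_alive():
        return False, None     # still blocked; the I/O is not cancelled
    if "error" in box:
        raise box["error"]
    return True, box["result"]
```

A caller could use the False result to log a warning or report the
stall, while the original read carries on waiting.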

> Actually we like to use it because we like to use the 
> lustre system as a huge expandable data archive system.

I'm not sure what using failout has to do with that.

> If one OST breaks 
> down and destroys the data on it we can restore them.

Again, failout/failover really has nothing to do with this.  It has
everything to do with what a client does when it sees an OST fail.

> Actually I do expect the client not to hang any job that accesses the file 
> system at this moment. If that needs an EIO and KILL of that process this is 
> fine by me.

Well, no kill should be necessary.  An EIO should terminate an
application, unless it has a retry handler for EIOs written into it,
which is not very common.  EIO should usually be interpreted as fatal.
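
For the uncommon case, a retry handler could look something like the
following Python sketch. The names (read_with_retry, read_fn, attempts,
delay_s) are invented for the example, and the retry count and delay are
arbitrary; it simply retries on EIO a fixed number of times and then
gives up by re-raising the error.

```python
import errno
import time

def read_with_retry(read_fn, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call read_fn(), retrying on EIO up to `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return read_fn()
        except OSError as exc:
            # Only EIO is retried; the final failure is re-raised as-is.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            sleep(delay_s)
```

Whether retrying makes sense at all depends on the application; a
program without such a handler would just see the first EIO and exit.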

b.
