[Lustre-discuss] Another server question.
Andreas Dilger
adilger at sun.com
Wed Feb 4 01:33:18 PST 2009
On Feb 03, 2009 12:21 -0500, Charles Taylor wrote:
> In our experience, despite what has been said and what we have read,
> if we lose or take down a single OSS, our clients lose access (i/o
> seems blocked) to the file system until that OSS is back up and has
> completed recovery. That's just our experience and it has been very
> consistent. We've never seen otherwise, though we would like to. :)
To be clear - a client process will wait indefinitely until an OST
is back alive, unless either the process is killed (this should be
possible after the Lustre recovery timeout is exceeded, 100s by
default), or the OST is explicitly marked "inactive" on the clients:
    lctl --device {failed OSC device on client} deactivate
After the OSC is marked inactive, then all IO to that OST should
immediately return with -EIO, and not hang.
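As a rough sketch of the step above, the OSC device number can be looked up from the client's device list before deactivating it. This assumes that "lctl dl" prints one device per line with the columns index, state, type, name, UUID and refcount, and that the OSC's name contains the OST name. The helper osc_devno and the OST name lustre-OST0003 are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Sketch only: find the device number of the OSC for a given OST on a client.
# Assumption: `lctl dl` prints columns: index, state, type, name, UUID, refcount.

# osc_devno OSTNAME (hypothetical helper)
# Reads `lctl dl`-style output on stdin and prints the index of the first
# device of type "osc" whose name contains OSTNAME. Prints nothing if no
# such device is found.
osc_devno() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost) { print $1; exit }'
}

# Usage on a client (hypothetical OST name; needs root):
#   dev=$(lctl dl | osc_devno lustre-OST0003)
#   [ -n "$dev" ] && lctl --device "$dev" deactivate
```

The lookup is kept separate from the deactivate call so the device number can be checked by eye before any IO to that OST starts returning -EIO.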
If you see behaviour other than this, it is a bug. If this isn't
explained in the documentation, it is a documentation bug.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.