[Lustre-discuss] Failover possibly not working
Nathan Rutman
Nathan.Rutman at Sun.COM
Tue Nov 6 10:46:42 PST 2007
Jeremy Mann wrote:
> I have set up 20 compute nodes as OSTs, paired off with each other like
> compute-0-0 -> 0-1, 0-2 -> 0-3 and so on. However, this morning one of
> the drives in an OST failed. The node didn't reboot, it just remounted
> its Lustre OST device read-only. This caused our normal storage scripts
> to fail.
>
You could mount your devices with errors=panic to panic the node instead of
remounting read-only, giving your HA scripts something more useful to work with.
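For an ldiskfs-backed OST, something along these lines should do it (the
device and mount point below are just placeholders for your own setup):

    # set the backing filesystem's default error behavior to panic
    tune2fs -e panic /dev/sdb1

    # or pass it as a mount option instead
    mount -t lustre -o errors=panic /dev/sdb1 /mnt/ost0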
> I had to reboot the node anyway to replace the drive, so that's when the
> failover to the next node happened. I can see on the Meta server that
> Lustre did indeed switch to the failover node, however, the files that
> were associated with that node are visible but not readable. Shouldn't
> the failover node have prevented this?
>
The files are visible because the namespace is contained on the MDT, not
the individual OSTs.
All files will be visible; files on the affected OST will be inaccessible.
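If it helps to know exactly which files are affected, lfs can map files to
OSTs. Roughly (the OST UUID here is a stand-in; check lfs df or lctl dl for
the real name on your system):

    # show which OST objects a given file uses
    lfs getstripe /mnt/lustre/somefile

    # list files that have objects on the failed OST
    lfs find -r --obd lustre-OST0003_UUID /mnt/lustre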
> The drive that failed is completely dead, I can't even mount it to try a
> dd to restore the filesystem, so it looks like I'm going to have to
> rebuild the filesystem.
>
A disk failure is considered an unrecoverable error as far as Lustre is
concerned. Your back-end storage must be reliable for Lustre to function --
that's what RAID is for. Dual-ported standalone RAID boxes allow failover
Lustre servers to take over from each other in case of _node_ failure,
not _disk_ failure.
In the meantime, you can deactivate the affected OST using lctl on the
clients and the MDT; this will allow access functions to complete without
errors (the files on the affected OST will be 0-length, but the rest of
your files will be OK).
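Roughly, the procedure looks like this on the MDT and on each client (the
OST name and device number are examples; use whatever lctl dl reports):

    # find the OSC device that points at the failed OST
    lctl dl | grep OST0003

    # deactivate it, using the device number from the first column
    lctl --device 11 deactivate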