[Lustre-discuss] Failover possibly not working
Nathan Rutman
Nathan.Rutman at Sun.COM
Tue Nov 6 10:46:42 PST 2007
Jeremy Mann wrote:
> I have set up 20 compute nodes as OSTs, paired off with each other like
> compute-0-0 -> 0-1, 0-2 -> 0-3 and so on. However, this morning one of
> the drives in an OST failed. The node didn't reboot, it just remounted
> its Lustre OST device read-only. This caused our normal storage scripts
> to fail.
>
You could mount your devices with errors=panic to panic the node instead of
remounting read-only, giving your HA scripts something more useful to work with.
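For an ldiskfs-backed OST, something along these lines should do it (the
device and mount point below are just placeholders for your own setup):

    # set the backing filesystem's default error behavior to panic
    tune2fs -e panic /dev/sdb1

    # or pass it as a mount option instead
    mount -t lustre -o errors=panic /dev/sdb1 /mnt/ost0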
> I had to reboot the node anyway to replace the drive, so that's when the
> failover to the next node happened. I can see on the Meta server that
> Lustre did indeed switch to the failover node, however, the files that
> were associated with that node are visible but not readable. Shouldn't
> the failover node have prevented this?
>
The files are visible because the namespace is contained on the MDT, not
the individual OSTs.
All files will be visible; files on the affected OST will be inaccessible.
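If it helps to know exactly which files are affected, lfs can map files to
OSTs. Roughly (the OST UUID here is a stand-in; check lfs df or lctl dl for
the real name on your system):

    # show which OST objects a given file uses
    lfs getstripe /mnt/lustre/somefile

    # list files that have objects on the failed OST
    lfs find -r --obd lustre-OST0003_UUID /mnt/lustre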
> The drive that failed is completely dead, I can't even mount it to try a
> dd to restore the filesystem, so it looks like I'm going to have to
> rebuild the filesystem.
>
A disk failure is considered an unrecoverable error as far as Lustre is
concerned. Your back-end storage must be reliable for Lustre to function --
that's what RAID is for. Dual-ported standalone RAID boxes allow failover
Lustre servers to take over from each other in case of _node_ failure,
not _disk_ failure.
In the meantime, you can deactivate the affected OST using lctl on the
clients and the MDT; this will allow access functions to complete without
errors (the files on the affected OST will be 0-length, but the rest of
your files will be OK).
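Roughly, the procedure looks like this on the MDT and on each client (the
OST name and device number are examples; use whatever lctl dl reports):

    # find the OSC device that points at the failed OST
    lctl dl | grep OST0003

    # deactivate it, using the device number from the first column
    lctl --device 11 deactivate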