[Lustre-discuss] Failover possibly not working
Jeremy Mann
jeremy at biochem.uthscsa.edu
Tue Nov 6 09:25:46 PST 2007
I have set up 20 compute nodes as OSTs, each set to fail over to the
next node, like compute-0-0 -> 0-1, 0-2 -> 0-3, and so on. However, this
morning one of the drives in an OST failed. The node didn't reboot; it
just remounted its Lustre OST device read-only. This caused our normal
storage scripts to fail.
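
For reference, the pairing scheme described above can be sketched roughly
as below. This is only an illustration of the node layout, not the actual
Lustre configuration: the function name failover_pairs and the
"compute-0-" prefix handling are hypothetical, and it assumes the node
numbers run from 0 to 19 with each even-numbered node pointing at the
next odd-numbered one.

```python
def failover_pairs(node_count=20, prefix="compute-0-"):
    """Pair adjacent nodes as (primary, failover partner).

    Assumption: nodes are numbered 0..node_count-1 and each even-numbered
    node fails over to the next odd-numbered node, as in
    compute-0-0 -> compute-0-1, compute-0-2 -> compute-0-3.
    """
    if node_count % 2 != 0:
        raise ValueError("node_count must be even so every node has a partner")
    return [
        (f"{prefix}{i}", f"{prefix}{i + 1}")
        for i in range(0, node_count, 2)
    ]


pairs = failover_pairs()
for primary, partner in pairs:
    print(f"{primary} -> {partner}")
```

With 20 nodes this gives 10 pairs. Whether the odd-numbered nodes also
fail back to their even-numbered partners is not stated here, so the
sketch only models the one direction.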
I had to reboot the node anyway to replace the drive, and that's when
the failover to the next node happened. I can see on the metadata server
that Lustre did indeed switch to the failover node. However, the files
that were associated with the failed node are visible but not readable.
Shouldn't the failover node have prevented this?
The drive that failed is completely dead. I can't even mount it to try a
dd to restore the filesystem, so it looks like I'm going to have to
rebuild the filesystem.
--
Jeremy Mann
jeremy at biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
(210) 567-2672