[Lustre-discuss] Failover possibly not working

Jeremy Mann jeremy at biochem.uthscsa.edu
Tue Nov 6 09:25:46 PST 2007


I have set up 20 compute nodes as OSTs, paired for failover with their
neighbors: compute-0-0 -> 0-1, 0-2 -> 0-3, and so on. This morning,
however, one of the drives in an OST failed. The node didn't reboot; it
just remounted its Lustre OST device read-only. This caused our normal
storage scripts to fail.
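
For illustration, the pairing described above can be sketched in Python.
This is only a sketch under assumptions: the hostnames follow the
compute-0-N pattern, each even-numbered node fails over to the next
odd-numbered one, and the function name failover_pairs is hypothetical
(it is not part of Lustre or of the actual setup scripts).

```python
def failover_pairs(node_count=20, prefix="compute-0-"):
    """Return {primary: failover partner} for consecutive node pairs.

    Assumption: nodes are numbered 0..node_count-1 and node 2k fails
    over to node 2k+1, matching "0-0 -> 0-1, 0-2 -> 0-3".
    """
    if node_count % 2:
        raise ValueError("node_count must be even to form pairs")
    return {
        f"{prefix}{n}": f"{prefix}{n + 1}"
        for n in range(0, node_count, 2)
    }


if __name__ == "__main__":
    # Print the assumed primary -> partner mapping, one pair per line.
    for primary, partner in failover_pairs().items():
        print(f"{primary} -> {partner}")
```

Whether the odd-numbered nodes also fail back to their even-numbered
partners is not stated here, so the sketch maps in one direction only.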

I had to reboot the node anyway to replace the drive, and that is when
the failover to the next node happened. I can see on the metadata server
that Lustre did indeed switch to the failover node; however, the files
that were associated with the failed node are visible but not readable.
Shouldn't the failover node have prevented this?

The drive that failed is completely dead. I can't even mount it to try a
dd to restore the filesystem, so it looks like I'm going to have to
rebuild the filesystem.


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
(210) 567-2672



