[Lustre-discuss] manual ost failover problems

Marcus Schull c.schull at imb.uq.edu.au
Thu Aug 21 01:29:40 PDT 2008


Hi,

We are currently testing Lustre 1.6.5.1 on RHEL 5 (64-bit) with 3 OSTs
for a 'data' filesystem running on server1 and 4 OSTs for a 'common'
filesystem running on server2. Each OST is a 1TB SAN LUN that can be
seen from either server. The idea was to run the servers as an
active/active failover pair, so that the 'other' LUNs can be mounted on
the remaining server if one server fails. We would also have the
flexibility of striping (between the 2 nodes initially, more in
the future) if the OSTs of each filesystem were spread across the
servers.

At present, this works well as long as every LUN stays mounted on the
server where it was first mounted after creation.

I had assumed that OSTs could be unmounted from server1 and then
remounted on server2 (never simultaneously mounted), but this does
not seem to work, whether or not clients have the filesystem mounted,
and regardless of whether the servers are rebooted in between.
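
To make the procedure concrete, the move being attempted amounts to
something like the following dry-run sketch. The device path and mount
point are placeholders, not our actual names, and the script only
prints the commands rather than running them.

```shell
#!/bin/sh
# Dry run only: prints the commands for moving one OST from one server
# to the other. Nothing is unmounted or mounted by this script.
# All device paths and mount points passed in are placeholders.
plan_ost_move() {
    dev=$1; mnt=$2; from=$3; to=$4
    # Step 1: unmount the OST on the server that currently holds it.
    echo "[$from] umount $mnt"
    # Step 2: mount the same LUN as a Lustre target on the other server.
    echo "[$to] mount -t lustre $dev $mnt"
}

plan_ost_move /dev/mapper/data-ost0 /mnt/data/ost0 server1 server2
```

The two steps are always run in this order, so the LUN is never
mounted on both servers at once.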

The filesystems were created using the --failnode option.
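
For reference, a format command with --failnode has roughly the shape
printed by this dry-run sketch. The fsname, the NIDs (mgs@tcp0,
server2@tcp0) and the device path are placeholder assumptions, not our
real values, and nothing is actually formatted.

```shell
#!/bin/sh
# Dry run only: prints an mkfs.lustre command line for one OST with a
# failover partner declared. The command is NOT executed.
# Every argument value used below is a placeholder.
plan_ost_format() {
    fsname=$1; mgsnid=$2; failnid=$3; dev=$4
    # --failnode names the NID of the server that may take over this OST.
    echo "mkfs.lustre --fsname=$fsname --ost --mgsnode=$mgsnid --failnode=$failnid $dev"
}

plan_ost_format data mgs@tcp0 server2@tcp0 /dev/mapper/data-ost0
```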

Even though the LUNs will mount on the other server, any clients that
access the filesystem will 'hang' until the LUN is mounted back in its
initial location.

Is there a command to 'update' the MGS/MDT's information regarding
this, and so communicate this to the clients?

While I may have missed it, I couldn't find much information on
'manual' failover in the Lustre 1.6 manual or the Lustre wiki.

We may implement failover with Linux HA down the track, but at this  
stage manual failover would be sufficient if we could understand more  
about how it works.

If this info is clearly documented somewhere (like in the manual), I  
apologise and will attempt to locate this info again.


  ** It seems that I can achieve the above with the MDTs (i.e. unmount
from one server and mount on the other), although with inconsistent
results so far.


Thanks in advance for any advice.


Marcus.

Systems Administrator,
University of Queensland.




