[Lustre-discuss] OST redundancy between nodes?

Kevin Van Maren Kevin.Vanmaren at Sun.COM
Fri Jun 26 10:51:59 PDT 2009


The OSS is the server node.  It normally provides one or more OSTs (the 
disks).

OST failover is done by configuring multiple OSS nodes to be able to 
serve the same OST.  Only ONE OSS node may provide the OST at a time.

Failover is accomplished by the clients attempting to connect to each 
OSS node configured to serve the OST, until one of them responds that it 
has the OST active.
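
As a rough illustration of that client-side search (a toy model only, not 
Lustre code; the OSSNode class, find_active_oss function, and the node and 
OST names are all made up for this sketch):

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class OSSNode:
    """Hypothetical stand-in for one OSS node."""
    name: str
    reachable: bool = True
    active_osts: Set[str] = field(default_factory=set)

    def serves(self, ost: str) -> bool:
        # A node only answers for an OST it is up and currently has mounted.
        return self.reachable and ost in self.active_osts


def find_active_oss(ost: str, candidates: Iterable[OSSNode]) -> Optional[OSSNode]:
    """Try each OSS configured for the OST; return the first with it active."""
    for node in candidates:
        if node.serves(ost):
            return node
    return None  # nobody has it active yet; a real client would keep retrying


# oss1 has died, and oss2 has taken over the OST.
oss1 = OSSNode("oss1", reachable=False, active_osts={"OST0000"})
oss2 = OSSNode("oss2", active_osts={"OST0000"})
winner = find_active_oss("OST0000", [oss1, oss2])
```

Here the search skips the dead oss1 and settles on oss2.  If no node has 
the OST active, the function returns None, which corresponds to the client 
waiting and trying again.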


An OST can be moved back and forth between OSS nodes with umount/mount 
commands (assuming both servers can access the same disk!).
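
A minimal sketch of that sequence, written as a helper that only builds 
the command lines and does not run anything.  The host names, the device 
/dev/sdb, and the mount point /mnt/ost0 are hypothetical, and it assumes 
the OST is already formatted and configured for failover:

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (host to run on, argv)


def plan_ost_move(device: str, mount_point: str,
                  from_host: str, to_host: str) -> List[Step]:
    """Return the ordered commands to move an OST between two OSS nodes.

    The umount comes first so that only ONE node serves the OST at a time.
    """
    return [
        (from_host, ["umount", mount_point]),
        (to_host, ["mount", "-t", "lustre", device, mount_point]),
    ]


# Hypothetical names throughout; nothing is executed here.
plan = plan_ost_move("/dev/sdb", "/mnt/ost0", "oss1", "oss2")
for host, argv in plan:
    print(f"{host}: {' '.join(argv)}")
```

The ordering is the point: the OST must be unmounted on the first node 
before the second node mounts it.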

If an OST "fails", meaning that the underlying hardware has failed (or the 
connection to the storage has failed -- one reason to use multipath I/O), 
then Lustre will return I/O errors to the application (although there is 
an RFE to not do that).  Normally what happens is that the OSS _node_ 
fails and the other node mounts the OST (typically done using 
Linux-HA/Heartbeat).
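
To keep the two cases apart, here is a toy decision table (not Lustre 
code; the Failure enum, client_outcome function, and outcome strings are 
invented for illustration) summarizing what the client sees:

```python
from enum import Enum


class Failure(Enum):
    NONE = "none"
    OSS_NODE = "oss_node"        # the server node died; the disk is fine
    OST_STORAGE = "ost_storage"  # the disk, or the path to it, died


def client_outcome(failure: Failure, failover_configured: bool) -> str:
    """Summarize what the application sees for each kind of failure."""
    if failure is Failure.OST_STORAGE:
        # No other node can help if the storage itself is gone.
        return "io_error"
    if failure is Failure.OSS_NODE:
        if failover_configured:
            return "retry_on_failover_oss"
        # Without a failover node the client waits for the OSS to return.
        return "wait_for_oss"
    return "ok"


outcome = client_outcome(Failure.OSS_NODE, failover_configured=True)
```

Failover only helps in the OSS_NODE case, since the surviving node still 
needs working storage to mount.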


MDS/MDT failover/configuration is similar.

Kevin



Carlos Santana wrote:
> Sorry, but maybe I am confused between OSS and OST.
>
> On Fri, Jun 26, 2009 at 11:24 AM, Brian J. Murrell<Brian.Murrell at sun.com> wrote:
>   
>> On Fri, 2009-06-26 at 10:56 -0500, Carlos Santana wrote:
>>     
>>> I was wondering what will happen during OST failure
>>>  - if client is making some read/write operation
>>>       
>> Assuming the OST is configured for failover, the client will retry
>> anything that didn't get committed to disk before the OST failure.  It
>> will try with all available failover targets for the OST.
>>     
>
> Can OST(disk) be configured for failover like an OSS(server node)?
>
>   
>>> - if client requests read/write after OST fails
>>>       
>> Same as above.
>>
>>     
>>> When I made OSS unavailable the client waited/got delayed response
>>> till OSS connected back.
>>>       
>> Right.  That's failover.
>>
>>     
>>> I am not sure about OST failure though. Any
>>> clues?
>>>       
>> An OST fails if an OSS fails given that an OST is the disk in an OSS
>> (which is the node).
>>     
>
> I thought an OST(disk) can fail without OSS(server) being failed.
> And that's my question, what will happen in such scenario - while
> client is in read/write operation and client requesting read/write
> after the OST(disk) failure?
>
>   
>> b.
>>
>>     



