[Lustre-discuss] Multi-Role/Tasking MDS/OSS Hosts

Cory Spitz spitzcor at cray.com
Fri Sep 17 14:31:27 PDT 2010


Hi, Bernd.

On 09/17/2010 02:48 PM, Bernd Schubert wrote:
> On Friday, September 17, 2010, Andreas Dilger wrote:
>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>> We're trying to architect a Lustre setup for our group, and want to
>>> leverage our available resources. In doing so, we've come to consider
>>> multi-purposing several hosts, so that they'll function simultaneously
>>> as MDS & OSS.
>>
>> You can't do this and expect recovery to work in a robust manner.  The
>> reason is that the MDS is a client of the OSS, and if they are both on the
>> same node that crashes, the OSS will wait for the MDS "client" to
>> reconnect and will time out recovery of the real clients.
> 
> Well, that is something of a design problem. Even on separate nodes it can
> easily happen that both the MDS and the OSS fail, for example after a power
> outage in the storage rack. In my experience, situations like that happen
> frequently...
> 

I think that just argues that the MDS should be on a separate UPS.

> I think some kind of pre-connection would be required, where a client can tell
> a server that it was rebooted and that the server should not wait any
> longer for it. Actually, that shouldn't be too difficult, as different
> connection flags already exist. So if the client contacts a server and asks
> for an initial connection, the server could check for that NID and then
> immediately abort recovery for that client.
> 
> 
> Cheers,
> Bernd
> 
> 
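
To make the proposal above concrete, here is a toy model of it in Python. It
is only a sketch of the idea, not Lustre code: the RecoveryWindow class, the
connect() method, the "initial" flag, the 300-second timeout, and the example
NIDs are all hypothetical names and values chosen for illustration. The server
waits for every client it knew before the crash; a known NID that shows up
asking for an initial connection is taken to have rebooted and is dropped from
the wait list instead of holding recovery open until the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryWindow:
    """Toy model of a server waiting for known clients after a restart."""

    expected: set                 # NIDs of clients known before the crash
    timeout: float = 300.0        # hypothetical recovery window, in seconds
    recovered: set = field(default_factory=set)
    dropped: set = field(default_factory=set)

    def connect(self, nid: str, initial: bool) -> None:
        """Handle a connection request from the client with this NID."""
        if nid not in self.expected:
            return  # a brand-new client: nothing to replay, nothing to wait for
        self.expected.discard(nid)
        if initial:
            # A known NID asking for an initial connection has lost its
            # state (it rebooted), so stop waiting for it.
            self.dropped.add(nid)
        else:
            # A normal reconnect: the client takes part in recovery.
            self.recovered.add(nid)

    def finished(self, elapsed: float) -> bool:
        """Recovery ends when nobody is left to wait for, or on timeout."""
        return not self.expected or elapsed >= self.timeout
```

With an MDS "client" and one real client expected, the window stays open after
the real client reconnects, and closes as soon as the rebooted MDS asks for an
initial connection, well before the timeout:

```python
w = RecoveryWindow(expected={"mds@tcp", "client1@tcp"})
w.connect("client1@tcp", initial=False)
print(w.finished(elapsed=10.0))   # still waiting for the MDS
w.connect("mds@tcp", initial=True)
print(w.finished(elapsed=10.0))   # nothing left to wait for
```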


