[Lustre-discuss] Multi-Role/Tasking MDS/OSS Hosts

Bernd Schubert bs_lists at aakef.fastmail.fm
Fri Sep 17 12:48:53 PDT 2010


On Friday, September 17, 2010, Andreas Dilger wrote:
> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
> > We're trying to architect a Lustre setup for our group, and want to
> > leverage our available resources. In doing so, we've come to consider
> > multi-purposing several hosts, so that they'll function simultaneously
> > as MDS & OSS.
> 
> You can't do this and expect recovery to work in a robust manner.  The
> reason is that the MDS is a client of the OSS, and if they are both on the
> same node that crashes, the OSS will wait for the MDS "client" to
> reconnect and will time out recovery of the real clients.

Well, that is a design problem of sorts. Even on separate nodes it can
easily happen that both MDS and OSS fail, for example after a power outage in
the storage rack. In my experience, situations like that happen frequently...

I think some kind of pre-connection would be required, where a client can tell
a server that it was rebooted and that the server should not wait any
longer for it. Actually, that shouldn't be too difficult, as different
connection flags already exist. So if the client contacts a server and asks for
an initial connection, the server could check for that NID and then immediately
abort recovery for that client.
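
To make the idea concrete, here is a rough sketch of the server-side check in
Python. It is only a toy model, not Lustre code: the class RecoveryState, the
method handle_connect, the "initial" flag and the example NIDs are all
hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Toy model of a server waiting for pre-crash clients to reconnect."""

    # NIDs of clients the server still expects to come back (hypothetical).
    pending: set = field(default_factory=set)

    def handle_connect(self, nid: str, initial: bool) -> str:
        """Process a connect request from the client with the given NID."""
        if nid not in self.pending:
            # Not a client known from before the crash: nothing to recover.
            return "new client"
        # Either way the server no longer has to wait for this client.
        self.pending.discard(nid)
        if initial:
            # The client asks for an initial connection, so it was rebooted
            # and has no state to replay: abort recovery for it right away.
            return "recovery aborted"
        # Normal reconnect: the client takes part in recovery as usual.
        return "replaying"

    def recovery_complete(self) -> bool:
        """Recovery can finish once no pre-crash client is outstanding."""
        return not self.pending


if __name__ == "__main__":
    state = RecoveryState(pending={"192.168.1.10@tcp", "192.168.1.11@tcp"})
    # The rebooted MDS "client" announces itself with an initial connection.
    print(state.handle_connect("192.168.1.10@tcp", initial=True))
    # A real client reconnects normally and replays its requests.
    print(state.handle_connect("192.168.1.11@tcp", initial=False))
    print(state.recovery_complete())
```

The point of the sketch is that the server stops waiting as soon as a known
NID shows up with an initial connection, instead of running into the recovery
timeout for a client that will never replay anything.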


Cheers,
Bernd


-- 
Bernd Schubert
DataDirect Networks


