[Lustre-discuss] Multi-Role/Tasking MDS/OSS Hosts
bs_lists at aakef.fastmail.fm
Fri Sep 17 12:48:53 PDT 2010
On Friday, September 17, 2010, Andreas Dilger wrote:
> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
> > We're trying to architect a Lustre setup for our group, and want to
> > leverage our available resources. In doing so, we've come to consider
> > multi-purposing several hosts, so that they'll function simultaneously
> > as MDS & OSS.
> You can't do this and expect recovery to work in a robust manner. The
> reason is that the MDS is a client of the OSS, and if they are both on the
> same node that crashes, the OSS will wait for the MDS "client" to
> reconnect and will time out recovery of the real clients.
Well, that is some kind of design problem. Even on separate nodes it can
easily happen that both the MDS and the OSS fail, for example during a power
outage of the storage rack. In my experience, situations like that happen
frequently...
I think some kind of pre-connection would be required, where a client can tell
a server that it was rebooted and that the server should not wait any
longer for it. Actually, that shouldn't be difficult, as different
connection flags already exist. So if the client contacts a server and asks
for an initial connection, the server could check for that NID and then
immediately abort recovery for that client.
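
To make the idea concrete, here is a rough sketch in Python of the server-side
check I have in mind. It is only an illustration under assumptions: the class
RecoveryWindow, the flag names FLAG_INITIAL and FLAG_RECONNECT, and the example
NIDs are all made up for this mail and are not taken from the Lustre code.

```python
from dataclasses import dataclass, field

# Hypothetical connection flags; the names are illustrative, not Lustre's.
FLAG_INITIAL = "initial"      # client has no prior state, e.g. it rebooted
FLAG_RECONNECT = "reconnect"  # client is resuming a connection it still knows


@dataclass
class RecoveryWindow:
    """NIDs a restarted server is still waiting for during recovery."""
    pending: set = field(default_factory=set)
    recovered: set = field(default_factory=set)
    aborted: set = field(default_factory=set)

    def handle_connect(self, nid, flag):
        """Process a connect request and report what happened to that NID."""
        if nid not in self.pending:
            # Not a client we were waiting for: treat as a fresh connection.
            return "new"
        self.pending.discard(nid)
        if flag == FLAG_INITIAL:
            # The client rebooted and has nothing to replay, so stop
            # waiting for it instead of running into the recovery timeout.
            self.aborted.add(nid)
            return "aborted"
        self.recovered.add(nid)
        return "recovered"

    def is_complete(self):
        """Recovery is done once no known client is still outstanding."""
        return not self.pending


# Example: the MDS "client" (10.0.0.1@tcp) went down together with the OSS.
window = RecoveryWindow(pending={"10.0.0.1@tcp", "10.0.0.2@tcp"})
mds_result = window.handle_connect("10.0.0.1@tcp", FLAG_INITIAL)
client_result = window.handle_connect("10.0.0.2@tcp", FLAG_RECONNECT)
```

With this check the rebooted MDS is dropped from the pending set as soon as it
asks for an initial connection, so the recovery window can close once the real
clients have reconnected.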