[Lustre-discuss] Multi-Role/Tasking MDS/OSS Hosts
spitzcor at cray.com
Fri Sep 17 14:31:27 PDT 2010
On 09/17/2010 02:48 PM, Bernd Schubert wrote:
> On Friday, September 17, 2010, Andreas Dilger wrote:
>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>> We're trying to architect a Lustre setup for our group, and want to
>>> leverage our available resources. In doing so, we've come to consider
>>> multi-purposing several hosts, so that they'll function simultaneously
>>> as MDS & OSS.
>> You can't do this and expect recovery to work in a robust manner. The
>> reason is that the MDS is a client of the OSS, and if they are both on the
>> same node that crashes, the OSS will wait for the MDS "client" to
>> reconnect and will time out recovery of the real clients.
> Well, that is some kind of design problem. Even on separate nodes it can
> easily happen that both MDS and OSS fail, for example in a power outage of
> the storage rack. In my experience, situations like that happen frequently...
I think that just argues that the MDS should be on a separate UPS.
> I think some kind of pre-connection would be required, where a client can
> tell a server that it was rebooted and that the server should not wait any
> longer for it. Actually, that shouldn't be that difficult, as different
> connection flags already exist. So if the client contacts a server and asks
> for an initial connection, the server could check for that NID and then
> immediately abort recovery for that client.
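
For illustration, the pre-connection idea quoted above could be sketched roughly as follows. This is a toy model in Python, not Lustre code: the class `RecoveryState`, the method `handle_connect`, the `initial` flag, and the example NIDs are all hypothetical names chosen for the sketch. It only shows the bookkeeping: a server in recovery keeps a set of NIDs it is waiting for, and a connect request flagged as "initial" from one of those NIDs removes it from the set immediately instead of leaving the server to wait for a timeout.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Hypothetical model of a server's recovery window (not Lustre code)."""

    # NIDs of clients that were connected before the server restarted
    # and that the server is still waiting for.
    expected: set = field(default_factory=set)
    # NIDs of clients that reconnected with their previous state intact.
    recovered: set = field(default_factory=set)

    def handle_connect(self, nid: str, initial: bool) -> str:
        """Process a connect request from `nid`.

        `initial` is an assumed flag meaning "I was rebooted and have
        nothing to replay".
        """
        if nid not in self.expected:
            # Unknown client: nothing to recover for it.
            return "new"
        self.expected.discard(nid)
        if initial:
            # The client lost its state, so stop waiting for it right away.
            return "evicted"
        self.recovered.add(nid)
        return "recovered"

    def recovery_done(self) -> bool:
        # Recovery can finish once no expected client is outstanding.
        return not self.expected
```

With this model, an OSS that is waiting for a co-located MDS "client" and one real client would finish recovery as soon as the rebooted MDS announces itself with `initial=True` and the real client reconnects normally, rather than holding the real client until the recovery timer expires.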