[Lustre-discuss] Multi-Role/Tasking MDS/OSS Hosts

Bernd Schubert bschubert at ddn.com
Fri Sep 17 14:49:55 PDT 2010


Hello Cory,

On 09/17/2010 11:31 PM, Cory Spitz wrote:
> Hi, Bernd.
> 
> On 09/17/2010 02:48 PM, Bernd Schubert wrote:
>> On Friday, September 17, 2010, Andreas Dilger wrote:
>>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>>> We're trying to architect a Lustre setup for our group, and want to
>>>> leverage our available resources. In doing so, we've come to consider
>>>> multi-purposing several hosts, so that they'll function simultaneously
>>>> as MDS & OSS.
>>>
>>> You can't do this and expect recovery to work in a robust manner.  The
>>> reason is that the MDS is a client of the OSS, and if they are both on the
>>> same node that crashes, the OSS will wait for the MDS "client" to
>>> reconnect and will time out recovery of the real clients.
>>
>> Well, that is some kind of design problem. Even on separate nodes it can
>> easily happen that both the MDS and the OSS fail, for example through a
>> power outage of the storage rack. In my experience, situations like that
>> happen frequently...
>>
> 
> I think that just argues that the MDS should be on a separate UPS.

Well, a power outage is not the only possible cause. Another hardware
issue is that an IB switch might fail. We have also seen cascading Lustre
failures: it starts with an LBUG on the OSS, which triggers another
problem on the MDS...
Also, for us this will actually become a real problem that cannot be
easily solved, so this issue will become a DDN priority.


Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
