[lustre-discuss] Designing a new Lustre system

Patrick Farrell paf at cray.com
Wed Dec 20 10:21:07 PST 2017

I won’t try to answer all your questions (I’m not really qualified to opine), but a quick one on ZFS:

ZFS today is still much slower for the MDT.  It's competitive on OSTs, arguably better, depending on your needs and hardware.  So a strong choice for a config today would be ldiskfs MDTs and ZFS OSTs; I know several places do that.
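For what it's worth, a mixed ldiskfs/ZFS setup is just a matter of the backing filesystem type chosen at format time.  A rough sketch (device names, filesystem name, and NIDs below are placeholders, adjust for your hardware):

```shell
# ldiskfs MDT (combined MGS+MDT here for simplicity):
mkfs.lustre --fsname=testfs --mgs --mdt --index=0 \
    --backfstype=ldiskfs /dev/sdb

# ZFS OST: create a raidz2 pool first, then format a dataset in it:
zpool create ostpool raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mkfs.lustre --fsname=testfs --ost --index=0 \
    --backfstype=zfs --mgsnode=mds1@o2ib ostpool/ost0
```

Each target is then mounted with `mount -t lustre` as usual; the backend type is transparent to clients.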

As for MDS+OSS in one node, probably the main problem you’ll face is memory usage.  The MDS and OSSes can both benefit from lots of RAM, depending on your workload and configuration.  So it might be hard to provide happily for both.

But combined MDS+OSS is certainly something people have been discussing recently, for the reasons you gave.  I don’t know if any real deployments exist (there are certainly test setups all over).

- Patrick

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of "E.S. Rosenberg" <esr+lustre at mail.hebrew.edu>
Date: Wednesday, December 20, 2017 at 10:21 AM
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Designing a new Lustre system

Hi everyone,

We are currently looking into upgrading/replacing our Lustre system with a newer system.

I had several ideas I'd like to run by you and also some questions:
1. After my recent experience with failover, I wondered: is there any reason not to configure every machine within reasonable cable range as a potential failover node?  That way, in the very unlikely event that both machines connected to a disk enclosure fail, simple recabling plus a manual mount would still work.
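If I understand the mechanism correctly, this is what `--servicenode` is for: each one listed at format time is permitted to mount the target, and listing several should cost nothing on the servers that never use it.  A sketch with placeholder NIDs:

```shell
# Format an OST that any of three servers may mount (primary first):
mkfs.lustre --fsname=testfs --ost --index=1 \
    --mgsnode=mds1@o2ib \
    --servicenode=oss1@o2ib \
    --servicenode=oss2@o2ib \
    --servicenode=oss3@o2ib \
    /dev/sdc

# Or add a service node to an existing, unmounted target afterwards:
tunefs.lustre --servicenode=oss4@o2ib /dev/sdc
```

Clients learn the full service-node list from the MGS, so after recabling the target can simply be mounted on whichever listed node has it attached.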

2. I'm trying to decide how to do metadata.  On the one hand, I would much prefer a failover pair; on the other hand, looking at the load on our MDS, dedicating even one machine exclusively to it seems like a big waste.  So I was thinking of instead making all Lustre nodes combined MDS+OSS.  As I understand it, this could potentially provide better metadata performance if needed, allow me to put small files on the MDS, and also give better resilience.  Am I correct in these assumptions?  Has anyone done something similar?
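Operationally, a combined MDS+OSS node seems to amount to nothing more than mounting both target types on the same server (mount points and devices below are placeholders):

```shell
# On one server: mount an MDT and an OST side by side.
mkdir -p /mnt/lustre-mdt0 /mnt/lustre-ost0
mount -t lustre /dev/sdb /mnt/lustre-mdt0
mount -t lustre ostpool/ost0 /mnt/lustre-ost0
```

The open question, as Patrick notes above, is whether the MDT and OST workloads on one box fight over RAM rather than whether the configuration itself is supported.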

3. An LLNL talk at OpenZFS last year seemed to strongly suggest using ZFS over ldiskfs.  Is this indeed 'the way to go' for new systems, or are both still fully valid options?

4. One of my colleagues likes Isilon very much.  I have not been able to find any literature on if/how it compares to Lustre; any pointers or knowledge on the subject would be very welcome.

Our current system consists of 1 MDS and 3 OSSes (15 OSTs) over FDR IB, approximately 500 TB in size, currently running Lustre 2.8, though I hope to upgrade it to 2.10.x.  The cluster it serves consists of 72 nodes, though we hope that will grow.
A new system would hopefully (budget dependent) be at least 1 PB and still serve the same or an expanded cluster.

