[lustre-discuss] Designing a new Lustre system

Wed Dec 20 08:21:18 PST 2017

Hi everyone,

We are currently looking into upgrading/replacing our Lustre system with a
newer system.

I have several ideas I'd like to run by you, as well as some questions:
1. After my recent experience with failover, I wondered: is there any reason
not to configure every machine within reasonable cable range as a potential
failover node? That way, in the very unlikely event that both machines
connected to a disk enclosure fail, simply recabling and mounting manually
would still work.

2. I'm trying to decide how to handle metadata. On the one hand, I would very
much prefer to have a failover pair. On the other hand, when I look at the
load on the MDS, it seems wasteful to dedicate even one machine exclusively
to this. So I was thinking of making all Lustre nodes combined MDS+OSS
instead. As I understand it, this could provide better metadata performance
when needed, allow me to put small files on the MDS, and improve resilience.
Am I correct in these assumptions? Has anyone done something similar?

3. An LLNL talk at OpenZFS last year seemed to strongly recommend ZFS over
ldiskfs. Is this indeed "the way to go for new systems", or are both still
fully valid options?

4. One of my colleagues likes Isilon very much. I have not been able to find
any literature on whether or how Lustre compares to it, so any pointers or
knowledge on the subject would be very welcome.

Our current system consists of 1 MDS + 3 OSS (15 OSTs) on FDR InfiniBand,
approximately 500 TB in size. It currently runs Lustre 2.8, but I hope to
upgrade it to 2.10.x. The cluster it serves has 72 nodes, though we hope it
will grow.
A new system would hopefully (budget permitting) be at least 1 PB and would
still serve the same or an expanded cluster.

Thanks,
Eli