[lustre-discuss] Laying the groundwork for a new lustre file system

Kurt Strosahl strosahl at jlab.org
Fri Nov 9 07:28:08 PST 2018


All,


    I asked this question a month or so back, but thought I'd ask it again to see if anyone else had insight.  I'm in the process of designing a new Lustre file system to replace an existing 2.3PB Lustre file system.  We have over a thousand clients connecting to Lustre, mostly over QDR IB, as well as two clusters on Omni-Path that connect through four LNet routers.  Presently we are running 2.5.42.  Our metadata system runs on ldiskfs, presently has 52GB of space used (164.0M inodes), and has 64GB of memory (the system is from 2014).  It has two identical heads attached to a disk shelf, configured with one head active and the other as standby.  I've been told to plan for a system that is 4x the size of the current one.  We are looking at Lustre 2.10 or 2.12 (favoring 2.12 to allow data on the MDT).
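As a rough starting point for sizing, the figures above can be scaled linearly. This is only a back-of-the-envelope sketch: it assumes 1GB = 10^9 bytes, that inode count and MDT usage both grow by the same 4x factor, and it ignores the different per-inode cost of a ZFS MDT and any extra space consumed by data on the MDT. The function name `project_mdt` is made up for illustration.

```python
# Figures quoted in the post for the current MDT.
CURRENT_INODES = 164.0e6   # 164.0M inodes in use
CURRENT_USED_GB = 52.0     # 52GB of MDT space in use
GROWTH_FACTOR = 4          # "4x the size of the current one"


def project_mdt(inodes, used_gb, factor):
    """Linearly scale inode count and MDT space by the growth factor.

    Assumption: 1GB = 1e9 bytes, and usage per inode stays constant.
    """
    bytes_per_inode = used_gb * 1e9 / inodes
    return {
        "bytes_per_inode": bytes_per_inode,
        "projected_inodes": inodes * factor,
        "projected_used_gb": used_gb * factor,
    }


estimate = project_mdt(CURRENT_INODES, CURRENT_USED_GB, GROWTH_FACTOR)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

The projected space is a floor rather than a target, since it only covers what is in use today multiplied by four, with no headroom.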


With that background, here is what I've found from the last time I asked this (and from my reading online).


1) For the CPU on the MDT system: faster cores and more on-CPU cache are more important than more cores, but the system should have at least 16 cores.

2) ZFS is ready for use on the MDT (assuming zfs 0.7.x.x)

3) I'm going to recommend at least 128GB of memory, though 256GB would be better.

4) For the MDT itself, striping across mirrors provides superior speed over raidz2.  SSDs will be used.
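To make the trade-off in point 4 concrete, here is a small sketch comparing the two layouts for the same set of SSDs. The drive count and size are hypothetical, the capacities are raw figures before any ZFS overhead, and the function names are made up for illustration. It relies on the commonly cited rule of thumb that random I/O in a ZFS pool scales with the number of vdevs, which is why several mirrors tend to beat a single raidz2 group for metadata workloads.

```python
def striped_mirrors(drives, drive_tb):
    """Pool built from 2-way mirrors striped together."""
    if drives < 2 or drives % 2:
        raise ValueError("need an even number of drives, at least 2")
    vdevs = drives // 2
    return {
        "vdevs": vdevs,
        "usable_tb": vdevs * drive_tb,
        # Only one failure is guaranteed survivable: a second failure
        # in the same mirror pair loses the pool.
        "guaranteed_failures": 1,
    }


def single_raidz2(drives, drive_tb):
    """Pool built from one raidz2 vdev (two drives' worth of parity)."""
    if drives <= 2:
        raise ValueError("need more drives than parity devices")
    return {
        "vdevs": 1,
        "usable_tb": (drives - 2) * drive_tb,
        "guaranteed_failures": 2,  # any two drives may fail
    }


# Hypothetical example: eight 2.0TB SSDs.
mirrors = striped_mirrors(8, 2.0)
raidz2 = single_raidz2(8, 2.0)
print("striped mirrors:", mirrors)
print("raidz2:         ", raidz2)
```

With these example numbers the mirrored pool gives up capacity in exchange for four vdevs instead of one.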

For fault tolerance the new system will have two heads, with the MDT/MGS failing over between them.  That opens up the possibility of splitting the MDT into two MDTs and mounting one MDT on each head.  This was mentioned the last time I asked about this, and it would spread the MDT load between the two systems.  If we do that, I assume I'll have to set the striping on the first directories created, and that further directories will then inherit it.
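One way the two-MDT layout could be planned is sketched below. It only generates `lfs mkdir -i <mdt_index> <path>` command strings, which create a directory on a specific MDT; it does not run anything. The mount point, directory names, and the function name `plan_top_level_dirs` are hypothetical, and the round-robin assignment is just one possible policy. The exact inheritance behaviour for subdirectories should be checked against the documentation for the Lustre release that is chosen.

```python
def plan_top_level_dirs(mount_point, dir_names, mdt_count):
    """Return (mdt_index, command) pairs, assigning directories
    to MDTs round-robin. Nothing is executed here."""
    if mdt_count < 1:
        raise ValueError("mdt_count must be at least 1")
    base = mount_point.rstrip("/")
    plan = []
    for position, name in enumerate(dir_names):
        mdt_index = position % mdt_count
        plan.append((mdt_index, f"lfs mkdir -i {mdt_index} {base}/{name}"))
    return plan


# Hypothetical mount point and top-level directories, two MDTs.
plan = plan_top_level_dirs("/lustre", ["projA", "projB", "projC", "projD"], 2)
for _, command in plan:
    print(command)
```

Balancing by directory count is crude; if some top-level directories are known to hold far more files than others, assigning them by expected inode count would be a better fit.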


w/r,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility