[Lustre-discuss] MDS sizing question

Andreas Dilger adilger at sun.com
Sat Oct 25 03:35:39 PDT 2008


On Oct 23, 2008  08:33 -0400, Craig Prescott wrote:
> I am considering hardware requirements for an
> MDS to be paired with a 500TB Lustre filesystem.
> I have a question regarding the sizing guidelines
> described in the manual.
> 
> For an anticipated average file size of 1MB, the
> MDT size guideline from section 21.3.2 works out
> to 4TB.  For comparison, on our production 28TB
> Lustre filesystem, we have:

The 4kB/inode value (then doubled) is just a safe rule of thumb.
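
For reference, the rule-of-thumb arithmetic can be sketched as below. This is only an illustration: the function name is made up, and decimal units (1TB = 10**12 bytes, 4kB = 4000 bytes) are an assumption chosen so the result matches the 4TB figure quoted above.

```python
# Rule-of-thumb MDT sizing as discussed in this thread:
# one inode per file, 4kB of MDT space per inode, then doubled.
# Decimal units are an assumption made for round numbers.
TB = 10**12
MB = 10**6


def mdt_size_rule_of_thumb(fs_bytes, avg_file_bytes,
                           bytes_per_inode=4000, safety_factor=2):
    """Return the suggested MDT size in bytes (hypothetical helper)."""
    expected_files = fs_bytes // avg_file_bytes
    return expected_files * bytes_per_inode * safety_factor


size = mdt_size_rule_of_thumb(500 * TB, 1 * MB)
print(f"suggested MDT size: {size / TB:.1f} TB")
```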

> # lfs df
> UUID                 1K-blocks      Used Available  Use% Mounted on
> ufhpc-MDT0000_UUID   213655168  17329964 196325204    8% /ufhpc/scratch[MDT:0]
> ...
> filesystem summary:  29966190744 20728270148 9237920596   69% /ufhpc/scratch
> 
> # lfs df -i
> UUID                    Inodes     IUsed     IFree IUse% Mounted on
> ufhpc-MDT0000_UUID    61049728  21416476  39633252   35% /ufhpc/scratch[MDT:0]
> ...
> filesystem summary:   61049728  21416476  39633252   35% /ufhpc/scratch

It does indeed appear that you have 1MB average file size, and are
about 4x over-provisioned on the MDS (you use about 1kB/inode
instead of 4kB/inode).
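
Those figures can be derived from the lfs df output quoted above. A minimal sketch, assuming every used inode corresponds to one file (directories also consume inodes, so this slightly understates the true average file size); the variable names are just for illustration:

```python
# Numbers copied from the quoted "lfs df" and "lfs df -i" output.
# lfs df reports sizes in 1K-blocks.
fs_used_kb = 20728270148    # filesystem summary, Used
mdt_used_kb = 17329964      # MDT0000, Used
mdt_total_kb = 213655168    # MDT0000, 1K-blocks
inodes_used = 21416476      # MDT0000, IUsed
inodes_total = 61049728     # MDT0000, Inodes

# Average data per file, assuming one file per used inode.
avg_file_kb = fs_used_kb / inodes_used

# MDT space actually consumed per inode in use.
mdt_kb_per_used_inode = mdt_used_kb / inodes_used

# MDT space provisioned per inode at format time.
mdt_kb_per_formatted_inode = mdt_total_kb / inodes_total

print(f"average file size:       {avg_file_kb:.0f} kB")
print(f"MDT usage per inode:     {mdt_kb_per_used_inode:.2f} kB")
print(f"MDT provisioned per inode: {mdt_kb_per_formatted_inode:.2f} kB")
```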

> My concern is that if I follow the guidelines, I would
> over-provision the MDS with space we would never use.
> 
> I understand the inodes are pre-allocated and won't show up in
> the "Used" column above.  Under what conditions would the actual
> MDT space get used more significantly?

If you suddenly get more small files for some reason your space can
disappear quickly - the MDT has about 61M inodes in total (about 40M
free), roughly twice what you need when the fs is full at the current
average file size.  The other thing that consumes MDS space is
extended attributes.  If you have lots of widely-striped files, or
SELinux labels or ACLs, these may require an extra block on the MDS.
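
The inode headroom can be estimated from the same lfs df figures. This is a rough sketch that assumes the average file size stays constant as the fs fills; it also shows the average file size below which the inodes would run out before the OST space does. Variable names are illustrative only.

```python
# Numbers copied from the quoted "lfs df" and "lfs df -i" output.
fs_total_kb = 29966190744   # filesystem summary, 1K-blocks
fs_used_kb = 20728270148    # filesystem summary, Used
inodes_total = 61049728     # MDT0000, Inodes
inodes_used = 21416476      # MDT0000, IUsed

# Fraction of OST space currently in use.
fill = fs_used_kb / fs_total_kb

# Inodes needed at 100% full if the file size mix does not change.
inodes_at_full = inodes_used / fill

# Ratio of formatted inodes to projected need.
headroom = inodes_total / inodes_at_full

# If the average file size drops below this, inodes run out first.
breakeven_file_kb = fs_total_kb / inodes_total

print(f"fs fill level:          {fill:.1%}")
print(f"inodes needed at full:  {inodes_at_full / 1e6:.1f}M")
print(f"inode headroom factor:  {headroom:.2f}x")
print(f"break-even file size:   {breakeven_file_kb:.0f} kB")
```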

In high-performance MDTs you end up adding disks just to improve the
performance, and get enough capacity for free.  The above system only
has a 200GB MDT, probably only 2 disks mirrored.

I'd say for the large system you could get by with a 2TB MDT (you
should specify "-i 1500" or so, to increase the inode count over
the default).  This is again only 2 or 4 disks (mirrored) in the end...
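
As a sanity check on that suggestion, here is a sketch of the inode count a 2TB MDT would get at roughly 1500 bytes per inode. It assumes "-i" is the mke2fs bytes-per-inode ratio, uses decimal units, and ignores the rounding mke2fs does per block group, so the real count will differ somewhat. How the option is passed to the Lustre format tool is not shown here.

```python
TB = 10**12
MB = 10**6


def approx_inode_count(device_bytes, bytes_per_inode):
    """Approximate inode count for a given bytes-per-inode ratio."""
    return device_bytes // bytes_per_inode


mdt_inodes = approx_inode_count(2 * TB, 1500)

# Files expected on a 500TB filesystem with a 1MB average file size.
expected_files = (500 * TB) // (1 * MB)

print(f"MDT inodes at -i 1500: {mdt_inodes / 1e6:.0f}M")
print(f"expected files:        {expected_files / 1e6:.0f}M")
print(f"inodes per expected file: {mdt_inodes / expected_files:.2f}")
```

Under these assumptions the inode count comfortably exceeds the expected number of files, which leaves room for a shift toward smaller files.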

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



