[lustre-discuss] size of MDT, inode count, inode size

Dilger, Andreas andreas.dilger at intel.com
Fri Feb 2 18:45:07 PST 2018


On Jan 26, 2018, at 07:56, Thomas Roth <t.roth at gsi.de> wrote:
> 
> Hmm, option-testing leads to more confusion:
> 
> With this 922GB-sdb1 I do
> 
> mkfs.lustre --reformat --mgs --mdt ... /dev/sdb1
> 
> The output of the command says
> 
>   Permanent disk data:
> Target:     test0:MDT0000
> ...
> 
> device size = 944137MB
> formatting backing filesystem ldiskfs on /dev/sdb1
> 	target name   test0:MDT0000
> 	4k blocks     241699072
> 	options        -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
> 
> mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000  -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb1 241699072

The default options have to be conservative, as we don't know in advance how a filesystem will be used.  It may be that some sites will have lots of hard links or long filenames (which consume directory space == blocks, but not inodes), or they will have widely-striped files (which also consume xattr blocks).  The 2KB/inode ratio includes the space for the inode itself (512B in 2.7.x, 1024B in 2.10), at least one directory entry (~64 bytes), some fixed overhead for the journal (up to 4GB on the MDT), and Lustre-internal overhead (OI entry = ~64 bytes, ChangeLog, etc.).

If you have a better idea of space usage at your site, you can specify different parameters.
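
As a rough illustration of that budget, the sketch below subtracts the per-inode items listed above from the bytes-per-inode ratio, to show how much is left for everything else (extra directory entries for hard links or long names, xattr blocks, ChangeLog).  The function name is hypothetical, the ~64-byte figures are treated as exact, and the journal is left out because it is a fixed cost rather than a per-inode one.

```python
# Approximate per-inode costs quoted above (treated as exact for this sketch).
DIR_ENTRY_BYTES = 64   # at least one directory entry
OI_ENTRY_BYTES = 64    # Lustre-internal Object Index entry


def slack_per_inode(bytes_per_inode: int, inode_size: int) -> int:
    """Bytes left per inode after the inode itself, one dirent and one OI entry."""
    return bytes_per_inode - inode_size - DIR_ENTRY_BYTES - OI_ENTRY_BYTES


if __name__ == "__main__":
    # 512B inodes at 2048 bytes per inode (the 2KB ratio)
    print(slack_per_inode(2048, 512))
    # 1024B inodes at 2560 bytes per inode (the 2.10 mkfs output quoted above)
    print(slack_per_inode(2560, 1024))
```

Both cases leave the same headroom per inode, since the ratio and the inode size differ by the same 512 bytes.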

> Mount this as ldiskfs, gives 369 M inodes.
> 
> One would assume that specifying one / some of the mke2fs-options here in the mkfs.lustre-command will change nothing.
> 
> However,
> 
> mkfs.lustre --reformat --mgs --mdt ... --mkfsoptions="-I 1024" /dev/sdb1
> 
> says
> 
> device size = 944137MB
> formatting backing filesystem ldiskfs on /dev/sdb1
> 	target name   test0:MDT0000
> 	4k blocks     241699072
> 	options       -I 1024 -J size=4096 -i 1536 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
> 
> mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000 -I 1024 -J size=4096 -i 1536 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb1 241699072
> 
> and the mounted device now has 615 M inodes.
> 
> So, whatever makes the calculation for the "-i / bytes-per-inode" value becomes ineffective if I specify the inode size by hand?

This is a bit surprising.  I agree that specifying the same inode size value as the default should not affect the calculation for the bytes-per-inode ratio.
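
For reference, the two inode counts reported above are consistent with the block count and the -i value in each mke2fs command line.  A minimal Python sketch, assuming roughly one inode per bytes-per-inode of device space (the real count is rounded per block group, so it will differ slightly) and assuming the "M" in the reported figures means 2^20:

```python
BLOCK_SIZE = 4096          # from "-b 4096" in mkfs_cmd
BLOCKS = 241_699_072       # "4k blocks" in the mkfs.lustre output


def approx_inodes(blocks: int, bytes_per_inode: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate inode count: one inode per bytes_per_inode of device space."""
    return blocks * block_size // bytes_per_inode


def to_mi(count: int) -> int:
    """Express a count in units of 2**20, rounded to the nearest integer."""
    return round(count / 2**20)


if __name__ == "__main__":
    for ratio in (2560, 1536):
        print(f"-i {ratio}: ~{to_mi(approx_inodes(BLOCKS, ratio))}M inodes")
```

Under these assumptions the difference between 369M and 615M inodes is explained entirely by the change from -i 2560 to -i 1536.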

> How many bytes-per-inode do I need?
> 
> This ratio, is it what the manual specifies as "one inode created for each 2kB of LUN" ?

That was true with 512B inodes, but with the increase to 1024B inodes in 2.10 (to allow for PFL file layouts, since they are larger) the inode ratio has also gone up by 512B, to 2560B/inode.

> Perhaps the raw size of an MDT device should better be such that it leads to "-I 1024 -i 2048"?

Yes, that is probably reasonable, since the larger inode also means that there is less chance of external xattr blocks being allocated.
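
To put rough numbers on that suggestion for the device in this thread, the sketch below estimates the inode count and the space taken by the inode tables for "-I 1024 -i 2048".  It assumes one inode per 2048 bytes of device space and ignores the journal and other metadata, so treat the output as an estimate rather than what mke2fs would report; the function name is hypothetical.

```python
DEVICE_BYTES = 241_699_072 * 4096   # 4k blocks from the mkfs.lustre output


def inode_table_split(device_bytes: int, inode_size: int, bytes_per_inode: int):
    """Return (inode count, bytes in inode tables, bytes left for blocks)."""
    inodes = device_bytes // bytes_per_inode
    table_bytes = inodes * inode_size
    return inodes, table_bytes, device_bytes - table_bytes


if __name__ == "__main__":
    inodes, table, rest = inode_table_split(DEVICE_BYTES, 1024, 2048)
    print(f"{inodes / 2**20:.0f}M inodes, "
          f"{table / 2**30:.0f}GiB inode tables, {rest / 2**30:.0f}GiB remaining")
```

With bytes-per-inode at twice the inode size, half of the device goes to inode tables and the other half is left for directory, xattr and other blocks.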

Note that with ZFS there is no need to specify the inode ratio at all.  It will dynamically allocate inode blocks as needed, along with directory blocks, OI tables, etc., until the filesystem is full.

Cheers, Andreas

> On 01/26/2018 03:10 PM, Thomas Roth wrote:
>> Hi all,
>> what is the relation between raw device size and size of a formatted MDT? Size of inodes + free space = raw size?
>> The example:
>> MDT device has 922 GB in /proc/partitions.
>> Formatted under Lustre 2.5.3 with default values for mkfs.lustre resulted in a 'df -h' MDT of 692G and more importantly 462M inodes.
>> So, the space used for inodes + the 'df -h' output add up to the raw size:
>>  462M inodes * 0.5kB/inode + 692 GB = 922 GB
>> On that system there are now 330M files, more than 70% of the available inodes.
>> 'df -h' says '692G  191G  456G  30% /srv/mds0'
>> What do I need the remaining 450G for? (Or the ~400G left once all the inodes are eaten?)
>> Should the format command not be tuned towards more inodes?
>> Btw, on a Lustre 2.10.2 MDT I get 369M inodes and 550 G space (with a 922G raw device): inode size is now 1024.
>> However, according to the manual and various Jira/Ludocs the size should be 2k nowadays?
>> Actually, the command within mkfs.lustre reads
>> mke2fs -j -b 4096 -L test0:MDT0000  -J size=4096 -I 1024 -i 2560  -F /dev/sdb 241699072
>> -i 2560 ?
>> Cheers,
>> Thomas
> 

--
Andreas Dilger
Lustre Principal Architect
Intel Corporation