<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Thanks Andreas!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Comparing “zpool get all” on both systems I found ashift is 0 on both systems – but a number of features are different on the “bad” mdt. Except for “extensible dataset” they are all enabled on the “bad” one. Could one of them be the problem?
Is it possible to change those features and reclaim the space?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">“good system”<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@multi_vdev_crash_dump disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@extensible_dataset enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@filesystem_limits disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@large_blocks disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@large_dnode disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@sha512 disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@skein disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@edonr disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">- mdt0000 feature@userobj_accounting disabled local<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">“Bad” system<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@multi_vdev_crash_dump enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@extensible_dataset active local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@filesystem_limits enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@large_blocks enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@large_dnode enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@sha512 enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@skein enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@edonr enabled local<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">+ mdt0000 feature@userobj_accounting active local<o:p></o:p></span></p>
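<p class="MsoNormal">For anyone repeating this comparison, here is a minimal Python sketch that diffs two property listings. It assumes each line has the four-column layout quoted above ("pool property value source") with the leading "-" or "+" marker removed; the function names are made up for illustration and the sample text is a subset of the lines above.</p>
<pre>
```python
# Sketch: diff two "zpool get all" feature listings.
# Assumes the four-column layout "pool property value source".

def parse_properties(text):
    """Map property name to value for every four-field line."""
    props = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            _pool, name, value, _source = fields
            props[name] = value
    return props


def diff_properties(good, bad):
    """Return {name: (good_value, bad_value)} for properties that differ."""
    names = sorted(set(good) | set(bad))
    return {
        n: (good.get(n), bad.get(n))
        for n in names
        if good.get(n) != bad.get(n)
    }


GOOD = """\
mdt0000 feature@large_dnode disabled local
mdt0000 feature@userobj_accounting disabled local
mdt0000 feature@extensible_dataset enabled local
"""
BAD = """\
mdt0000 feature@large_dnode enabled local
mdt0000 feature@userobj_accounting active local
mdt0000 feature@extensible_dataset active local
"""

if __name__ == "__main__":
    changed = diff_properties(parse_properties(GOOD), parse_properties(BAD))
    for name, (good_value, bad_value) in changed.items():
        print(name, good_value, bad_value)
```
</pre>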
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><a name="_____replyseparator"></a><b>From:</b> Andreas Dilger <adilger@whamcloud.com>
<br>
<b>Sent:</b> Monday, November 11, 2019 15:55<br>
<b>To:</b> Hebenstreit, Michael <michael.hebenstreit@intel.com><br>
<b>Cc:</b> Mohr Jr, Richard Frank <rmohr@utk.edu>; lustre-discuss@lists.lustre.org<br>
<b>Subject:</b> Re: [lustre-discuss] changing inode size on MDT<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">You can check the ashift of the zpool via "zpool get all | grep ashift". If it differs between the two pools, it will make a huge difference in space usage. There are a number of ZFS articles that discuss this; it isn't specific to Lustre. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">Also, RAID-Z2 is going to have much more space overhead for the MDT than mirroring, because the MDT is almost entirely small blocks. Normally the MDT uses mirrored VDEVs.
<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The reason is that RAID-Z2 has two parity sectors per data stripe vs. a single extra mirror per data block, so if all data blocks are 4KB that would double the parity overhead vs. mirroring. Secondly, depending on the geometry, RAID-Z2
needs padding sectors to align the variable RAID-Z stripes, which mirrors do not.<o:p></o:p></p>
</div>
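<p class="MsoNormal">The arithmetic behind this can be sketched in Python. This is only an illustration of the RAID-Z allocation rule as I understand it from OpenZFS (data sectors, plus parity sectors per stripe row, rounded up to a multiple of parity + 1). The 4KB sector size (ashift=12) and the 10-disk RAID-Z2 width are assumptions, since the thread does not give the real values.</p>
<pre>
```python
# Sketch of per-block allocation on RAID-Z versus a mirror.
# Sector size and disk count below are assumptions, not values from the thread.

def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def raidz_sectors(block_bytes, sector_bytes, ndisks, nparity):
    """Sectors allocated for one block on a RAID-Z vdev, including padding."""
    data = ceil_div(block_bytes, sector_bytes)
    parity = nparity * ceil_div(data, ndisks - nparity)
    # Allocations are rounded up to a multiple of (nparity + 1) sectors.
    return ceil_div(data + parity, nparity + 1) * (nparity + 1)


def mirror_sectors(block_bytes, sector_bytes, copies=2):
    """Sectors allocated for one block on an N-way mirror."""
    return copies * ceil_div(block_bytes, sector_bytes)


if __name__ == "__main__":
    sector = 4096   # assumes ashift=12; the thread only shows ashift=0 (auto)
    disks = 10      # hypothetical RAID-Z2 width; the real width is not given
    for size in (4096, 16384, 131072):
        print(size, raidz_sectors(size, sector, disks, 2), mirror_sectors(size, sector))
```
</pre>
<p class="MsoNormal">Under these assumptions a 4KB block costs three sectors on RAID-Z2 against two on a mirror, while larger blocks come out cheaper on RAID-Z2.</p>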
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">For large files/blocks RAID-Z2 is better, but that isn't the workload on the MDT unless you are storing DoM files there (e.g. 64KB or larger). <o:p></o:p></p>
<div id="AppleMailSignature">
<p class="MsoNormal">Cheers, Andreas<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
On Nov 11, 2019, at 13:48, Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com">michael.hebenstreit@intel.com</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">Recordsize/ashift: in both cases default values were used (but on different versions of Lustre). How can I check the actual recordsize/ashift values on each system to compare them?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">zpool mirroring is quite different though – bad drive is a simple raidz2:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> raidz2-0 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdd ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal">….<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">errors: No known data errors<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">the good drive uses 10 mirrors:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> NAME STATE READ WRITE CKSUM</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mdt0000 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-0 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdd ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sde ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-1 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdf ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdg ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-2 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdh ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdi ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-3 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdj ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdk ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-4 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdl ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdm ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-5 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdn ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdo ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-6 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdp ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdq ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-7 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdr ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sds ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-8 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdt ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdu ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-9 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdv ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdw ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> mirror-10 ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdx ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> sdy ONLINE 0 0 0</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">thanks<o:p></o:p></p>
<p class="MsoNormal">Michael<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Andreas Dilger <<a href="mailto:adilger@whamcloud.com">adilger@whamcloud.com</a>>
<br>
<b>Sent:</b> Monday, November 11, 2019 14:42<br>
<b>To:</b> Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com">michael.hebenstreit@intel.com</a>><br>
<b>Cc:</b> Mohr Jr, Richard Frank <<a href="mailto:rmohr@utk.edu">rmohr@utk.edu</a>>;
<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
<b>Subject:</b> Re: [lustre-discuss] changing inode size on MDT<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">There isn't really enough information to make any kind of real analysis.
<o:p></o:p></p>
<div>
<p class="MsoNormal">My guess would be that you are using a larger ZFS recordsize or ashift on the new filesystem, or the RAID config is different?<o:p></o:p></p>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<div id="AppleMailSignature">
<p class="MsoNormal">Cheers, Andreas<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
On Nov 7, 2019, at 08:45, Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com">michael.hebenstreit@intel.com</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">So we went ahead and used the FS – using rsync to duplicate the existing FS. The inodes available on the NEW mdt (which is almost twice the size of the second mdt) are dropping rapidly and are now LESS than on the smaller mdt (even though
the sync is only 90% complete). Both FS are running almost identical Lustre 2.10. I cannot say anymore which ZFS version was used to format the good FS.
<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Any ideas why those 2 MDTs behave so differently?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">old GOOD FS:<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># df -i </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mgt/mgt 81718714 205 81718509 1% /lfs/lfsarc02/mgt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 458995000 130510339 328484661 29% /lfs/lfsarc02/mdt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># df -h</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mgt/mgt 427G 7.0M 427G 1% /lfs/lfsarc02/mgt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 4.6T 1.4T 3.3T 29% /lfs/lfsarc02/mdt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># rpm -q -a | grep zfs</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">libzfs2-0.7.9-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">lustre-osd-zfs-mount-2.10.4-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">lustre-zfs-dkms-2.10.4-1.el7.noarch</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">zfs-0.7.9-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">zfs-dkms-0.7.9-1.el7.noarch</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">new BAD FS<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># df -ih </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mgt/mgt 83M 169 83M 1% /lfs/lfsarc01/mgt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 297M 122M 175M 42% /lfs/lfsarc01/mdt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># df -h </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mgt/mgt 427G 5.8M 427G 1% /lfs/lfsarc01/mgt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 8.2T 3.4T 4.9T 41% /lfs/lfsarc01/mdt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""># rpm -q -a | grep zfs</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">libzfs2-0.7.9-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">lustre-osd-zfs-mount-2.10.8-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">lustre-zfs-dkms-2.10.8-1.el7.noarch</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">zfs-0.7.9-1.el7.x86_64</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">zfs-dkms-0.7.9-1.el7.noarch</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
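<p class="MsoNormal">As a rough cross-check of the figures above, the space consumed per allocated inode can be computed from the df output. This is a sketch only: the inputs are the rounded df values, and it assumes df -h and df -ih use binary (1024-based) units, so the results are approximate.</p>
<pre>
```python
# Sketch: apparent space consumed per allocated inode on each MDT,
# computed from the rounded df figures quoted above.

TIB = 2 ** 40
MIB = 2 ** 20


def bytes_per_inode(bytes_used, inodes_used):
    """Average number of bytes used per allocated inode."""
    return bytes_used / inodes_used


# lfsarc02 (good): 1.4T used, 130510339 inodes used
good = bytes_per_inode(1.4 * TIB, 130510339)
# lfsarc01 (bad): 3.4T used, 122M inodes used (from df -ih)
bad = bytes_per_inode(3.4 * TIB, 122 * MIB)

if __name__ == "__main__":
    print(round(good), round(bad), round(bad / good, 2))
```
</pre>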
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Andreas Dilger <<a href="mailto:adilger@whamcloud.com">adilger@whamcloud.com</a>>
<br>
<b>Sent:</b> Thursday, October 03, 2019 20:38<br>
<b>To:</b> Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com">michael.hebenstreit@intel.com</a>><br>
<b>Cc:</b> Mohr Jr, Richard Frank <<a href="mailto:rmohr@utk.edu">rmohr@utk.edu</a>>;
<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
<b>Subject:</b> Re: [lustre-discuss] changing inode size on MDT<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">On Oct 3, 2019, at 20:09, Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com">michael.hebenstreit@intel.com</a>> wrote:<o:p></o:p></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal">So bottom line – don’t change the default values, it won’t get better?<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<p class="MsoNormal"><span style="font-size:10.5pt">Like I wrote previously, there *are* no default/tunable values to change for ZFS. The tunables are only for ldiskfs, which statically allocates everything, and that will cause problems if you guessed incorrectly
at the instant you formatted the filesystem.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">The number reported by raw ZFS and by Lustre-on-ZFS is just an estimate, and you will (essentially) run out of inodes once you run out of space on the MDT or all OSTs. And I didn't say "it won't get better",
actually I said the estimate _will_ get better once you actually start using the filesystem.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">If the (my estimate) 2-3B inodes on the MDT is insufficient, you can always add another (presumably mirrored) VDEV to the MDT, or add a new MDT to the filesystem to increase the number of inodes available.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">Cheers, Andreas</span><o:p></o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
<br>
<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<div>
<p class="MsoNormal"><b>From:</b><span class="apple-converted-space"> </span>Andreas Dilger <<a href="mailto:adilger@whamcloud.com"><span style="color:purple">adilger@whamcloud.com</span></a>><span class="apple-converted-space"> </span><br>
<b>Sent:</b><span class="apple-converted-space"> </span>Thursday, October 03, 2019 19:38<br>
<b>To:</b><span class="apple-converted-space"> </span>Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com"><span style="color:purple">michael.hebenstreit@intel.com</span></a>><br>
<b>Cc:</b><span class="apple-converted-space"> </span>Mohr Jr, Richard Frank <<a href="mailto:rmohr@utk.edu"><span style="color:purple">rmohr@utk.edu</span></a>>;<span class="apple-converted-space"> </span><a href="mailto:lustre-discuss@lists.lustre.org"><span style="color:purple">lustre-discuss@lists.lustre.org</span></a><br>
<b>Subject:</b><span class="apple-converted-space"> </span>Re: [lustre-discuss] changing inode size on MDT<o:p></o:p></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">On Oct 3, 2019, at 05:03, Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com"><span style="color:purple">michael.hebenstreit@intel.com</span></a>> wrote:<o:p></o:p></p>
</div>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal">So you are saying on a zfs based Lustre there is no way to increase the number of available inodes? I have 8TB MDT with roughly 17G inodes<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">[root@elfsa1m1 ~]# df -h</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">Filesystem Size Used Avail Use% Mounted on</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000 8.3T 256K 8.3T 1% /mdt0000</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New""> </span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">[root@elfsa1m1 ~]# df -i</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">Filesystem Inodes IUsed IFree IUse% Mounted on</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000 17678817874 6 17678817868 1% /mdt0000</span><o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">For ZFS the only way to increase inodes on the *MDT* is to increase the size of the MDT, though more on that below. Note that the "number of inodes" reported by ZFS is an estimate based on the currently-allocated
blocks and inodes (i.e. bytes_per_inode_ratio = bytes_used / inodes_used, total inode estimate = bytes_free / bytes_per_inode_ratio + inodes_used), which becomes more accurate as the MDT becomes more full. With 17B inodes on an 8TB MDT that is a bytes-per-inode ratio
of 497, which is unrealistically low for Lustre since the MDT will always store multiple xattrs on each inode. Note that the filesystem only has 6 inodes allocated, so the ZFS total inodes estimate is unrealistically high and will get better as more inodes
are allocated in the filesystem.</span><o:p></o:p></p>
</div>
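<p class="MsoNormal">The estimate described above can be written out as a short Python sketch. The function and variable names are illustrative and are not taken from the ZFS or Lustre source. The 4 KiB-per-inode figure in the second example is a hypothetical value chosen only to show how the estimate changes once a realistic ratio is observed.</p>
<pre>
```python
# Sketch of the inode estimate described above.
# Names are illustrative, not taken from the ZFS or Lustre source.

def estimate_total_inodes(bytes_used, bytes_free, inodes_used):
    """total = bytes_free / (bytes_used / inodes_used) + inodes_used"""
    bytes_per_inode = bytes_used / inodes_used
    return int(bytes_free / bytes_per_inode) + inodes_used


if __name__ == "__main__":
    # Ratio implied by the nearly empty pool: 8 TiB over the reported inodes.
    print(8 * 2 ** 40 // 17678817874)
    # Same free space, assuming (hypothetically) 4 KiB used per inode
    # after 1000 inodes have been allocated.
    print(estimate_total_inodes(4096 * 1000, 8 * 2 ** 40, 1000))
```
</pre>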
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<p class="MsoNormal">Formatting under Lustre 2.10.8<span class="apple-converted-space"> </span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mkfs.lustre --mdt --backfstype=zfs --fsname=lfsarc01 --index=0 --mgsnid="<a href="mailto:36.101.92.22@tcp"><span style="color:purple">36.101.92.22@tcp</span></a>" --reformat mdt0000/mdt0000</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">this translates to only 948M inodes on the Lustre FS.<span class="apple-converted-space"> </span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">[root@elfsa1m1 ~]# df -i</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">Filesystem Inodes IUsed IFree IUse% Mounted on</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000 17678817874 6 17678817868 1% /mdt0000</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 948016092 263 948015829 1% /lfs/lfsarc01/mdt</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New""> </span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">[root@elfsa1m1 ~]# df -h</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">Filesystem Size Used Avail Use% Mounted on</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000 8.3T 256K 8.3T 1% /mdt0000</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">mdt0000/mdt0000 8.2T 24M 8.2T 1% /lfs/lfsarc01/mdt</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">and there is no reasonable option to provide more file entries except for adding another MDT?<o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">The Lustre statfs code will factor in some initial estimates for the bytes-per-inode ratio when computing the total inode estimate for the filesystem. When the filesystem is nearly empty, as is the case here,
then those initial estimates will dominate, but once you've allocated a few thousand inodes in the filesystem the actual values will dominate and you will have a much more accurate number for the total inode count. This will probably be more in the range
of 2B-4B inodes in the end, unless you also use Data-on-MDT (Lustre 2.11 and later) to store small files directly on the MDT.</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">You've also excluded the OST lines from the above output? For the Lustre filesystem you (typically) also need at least one OST inode (object) for each file in the filesystem, possibly more than one, so "df"
of the Lustre filesystem may also be limited by the number of inodes reported by the OSTs (which may themselves depend on the average bytes-per-inode for files stored on the OST). If you use Data-on-MDT and only have small files, then no OST object is needed
for small files, but you consume correspondingly more space on the MDT.</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt">Cheers, Andreas</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<div>
<div>
<p class="MsoNormal"><b>From:</b><span class="apple-converted-space"> </span>Andreas Dilger <<a href="mailto:adilger@whamcloud.com"><span style="color:purple">adilger@whamcloud.com</span></a>><span class="apple-converted-space"> </span><br>
<b>Sent:</b><span class="apple-converted-space"> </span>Wednesday, October 02, 2019 18:49<br>
<b>To:</b><span class="apple-converted-space"> </span>Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com"><span style="color:purple">michael.hebenstreit@intel.com</span></a>><br>
<b>Cc:</b><span class="apple-converted-space"> </span>Mohr Jr, Richard Frank <<a href="mailto:rmohr@utk.edu"><span style="color:purple">rmohr@utk.edu</span></a>>;<span class="apple-converted-space"> </span><a href="mailto:lustre-discuss@lists.lustre.org"><span style="color:purple">lustre-discuss@lists.lustre.org</span></a><br>
<b>Subject:</b><span class="apple-converted-space"> </span>Re: [lustre-discuss] changing inode size on MDT<o:p></o:p></p>
</div>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">There are several confusing/misleading comments on this thread that need to be clarified...<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal">On Oct 2, 2019, at 13:45, Hebenstreit, Michael <<a href="mailto:michael.hebenstreit@intel.com"><span style="color:purple">michael.hebenstreit@intel.com</span></a>> wrote:<o:p></o:p></p>
</div>
</div>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><a href="http://wiki.lustre.org/Lustre_Tuning#Number_of_Inodes_for_MDS"><span style="color:purple">http://wiki.lustre.org/Lustre_Tuning#Number_of_Inodes_for_MDS</span></a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</blockquote>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal">Note that I've updated this page to reflect current defaults. The Lustre Operations Manual has a much better description of these parameters.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<p class="MsoNormal">and I'd like to use --mkfsoptions='-i 1024' to have more inodes in the MDT. We already ran out of inodes on that FS (probably due to a ZFS bug in an early IEEL version) - so I'd like to increase #inodes if possible.<o:p></o:p></p>
</div>
</div>
</div>
</div>
</blockquote>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal">The "-i 1024" option (bytes-per-inode ratio) is only needed for ldiskfs, since it statically allocates the inodes at mkfs time. It is not relevant for ZFS, since ZFS dynamically allocates inodes and blocks as needed.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">On Oct 2, 2019, at 14:00, Colin Faber <<a href="mailto:cfaber@gmail.com"><span style="color:purple">cfaber@gmail.com</span></a>> wrote:<o:p></o:p></p>
</div>
</div>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">With 1K inodes you won't have space to accommodate new features; IIRC the current minimum on modern Lustre is 2K. If you're running out of MDT space you might consider DNE and multiple MDTs to accommodate that larger namespace.<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">To clarify, since Lustre 2.10 any new ldiskfs MDT will allocate 1024 bytes for the inode itself (-I 1024). That allows enough space *within* the inode to efficiently store xattrs for more complex layouts (PFL, FLR, DoM). If xattrs do
not fit inside the inode itself then they will be stored in an external 4KB inode block.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">The MDT is formatted with a bytes-per-inode *ratio* of 2.5KB, which means (approximately) one inode will be created for every 2.5KB of the total MDT size. That 2.5KB of space includes the 1KB for the inode itself, plus space for a directory
entry (or multiple if hard-linked), extra xattrs, the journal (up to 4GB for large MDTs), Lustre recovery logs, ChangeLogs, etc. Each directory inode will have at least one 4KB block allocated.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">So, it is _possible_ to reduce the inode *ratio* below 2.5KB if you know what you are doing (e.g. 2KB/inode or 1.5KB/inode, this can be an arbitrary number of bytes, it doesn't have to be an even multiple of anything) but it definitely
isn't possible to have 1KB inode size and 1KB per inode ratio, as there wouldn't be *any* space left for directories, log files, journal, etc.<o:p></o:p></p>
</div>
</div>
</div>
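<p class="MsoNormal">The ldiskfs arithmetic above can be illustrated with a short Python sketch. It takes 2.5KB as 2560 bytes and uses a hypothetical 1 TiB MDT; the function names are made up for illustration. It shows how many inodes each ratio would create and how much space per inode is left over for directories, external xattr blocks, logs and the journal.</p>
<pre>
```python
# Sketch of the ldiskfs inode arithmetic described above.
# The 1 TiB MDT size is hypothetical; 2.5KB is taken as 2560 bytes.

INODE_SIZE = 1024  # -I 1024: bytes for the inode itself


def inode_count(mdt_bytes, bytes_per_inode):
    """Approximate number of inodes created at mkfs time."""
    return mdt_bytes // bytes_per_inode


def spare_bytes_per_inode(bytes_per_inode, inode_size=INODE_SIZE):
    """Space left per inode for directories, xattr blocks, logs, journal."""
    return bytes_per_inode - inode_size


if __name__ == "__main__":
    mdt_bytes = 2 ** 40  # hypothetical 1 TiB MDT
    for ratio in (2560, 2048, 1536, 1024):
        print(ratio, inode_count(mdt_bytes, ratio), spare_bytes_per_inode(ratio))
```
</pre>
<p class="MsoNormal">With a ratio of 1024 the spare space per inode is zero, which is the case described above as not possible.</p>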
<div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">Cheers, Andreas<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">--<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">Andreas Dilger<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">Principal Lustre Architect<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">Whamcloud<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</body>
</html>