[lustre-discuss] MDT 100% full

Alexander I Kulyavtsev aik at fnal.gov
Tue Jul 26 19:11:53 PDT 2016


Brian,
Is your zfs 'frozen'? ZFS can lock up when you have zero bytes left, and you cannot do much with it after that.
You will need to remove a file on the zfs filesystem itself, or remove a snapshot, to unfreeze zfs, so do not wait until it fills up completely.
To avoid having to delete mdt objects after zfs locks up, I create a few files with known names in an extra zfs filesystem on the same pool, and also set a space reservation.
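For example, something along these lines, as a rough sketch (pool and dataset names here are made up; adjust to your layout, and make sure the ballast dataset is actually mounted):

    # separate dataset on the same pool as the MDT, used only as ballast
    zfs create mdtpool/ballast
    # reserve space for it so the rest of the pool can never be driven 100% full
    zfs set reservation=2G mdtpool/ballast
    # a few files with known names that can be deleted to free space in a pinch
    dd if=/dev/zero of=/mdtpool/ballast/spare1 bs=1M count=512
    dd if=/dev/zero of=/mdtpool/ballast/spare2 bs=1M count=512

Dropping the reservation (zfs set reservation=none mdtpool/ballast) or deleting the spare files then frees space without having to touch MDT objects.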

I saw large space consumption per inode on a test system last summer, about 10KB per inode.
I had formatted the SSDs with ashift=12, thinking that zpool ashift=12 would be a better fit for 4K-sector SSDs.

The issue was resolved when I reformatted with ashift=9 (512-byte sectors emulated) and, at the same time, set xattr=sa (it was not the default back then).
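As a sketch, the relevant settings look roughly like this (pool, device and dataset names are placeholders; ashift can only be chosen when the vdev is created):

    # 2^9 = 512-byte allocation unit; set at pool/vdev creation time
    zpool create -o ashift=9 mdtpool mirror /dev/sda /dev/sdb
    # keep xattrs (which Lustre uses heavily on the MDT) in the dnode
    # as system attributes instead of separate hidden xattr objects
    zfs set xattr=sa mdtpool/mdt0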

Right now I have 82M files taking 126 GB on the production system (with thousands of snapshots), or about 1.5 KB per "inode". With ashift=12 it was about 10KB, IIRC. This can be reproduced with "createmany".
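A rough way to reproduce and measure it (exact createmany arguments may differ between Lustre versions, check its usage output; the paths and pool name here are examples):

    # on a client: put a test directory on MDT0 and create many empty files
    lfs mkdir -i 0 /mnt/lustre/inodetest
    createmany -o /mnt/lustre/inodetest/f- 1000000
    # on the MDS: note USED for the MDT dataset before and after,
    # then divide the difference by the number of files created
    zfs list mdtpool/mdt0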

Here is the histogram I got during tests with ashift=9.
You can see the bin with n=11 is heavily populated (2**11 = 2048). On the corresponding plot for a zpool formatted with ashift=12, the distribution starts at n=12.
That is, many objects with size <= 2048 bytes will be created as 4K objects on a zpool with ashift=12, leading to larger space consumption on the Lustre MDT per Lustre file (OI objects, hidden attributes, ...).


zdb -M zpl

        vdev          0         metaslabs  119          fragmentation  5%
                          9:  21553 *
                         10: 186445 ******
                         11: 1337626 ****************************************
                         12: 603282 *******************
                         13: 235246 ********
                         14: 104615 ****
                         15:  50365 **
                         16:  22656 *
                         17:   9997 *
                         18:   3482 *
                         19:   1198 *
                         20:    481 *
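For comparison, the ashift a pool was created with can be read back from its cached configuration, e.g.:

    zdb -C zpl | grep ashift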

Alex.


On Jul 26, 2016, at 7:57 PM, Rick Wagner <rpwagner at sdsc.edu> wrote:

Hi Brian,

On Jul 26, 2016, at 5:45 PM, Andrus, Brian Contractor <bdandrus at nps.edu> wrote:

All,

Ok, I thought 100GB would be sufficient for an MDT.
I have 2 MDTs as well, BUT…

MDT0 is 100% full and now I cannot write anything to my lustre filesystem.
The MDT is on a ZFS backing filesystem.

So, what is the proper way to grow my MDT using ZFS? Do I need to shut the filesystem down completely? Can I just add a disk or more space to the pool and have Lustre see it?

Any advice or direction is appreciated.

We just did this successfully on the two MDTs backing one of our Lustre file systems, and everything happened at the ZFS layer. We added drives to the pool and Lustre immediately saw the additional capacity. Whether you take down the file system or do it live is a question of your architecture, skills, and confidence. Having a test file system to go over the steps on is also worthwhile.
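In outline, the ZFS-side steps look something like this (a sketch with placeholder pool and device names; mirror your existing vdev layout when adding):

    # add another mirror vdev to the pool backing the full MDT
    zpool add mdtpool mirror /dev/sdc /dev/sdd
    # confirm the new capacity at the pool level
    zpool list mdtpool
    # from a client, the MDT should immediately show the larger size
    lfs df -h /mnt/lustre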

--Rick




Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238


_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


