[Lustre-discuss] question about size on MDS (MDT) for lustre-1.8

Robin Humble robin.humble+lustre at anu.edu.au
Wed Jan 12 16:45:52 PST 2011

Hi Nathan,

On Thu, Jan 06, 2011 at 05:42:24PM -0700, Nathan.Dauchy at noaa.gov wrote:
>I am looking for more information regarding the "size on MDS" feature as
>it exists for lustre-1.8.x.  Testing on our system (which started out as
>1.6.6 and is now 1.8.x) indicates that there are many files which do not
>have the size information stored on the MDT.  So, my basic question:
>under what conditions will the "size hint" attribute be updated?  Is
>there any way to force the MDT to query the OSTs and update its

Atime (and the MDT size hint along with it) was not being updated for most
of the 1.8 series due to this bug:
The atime fix is now in 1.8.5, but I'm not sure whether anyone has verified
that the MDT size hint now behaves as originally intended.

Actually, it was never clear to me what (if anything?) ever accessed
Does someone have a hacked 'lfs find' or similar tool?
Your approach of mounting and searching an MDT snapshot should work, but it
would seem neater to have a tool on a client send the right RPCs to the MDS
and get the information that way.

Like you, we are finding that the run times of our filesystem-trawling
scripts are getting out of hand, mostly (we think) because of retrieving size
information from very busy OSTs. A tool that only hit the MDT and returned
(filename, uid, gid, approximate size) should help a lot, so +1 on this topic.

BTW, once you have 1.8.5 on the MDS, one hack to populate the MDT size hints
might be to read 4 KB from every file in the filesystem. That should update
atime and the size hint. Please let us know if this works.

>The end goal of this is to facilitate efficient checks of disk usage on
>a per-directory basis (essentially we want "volume based quotas").  I'm

One possible approach for your situation would be to chgrp every file under a
directory to the same gid and then enable (non-enforcing) group quotas on the
filesystem. The group's quota usage then gives the directory's usage, so you
wouldn't have to search any directories. You would still have to find and
chgrp any files that have picked up a different group (newly created ones, for
example) nightly, but 'lfs find' should make that relatively quick.
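A sketch of the nightly fix-up step, assuming group quotas are already enabled and each project directory maps to exactly one group. The helper names `fix_group` and `count_misgrouped` are hypothetical, and the sketch uses plain POSIX find rather than 'lfs find', whose option syntax is not reproduced here.

```shell
#!/bin/sh
# fix_group DIR GROUP: chgrp every entry under DIR (DIR included) whose group
# is not already GROUP; -h changes symlinks themselves, not their targets.
fix_group() {
    find "$1" ! -group "$2" -exec chgrp -h "$2" {} +
}

# count_misgrouped DIR GROUP: print how many entries under DIR are not in GROUP
count_misgrouped() {
    find "$1" ! -group "$2" | wc -l | tr -d ' '
}
```

With a placeholder project `projA`, `fix_group /mnt/lfs0/projA projA` followed by `lfs quota -g projA /mnt/lfs0` would report that directory's usage from the group quota accounting, provided no files outside the directory also belong to `projA`.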

Unfortunately, we also need a per-uid breakdown within each directory, so this
approach isn't sufficient for us.
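For a per-uid breakdown, one possibility is to aggregate directly over the read-only MDT snapshot described in the quoted message below. This is a sketch assuming GNU find (for `-printf`) with the snapshot's ROOT directory as input; `usage_by_dir_uid` is a hypothetical name. It sums the sizes recorded on the MDT for each top-level directory and uid, so it is only as accurate as the size hints, which the quoted output shows can be 0.

```shell
#!/bin/sh
# usage_by_dir_uid ROOT: print "topdir<TAB>uid<TAB>bytes" for every top-level
# directory under ROOT (e.g. /mnt/snap/ROOT on the mounted MDT snapshot).
# Sizes are whatever the MDT inode records, i.e. the size hint. Files sitting
# directly in ROOT are reported under ".". Hard-linked files are counted once
# per name, and names containing newlines are not handled.
usage_by_dir_uid() {
    # GNU find: %U = numeric uid, %s = size in bytes, %P = path relative to ROOT
    find "$1" -type f -printf '%U\t%s\t%P\n' |
        awk -F '\t' '
            {
                # the path is everything after the first two fields, so tabs
                # inside file names do not break the parsing
                p = substr($0, length($1) + length($2) + 3)
                i = index(p, "/")
                top = (i > 0) ? substr(p, 1, i - 1) : "."
                sum[top "\t" $1] += $2
            }
            END { for (k in sum) printf "%s\t%.0f\n", k, sum[k] }' |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

On the MDS this could replace the `du` step, e.g. `usage_by_dir_uid /mnt/snap/ROOT > volume_usage.log`.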

Dr Robin Humble, HPC Systems Analyst, NCI National Facility

>hoping to run something once a day on the MDS like the following:
>    lvcreate  -s -p r -n mdt_snap /dev/mdt
>    mount -t ldiskfs -o ro /dev/mdt_snap /mnt/snap
>    cd /mnt/snap/ROOT
>    du --apparent-size ./* > volume_usage.log
>    cd /
>    umount /mnt/snap
>    lvremove /dev/mdt_snap
>Since the data is going to be up to one day old anyway, I don't really
>mind that the file size is "approximate", but it does have to be
>reasonably close.
>With the MDT LVM snapshot method I can check the whole 300TB file system
>in about 3 hours, whereas checking from a client takes weeks.
>Here is why I am relatively certain that the size-on-MDS attributes are
>not updated (lightly edited):
>[root@mds0 ~]# ls -l /mnt/snap/ROOT/test/rollover/user_acct_file
>-rw-r--r-- 1 9999 9000 0 Mar 23  2010
>[root@mds0 ~]# du /mnt/snap/ROOT/test/rollover/user_acct_file
>0       /mnt/snap/ROOT/test/rollover/user_acct_file
>[root@mds0 ~]# du --apparent-size
>0       /mnt/snap/ROOT/test/rollover/user_acct_file
>[root@c448 ~]# ls -l /mnt/lfs0/test/rollover/user_acct_file
>-rw-r--r-- 1 user group 184435207 Mar 23  2010
>[root@c448 ~]# du /mnt/lfs0/test/rollover/user_acct_file
>180120  /mnt/lfs0/test/rollover/user_acct_file
>[root@c448 ~]# du --apparent-size /mnt/lfs0/test/rollover/user_acct_file
>180113  /mnt/lfs0/test/rollover/user_acct_file
>Thanks very much for any answers or suggestions you can provide!
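A quick way to gauge how widespread the missing size hints are is to count, on the mounted snapshot, how many regular files record a size of 0. This is a sketch assuming GNU find; `size_hint_stats` is a hypothetical name. Genuinely empty files are counted too, so the first number is an upper bound on files lacking a hint; rerunning it after the 4 KB read hack suggested above would show whether the hints are being populated.

```shell
#!/bin/sh
# size_hint_stats ROOT: print "<zero-size files> <total files>" for regular
# files under ROOT, e.g. /mnt/snap/ROOT on the mounted MDT snapshot.
# Truly empty files are included in the first number.
size_hint_stats() {
    # GNU find: %s = size in bytes as recorded in the (MDT) inode
    find "$1" -type f -printf '%s\n' |
        awk '{ total++; if ($1 == 0) zero++ } END { printf "%d %d\n", zero + 0, total + 0 }'
}
```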