[Lustre-discuss] e2scan for cleaning scratch space
Daire Byrne
Daire.Byrne at framestore.com
Wed Mar 11 02:30:45 PDT 2009
Jim,
----- "Jim Garlick" <garlick at llnl.gov> wrote:
> E2scan seems like an evil kludge to me, at least if you don't quiesce
> your servers first, which is impractical for us to do. It is especially
> painful if you have to correlate data taken separately from OST and MDT
> which I guess you need not do if a) you have a release with trustable MDS
> times (not 1.6.6), or b) you plan to stat(2) the MDS-generated list on a
> client before purging them.
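A minimal sketch of that stat(2)-before-purge pass on a client (not from the thread; the list path, cutoff, and function name are illustrative, and the a/m/ctime policy mirrors the +60 used elsewhere in this discussion):

```python
import os
import time

def purge_candidates(list_path, cutoff_days=60, dry_run=True):
    """Re-stat each path from an MDS-generated candidate list on a client,
    and only unlink files whose a/m/ctimes are all still past the cutoff.
    The list path and cutoff policy are assumptions for illustration."""
    cutoff = time.time() - cutoff_days * 86400
    purged = []
    with open(list_path) as f:
        for line in f:
            path = line.rstrip("\n")
            try:
                st = os.stat(path)       # client-side stat: authoritative times
            except FileNotFoundError:
                continue                 # already gone; nothing to do
            if max(st.st_atime, st.st_mtime, st.st_ctime) < cutoff:
                if not dry_run:
                    os.unlink(path)
                purged.append(path)
    return purged
```

Running with dry_run=True first gives a list to sanity-check before anything is actually removed.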
For the purposes of clearing old data, are the m/ctimes of files on the MDT
filesystem really going to be that far out? We came across something similar
when we used to rsync between two MDTs and filed a "bug" here:
https://bugzilla.lustre.org/show_bug.cgi?id=14952
I thought that the MDT filesystem times would at least be close to the "real"
Lustre filesystem times? Perhaps things are more complicated with striped files.
> See bug 16942 which describes an MDS resident "purge thread" that
> continually walks the file system implementing the policy. This is how
> we thought purging ought to be optimized and we hope to have this in place
> by the time we put 1.8 in production.
Interesting. I will keep an eye on that - cheers.
> Meanwhile, we are walking the file system from a client. Note that
> for this
>
> lfs find --type f --atime +60 --mtime +60 --ctime +60 /mnt/lustre >list
>
> beats
>
> find /mnt/lustre -type f -atime +60 -mtime +60 -ctime +60 >list
>
> by a wide margin since most of the time it does not have to contact
> the OST's, which stat(2) will always do for the foreseeable future
> (until size-on-MDS) to get st_size.
I had not thought about this before - thanks. So the "accurate" times are held
in the EAs on the MDT? And stat(2) is so slow because it also wants the file
size, which requires talking to the OSTs. Is there still an overhead on the MDS
from reading the EAs off disk, compared to just stat'ing the files on the MDT
device?
Some comparison benchmarks of one of our filesystems (36 million files) with
10GigE:
# e2scan -l -D -N 0 /dev/lustre/mdt1
~49 minutes
# e2scan -l -D -N `date --date="60 days ago" +%s` /dev/lustre/mdt1
~18 minutes
# lfs find /mnt/lustre
~403 minutes
# lfs find -atime +60 -mtime +60 -ctime +60 /mnt/lustre
~2520 minutes
# find /mnt/lustre
~100 minutes
# find /mnt/lustre -atime +60 -mtime +60 -ctime +60
~6574 minutes (4.5 days!)
The results are not 100% accurate because this was run on a production system
whose load varied throughout the day. I appreciate that "lfs find" is more
convenient (and accurate) than e2scan for particular ctimes and mtimes, but
there is still enough of a performance difference in our environment to make
e2scan preferable. Running multiple (lfs) find commands across a compute
cluster does help speed things up, but we found that the MDS gets hammered.
The other big application for scanning the filesystem is "indexing" (which we
are always trying to improve). We also use e2scan for this, dumping a sqlite DB
and then stat'ing only the new/modified files. Finally we update a MySQL DB
which users can quickly query through a GUI. It is always an incremental scan
update, to avoid stat'ing unchanged files. We all eagerly await changelogs...
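The incremental-update idea boils down to something like the following sketch (this is not our actual code; the table schema, function name, and the assumption that e2scan's output has been reduced to a plain list of changed paths are all illustrative):

```python
import os
import sqlite3

def update_index(db_path, changed_paths):
    """Upsert stat() info for new/modified files only, so unchanged files
    are never stat'ed. Schema and names are illustrative assumptions."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS files "
               "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")
    for path in changed_paths:
        try:
            st = os.stat(path)   # only the files the scan flagged as changed
        except FileNotFoundError:
            # file was removed since the scan: drop it from the index
            db.execute("DELETE FROM files WHERE path = ?", (path,))
            continue
        db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                   (path, st.st_size, st.st_mtime))
    db.commit()
    return db
```

The resulting table can then be bulk-loaded into whatever user-facing DB the GUI queries.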
Thanks for the insight,
Daire