[Lustre-discuss] e2scan for cleaning scratch space

Wed Mar 11 02:30:45 PDT 2009

Jim,

----- "Jim Garlick" <garlick at llnl.gov> wrote:

> E2scan seems like an evil kludge to me, at least if you don't quiesce
> your servers first, which is impractical for us to do.  It is especially 
> painful if you have to correlate data taken separately from OST and MDT
> which I guess you need not do if a) you have a release with trustable MDS
> times (not 1.6.6), or b) you plan to stat(2) the MDS-generated list on a
> client before purging them.

For the purposes of clearing old data, are the mtimes/ctimes of files on the MDT 
filesystem really going to be that far out? We came across something similar
when we used to rsync between two MDTs and filed a "bug" here:

  https://bugzilla.lustre.org/show_bug.cgi?id=14952

I thought that the MDT filesystem times would at least be close to the "real"
Lustre filesystem times. Perhaps things are more complicated with striped files?

> See bug 16942 which describes an MDS resident "purge thread" that
> continually walks the file system implementing the policy.  This is how
> we thought purging ought to be optimized and we hope to have this in place
> by the time we put 1.8 in production.

Interesting. I will keep an eye on that - cheers.

> Meanwhile, we are walking the file system from a client.  Note that
> for this
> 
>   lfs find --type f --atime +60 --mtime +60 --ctime +60 /mnt/lustre >list
> 
> beats
> 
>   find /mnt/lustre -type f -atime +60 -mtime +60 -ctime +60 >list
> 
> by a wide margin since most of the time it does not have to contact
> the OSTs, which stat(2) will always do for the foreseeable future 
> (until size-on-MDS) to get st_size.

I hadn't thought about this before - thanks. So the "accurate" times are held 
in the EAs on the MDT? And stat(2) is so slow because it also wants the file 
size, which means talking to the OSTs. Is there still going to be an 
overhead on the MDS reading the EAs from disk compared to just stat'ing the 
files on the MDT device?
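
For reference, a plain client-side walk equivalent to the find(1) command 
above can be sketched in Python as follows. This is a minimal sketch under 
stated assumptions, not what find or lfs find actually run internally: it 
calls lstat() on every regular file, which per the discussion above is the 
call that needs st_size and therefore the OSTs. The helpers `older_than` and 
`purge_candidates` are hypothetical, and `older_than` assumes find's "+N" 
rule (whole days elapsed, fraction dropped, must exceed N).

```python
import os
import stat
import time
from typing import Iterator, Optional

SECONDS_PER_DAY = 86400


def older_than(ts: float, now: float, days: int) -> bool:
    """Mimic find(1)'s '+N' test: whole days elapsed (fraction dropped) > N."""
    return int((now - ts) // SECONDS_PER_DAY) > days


def purge_candidates(root: str, days: int = 60,
                     now: Optional[float] = None) -> Iterator[str]:
    """Yield regular files under root whose atime, mtime and ctime are all
    more than `days` days old, like
    `find ROOT -type f -atime +N -mtime +N -ctime +N`.
    """
    if now is None:
        now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One lstat() per file: on a Lustre client this is the
                # call that has to get st_size from the OSTs.
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed while we were walking
            if not stat.S_ISREG(st.st_mode):
                continue
            times = (st.st_atime, st.st_mtime, st.st_ctime)
            if all(older_than(t, now, days) for t in times):
                yield path
```

The `now` parameter exists only so the cutoff can be pinned (e.g. for 
testing); by default it uses the current time, as find does.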

Some comparative benchmarks on one of our filesystems (36 million files) over 
10GigE:

  # e2scan -l -D -N 0 /dev/lustre/mdt1
  ~49 minutes
  # e2scan -l -D -N `date --date="60 days ago" +%s` /dev/lustre/mdt1
  ~18 minutes

  # lfs find /mnt/lustre
  ~403 minutes
  # lfs find -atime +60 -mtime +60 -ctime +60 /mnt/lustre
  ~2520 minutes

  # find /mnt/lustre
  ~100 minutes
  # find /mnt/lustre -atime +60 -mtime +60 -ctime +60
  ~6574 minutes (4.5 days!)
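
To put these timings side by side, the arithmetic below simply divides the 
~36 million files by each wall-clock time. It assumes every run had to look 
at every file, which may not hold for the filtered runs, so treat the rates 
as rough indicators only; `files_per_second` is just an illustrative helper.

```python
FILES = 36_000_000  # approximate file count quoted above

# Wall-clock minutes from the runs listed above.
runs = {
    "e2scan (all)": 49,
    "e2scan (-N 60 days)": 18,
    "lfs find (all)": 403,
    "lfs find (+60 days)": 2520,
    "find (all)": 100,
    "find (+60 days)": 6574,
}


def files_per_second(minutes: float, files: int = FILES) -> float:
    """Files divided by wall-clock seconds (a rough rate, not a scan rate)."""
    return files / (minutes * 60)


for name, minutes in runs.items():
    print(f"{name:22s} {minutes:5d} min  {minutes / 1440:5.2f} days  "
          f"~{files_per_second(minutes):8.0f} files/s")
```

For example, the unfiltered e2scan works out at roughly 12,245 files/s, 
against roughly 91 files/s for the filtered find(1) run.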

The results are not 100% accurate because these runs were on a production system 
whose load varied throughout the day. I appreciate that "lfs find" is more 
convenient (and accurate) than using e2scan for particular ctimes and mtimes, 
but in our environment the performance difference is still large enough to 
make e2scan preferable. Running multiple (lfs) find commands across a compute 
cluster does help speed things up, but we found that the MDS gets hammered.
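
When several find processes are spread across cluster nodes, one simple way 
to split the work is to hand out top-level directories so that each node gets 
a similar number of files. The sketch below assumes per-directory file counts 
are already known (e.g. from a previous scan) and uses a greedy largest-first 
assignment; `balance` is a hypothetical helper. It does nothing to reduce the 
load on the MDS, so the number of workers still needs to be kept modest.

```python
import heapq
from typing import Dict, List


def balance(dir_counts: Dict[str, int], workers: int) -> List[List[str]]:
    """Assign directories to workers, largest first, each to the
    currently least-loaded worker (greedy LPT scheduling)."""
    heap = [(0, i) for i in range(workers)]  # (files assigned, worker id)
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    # Largest directories first; ties broken by name for determinism.
    for name, count in sorted(dir_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + count, i))
    return buckets
```

For instance, with hypothetical counts {"a": 10, "b": 7, "c": 5, "d": 3} and 
two workers this gives [["a", "d"], ["b", "c"]], i.e. 13 and 12 files each.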

The other big application for scanning the filesystem is "indexing" (which we are 
always trying to improve). We also use e2scan for this, dumping a sqlite DB 
and then stat'ing only the new/modified files. Finally, we update a MySQL DB which 
users can quickly query through a GUI. The update is always an incremental scan, to 
avoid stat'ing unchanged files. We all eagerly await changelogs...
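
The incremental idea can be sketched with Python's sqlite3 module: compare 
each (path, mtime, ctime) tuple from a metadata-only scan against what was 
indexed last time, and stat(2) only the paths that are new or have changed. 
The schema, the tuple format and the function names are assumptions for 
illustration, not the indexer described above or e2scan's output format.

```python
import os
import sqlite3
from typing import Callable, Iterable, List, Tuple

# Hypothetical schema: one row per indexed file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size  INTEGER
)
"""


def changed_paths(db: sqlite3.Connection,
                  scanned: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Return scanned paths that are new or whose mtime/ctime differ from
    the DB. Only the database is consulted; no file is stat'ed here."""
    changed = []
    for path, mtime, ctime in scanned:
        row = db.execute(
            "SELECT mtime, ctime FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row != (mtime, ctime):  # row is None for a never-indexed path
            changed.append(path)
    return changed


def refresh(db: sqlite3.Connection, paths: Iterable[str],
            stat_fn: Callable = os.stat) -> None:
    """stat() just the given paths and upsert (or drop) their index rows."""
    for path in paths:
        try:
            st = stat_fn(path)
        except FileNotFoundError:
            db.execute("DELETE FROM files WHERE path = ?", (path,))
            continue
        db.execute(
            "INSERT OR REPLACE INTO files (path, mtime, ctime, size) "
            "VALUES (?, ?, ?, ?)",
            (path, int(st.st_mtime), int(st.st_ctime), st.st_size),
        )
    db.commit()
```

Files deleted between scans that never show up in `scanned` are not touched 
by this sketch and would need a separate pruning pass.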

Thanks for the insight,

Daire