[Lustre-discuss] e2scan for cleaning scratch space

LEIBOVICI Thomas thomas.leibovici at cea.fr
Thu Mar 19 00:41:02 PDT 2009


Hi,

>
> Some comparison benchmarks of one of our filesystems (36 million 
> files) with
> 10GigE:
>
>   # e2scan -l -D -N 0 /dev/lustre/mdt1
>   ~49 minutes
>   # e2scan -l -D -N `date --date="60 days ago" +%s` /dev/lustre/mdt1
>   ~18 minutes
>
>   # lfs find /mnt/lustre
>   ~403 minutes
>   # lfs find -atime +60 -mtime +60 -ctime +60 /mnt/lustre
>   ~2520 minutes
>
>   # find /mnt/lustre
>   ~100 minutes
>   # find /mnt/lustre -atime +60 -mtime +60 -ctime +60
>   ~6574 minutes (4.5 days!)
>

I'm very interested in this discussion for Lustre-HSM purposes.
The Lustre-HSM Policy Engine will mostly process ChangeLogs, but an 
initial scan may be needed
when upgrading a non-empty Lustre file system to a Lustre-HSM system.
Looking at these results, e2scan seems to be a very efficient way to 
retrieve metadata for all entries,
so it could be used to provide an initial list to the PolicyEngine, as a 
flat file or a DB.
Does it provide the common POSIX attributes and striping information?
I also guess it will not provide file sizes until the 'Size on MDS' 
feature lands.

> The other big application for scanning the filesystem is "indexing" 
> (which we are
> always trying to improve). We also use e2scan for this by dumping a 
> sqlite DB
> and then only stat'ing the new/modified files. Finally we update a 
> mysql DB which
> users can quickly query through a GUI. It is always an incremental 
> scan update to
> avoid stat'ing unchanged files. We all eagerly await changelogs.....
>
Don't you have performance issues with SQLite? It seemed to me that it is 
not very efficient
at managing huge data sets with millions of entries.
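
For what it is worth, here is how I picture the incremental update you
describe, as a rough sketch only (the table layout, DB path and the
assumption that 'e2scan -l -N <epoch>' prints one pathname per line are
mine, not your actual code). I would expect a single large transaction to
be necessary to keep SQLite usable at that scale:

  # incremental_index.py - rough sketch of an incremental stat pass
  import os
  import sqlite3
  import subprocess
  import time

  mdt_dev = "/dev/lustre/mdt1"
  mount   = "/mnt/lustre"
  db_path = "index.db"

  conn = sqlite3.connect(db_path)
  conn.execute("""CREATE TABLE IF NOT EXISTS files
                  (path TEXT PRIMARY KEY, mtime INTEGER, size INTEGER)""")
  conn.execute("""CREATE TABLE IF NOT EXISTS meta
                  (key TEXT PRIMARY KEY, value INTEGER)""")

  row = conn.execute("SELECT value FROM meta WHERE key='last_scan'").fetchone()
  last_scan = row[0] if row else 0
  now = int(time.time())

  # Ask e2scan only for entries changed since the previous run, so that
  # unchanged files are never stat'ed again.
  proc = subprocess.Popen(["e2scan", "-l", "-N", str(last_scan), mdt_dev],
                          stdout=subprocess.PIPE, text=True)

  with conn:                       # single transaction: one commit for all rows
      for line in proc.stdout:
          path = line.strip()
          if not path:
              continue
          try:
              st = os.lstat(os.path.join(mount, path.lstrip("/")))
          except OSError:
              continue             # entry removed between the scan and the stat
          conn.execute("INSERT OR REPLACE INTO files (path, mtime, size) "
                       "VALUES (?, ?, ?)",
                       (path, int(st.st_mtime), st.st_size))
      conn.execute("INSERT OR REPLACE INTO meta (key, value) "
                   "VALUES ('last_scan', ?)", (now,))
  proc.wait()
  conn.close()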

Kind regards,

Thomas LEIBOVICI
CEA/DAM
