[Lustre-discuss] Lustre directory sizes - fast "du"

Peter Grandi pg_lus at lus.for.sabi.co.UK
Thu Sep 4 10:55:21 PDT 2008


> Hi all, since our users have managed to write several TBs to
> Lustre by now, they sometimes would like to know what and how
> much there is in their directories. Is there any smarter way
> to find out than to do a "du -hs <dirname>" and wait for 30min
> for the 12TB-answer ?

If you have any patches that speed up the fetching of (what are
likely to be) millions of records from random places on a disk,
and also reduce the latency of the associated network roundtrips,
please let us know :-).

Recent Lustre versions try at least to amortize the roundtrip
cost by prefetching metadata (statahead), which may help.
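
On the clients that prefetching is controlled by the statahead
tunables; something along these lines shows and raises the
statahead window (parameter names and defaults vary a bit between
releases, so treat it as a sketch, not a recipe):

  # on a Lustre client, as root
  lctl get_param llite.*.statahead_max
  lctl set_param llite.*.statahead_max=64
  # hit/miss counters, to see whether statahead is actually helping
  lctl get_param llite.*.statahead_stats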

Another way to help might be to put the MDTs on the lowest
latency disks you can find on a system with very very large
amounts of RAM.

It may be worth looking at recent flash SSDs for the MDTs, as
they have very very low latency.

> I've already told them to substitute "ls -l" by "find -type f
> -exec ls -l {};", although I'm not too sure about that either.

That's crazy, as it forks one 'ls' per file. Even having
directories with a really large number of files is crazy in
itself.
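
If they really must walk the tree, something like this at least
avoids the per-file fork (assuming GNU 'find'; it sums apparent
sizes rather than allocated blocks, so it will not match 'du'
exactly, and every file still costs metadata roundtrips, so it
will not be fast either):

  # print each regular file's size, sum them at the end
  find <dirname> -type f -printf '%s\n' \
    | awk '{ total += $1 } END { printf "%.1f GiB\n", total/1024/1024/1024 }'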

Lustre scales *data* fairly well, *metadata* scales a lot less
well, and that's because scaling metadata is a huge research
problem.

I see a number of similarly crazy or much worse questions on the
XFS mailing list too. The craziest come from the many people who
think that file systems are database managers and that files can
be used as records.

In some ways, things like Linux and Lustre or XFS make building
large-scale storage projects too easy, bringing difficult issues
down in cost and complexity to the level of many people who can't
quite realize the import of what they are doing.


