[Lustre-discuss] Large directory performance

Mon Sep 13 09:45:50 PDT 2010

Michael Robbert wrote:
> We have been struggling with our Lustre performance for some time now
> especially with large directories. I recently did some informal
> benchmarking (on a live system so I know results are not
> scientifically valid) and noticed a huge drop in performance of
> reads (stat operations) past 20k files in a single directory. I'm
> using bonnie++ with I/O testing disabled (-s 0), just creating,
> reading, and deleting 40KB files in a single directory. I've done
> this for directory sizes of 2,000 to 40,000 files. Create
> performance is a flat line of ~150 files/sec across the board. Delete
> performance is all over the place, but no higher than 3,000
> files/sec. The really interesting data point is read performance,
> which for these tests is just a stat of the file not reading data.
> Starting with the smaller directories it is relatively consistent at
> just below 2,500 files/sec, but when I jump from 20,000 files to
> 30,000 files the performance drops to around 100 files/sec. We were

Think small, random reads on RAID6.  Performance craters under that kind of load.

> assuming this was somewhat expected behavior and are in the process
> of trying to get our users to change their code. Then yesterday I was
> browsing the Lustre Operations Manual and found section 33.8 that
> says Lustre is tested with directories as large as 10 million files
> in a single directory and still gets lookups at a rate of 5,000
> files/sec. That leaves me wondering two things: how can we get 5,000
> files/sec for anything, and why is our performance dropping off so
> suddenly after 20k files?
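If you want to reproduce the stat-rate cliff without bonnie++, a minimal sketch like the one below may help. It creates files in one directory, stats each one, then unlinks them, and reports ops/sec per phase. This is only a rough stand-in for bonnie++'s small-file tests, not the same test. The function names `stat_rate` and `sweep` are hypothetical. The stat pass runs on the same client that just created the files, so it may be partly served from client-side caches, as in a single-client bonnie++ run.

```python
import os
import tempfile
import time


def stat_rate(directory: str, nfiles: int, size: int = 40 * 1024) -> dict:
    """Create, stat, and unlink `nfiles` files of `size` bytes in one directory.

    Returns operations/second for each phase (hypothetical helper).
    """
    payload = b"\0" * size
    names = [os.path.join(directory, f"f{i:08d}") for i in range(nfiles)]

    t0 = time.perf_counter()
    for name in names:                      # create phase
        with open(name, "wb") as fh:
            fh.write(payload)
    t1 = time.perf_counter()
    for name in names:                      # "read" phase: metadata only
        os.stat(name)
    t2 = time.perf_counter()
    for name in names:                      # delete phase
        os.unlink(name)
    t3 = time.perf_counter()

    def rate(dt: float) -> float:
        return nfiles / dt if dt > 0 else float("inf")

    return {"create": rate(t1 - t0), "stat": rate(t2 - t1), "delete": rate(t3 - t2)}


def sweep(sizes, parent=None, size: int = 40 * 1024) -> dict:
    """Run stat_rate once per directory size, each in a fresh temporary directory."""
    results = {}
    for n in sizes:
        with tempfile.TemporaryDirectory(dir=parent) as d:
            results[n] = stat_rate(d, n, size)
    return results
```

To cover the range you tested, call something like `sweep([2000, 10000, 20000, 30000, 40000], parent=...)`, with `parent` pointing at a scratch directory on the Lustre mount. Any path you pass there is your own choice, not something this sketch knows about.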

Move your MDT to a different machine, on a very fast RAID10.

I've seen fast 15k RPM SAS drives recommended, but they aren't the only option.

What you want is very high random read IOPS.
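A back-of-envelope model shows why random read IOPS is the number to chase. It assumes that once the metadata working set no longer fits in cache, each stat costs some number of random reads on the MDT. The stat rate is then capped by array IOPS divided by misses per stat. This is a rough sketch, not a Lustre performance model. The parameter names and the ~180 IOPS per 15k RPM drive used below are assumptions (a common rule of thumb), not measurements from your system.

```python
def est_stat_rate(disks: int, iops_per_disk: float,
                  reads_per_stat: float = 1.0, cache_hit: float = 0.0) -> float:
    """Rough ceiling on stat()/s when the MDT is bound by random reads.

    Assumes random reads spread evenly over all spindles (roughly true for
    RAID10) and that every cache miss costs one full random disk read.
    """
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0 and 1")
    array_iops = disks * iops_per_disk           # aggregate random read IOPS
    misses_per_stat = reads_per_stat * (1.0 - cache_hit)
    if misses_per_stat == 0:
        return float("inf")  # fully cached: the disks are no longer the limit
    return array_iops / misses_per_stat
```

With these assumptions, 10 spindles at ~180 IOPS and one uncached read per stat give about 1,800 stats/s. Doubling the spindles, or serving half the lookups from cache, roughly doubles that. A directory that outgrows the cache falls back to the disk-bound figure.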

> Here is our setup: All I/O servers are Dell PowerEdge 2950s with two
> X5355 @ 2.66GHz sockets (8 cores total) and 16GB of RAM. The data is on
> DDN S2A 9550s in an 8+2 RAID configuration, connected directly via 4Gb
> Fibre Channel. They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp

Hmmm... that's a RHEL5 kernel, not a RHEL4 kernel.  Are you sure you're
running 4.5?
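A quick way to see the mismatch is to compare the running kernel's release string with /etc/redhat-release. RHEL 5 kernels are 2.6.18-based and carry a lowercase `.el5` tag, as in the string above. RHEL 4 kernels were 2.6.9-based with an uppercase `.EL` suffix. The helper names below are hypothetical, and the parsing is a minimal sketch that only recognizes the `.elN` form.

```python
import platform
import re


def el_release(kernel: str):
    """Return N from a RHEL-style '.elN' kernel tag, or None if there is none."""
    m = re.search(r"\.el(\d+)", kernel)   # case-sensitive: RHEL4's '.EL' won't match
    return int(m.group(1)) if m else None


def redhat_release(path: str = "/etc/redhat-release"):
    """Return the distribution release line, or None if the file is missing."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    running = platform.release()
    print("kernel :", running, "-> el", el_release(running))
    print("release:", redhat_release())
```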

> 
> As a side note, the users' code is Parflow, developed at LLNL. The
> files are SILO files. We have as many as 1.4 million files in a
> single directory and we now have half a billion files that we need to
> deal with in one way or another. The code has already been modified
> to split the files on newer runs into multiple subdirectories, but
> we're still dealing with tens of thousands of files in a single
> directory. The users have been able to run these data sets on Lustre
> systems at LLNL 3 orders of magnitude faster.
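One common way to keep per-directory counts bounded is to hash each file name into a fixed set of bucket subdirectories. The sketch below shows that technique. It is not Parflow's actual layout, and `bucketed_path`, the 256-bucket default, and the example file names are hypothetical.

```python
import hashlib
import os


def bucketed_path(root: str, filename: str, nbuckets: int = 256) -> str:
    """Map a file name to root/<bucket>/filename using a stable hash."""
    # MD5 is used only for an even, reproducible spread (not for security);
    # Python's built-in hash() is randomized per process, so it won't do.
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % nbuckets
    return os.path.join(root, f"{bucket:03d}", filename)
```

With 256 buckets, 1.4 million files works out to about 5,500 per directory, well under the ~20k point where the stat rate fell off in your tests. Writers would call `os.makedirs(os.path.dirname(p), exist_ok=True)` before creating each file.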

This shouldn't be a problem for a well-designed system.

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615