[Lustre-discuss] Large directory performance

Fri Sep 10 11:11:20 PDT 2010

We have been struggling with Lustre performance for some time now, especially with large directories. I recently did some informal benchmarking (on a live system, so I know the results are not scientifically valid) and noticed a huge drop in read (stat) performance past 20k files in a single directory. I'm using bonnie++ with IO testing disabled (-s 0), just creating, reading, and deleting 40 KB files in a single directory. I've done this for directory sizes of 2,000 to 40,000 files.

Create performance is a flat line of ~150 files/sec across the board. Delete performance is all over the place, but no higher than 3,000 files/sec. The really interesting data point is read performance, which for these tests is just a stat of the file, not a read of its data. For the smaller directories it is relatively consistent at just below 2,500 files/sec, but when I jump from 20,000 files to 30,000 files it drops to around 100 files/sec.

We were assuming this was somewhat expected behavior and are in the process of trying to get our users to change their code. Then yesterday I was browsing the Lustre Operations Manual and found section 33.8, which says Lustre is tested with directories as large as 10 million files in a single directory and still gets lookups at a rate of 5,000 files/sec. That leaves me wondering two things: how can we get 5,000 files/sec for anything, and why is our performance dropping off so suddenly after 20k files?
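
For anyone who wants to compare numbers without bonnie++, the create/stat/delete test described above can be roughly approximated with a short Python script. This is only a sketch, not what bonnie++ does internally: the function names (time_phase, create_file, bench_directory) and the "mdbench_" directory prefix are made up for illustration, and because the stat pass runs on the same client right after the create pass, some results may be served from client-side cache.

```python
import os
import shutil
import tempfile
import time


def time_phase(func, paths):
    """Apply func to every path in order and return operations per second."""
    start = time.perf_counter()
    for path in paths:
        func(path)
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


def create_file(path, size):
    """Write a file of `size` zero bytes."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)


def bench_directory(parent, num_files, size=40 * 1024):
    """Create, stat, and delete num_files files in one fresh directory.

    Returns a dict of files/sec for each phase. `parent` should be a
    directory on the file system under test (assumed to exist).
    """
    workdir = tempfile.mkdtemp(prefix="mdbench_", dir=parent)
    paths = [os.path.join(workdir, f"f{i:07d}") for i in range(num_files)]
    try:
        # Dict entries are evaluated in order: create, then stat, then delete.
        return {
            "create": time_phase(lambda p: create_file(p, size), paths),
            "stat": time_phase(os.stat, paths),
            "delete": time_phase(os.remove, paths),
        }
    finally:
        # Remove the working directory even if a phase failed part-way.
        shutil.rmtree(workdir, ignore_errors=True)
```

Calling bench_directory() on a directory inside the Lustre mount for each of 2,000, 10,000, 20,000, 30,000, and 40,000 files should show whether the stat rate falls off between 20k and 30k files on a given system.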

Here is our setup:
All IO servers are Dell PowerEdge 2950s with two quad-core Xeon X5355 sockets (8 cores total) @ 2.66GHz and 16 GB of RAM.
The data is on DDN S2A 9550s in an 8+2 RAID configuration, connected directly over 4Gb Fibre Channel.
They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp

As a side note, the users' code is Parflow, developed at LLNL. The files are SILO files. We have as many as 1.4 million files in a single directory, and we now have half a billion files that we need to deal with in one way or another. The code has already been modified to split the files on newer runs into multiple subdirectories, but we're still dealing with tens of thousands of files in a single directory. The users have been able to run these data sets on Lustre systems at LLNL 3 orders of magnitude faster.
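
To illustrate what splitting files into subdirectories can look like, here is a minimal sketch. It is not how Parflow does it; the hashing scheme, the shard_path and files_per_shard names, and the choice of 256 subdirectories are all assumptions picked only to keep each directory well under the 20k-file point where the stat rate dropped.

```python
import hashlib
import os


def shard_path(root, filename, num_shards=256):
    """Return root/<shard>/<filename>, with the shard chosen by hashing the name.

    The same filename always maps to the same subdirectory, so readers can
    locate a file without scanning the whole tree.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return os.path.join(root, f"{shard:04d}", filename)


def files_per_shard(total_files, num_shards):
    """Average number of files per subdirectory, rounded up."""
    return -(-total_files // num_shards)


# 1.4 million files over 256 subdirectories averages about 5,469 per directory.
print(files_per_shard(1_400_000, 256))
```

Hashing spreads files roughly evenly, so individual directories will sit near that average rather than exactly on it. A scheme keyed on timestep or process rank would work just as well if the application already has such a number at hand.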

Thanks,
Mike Robbert
HPC & Networking Engineer
Colorado School of Mines