[Lustre-discuss] Large directory performance

Hedges, Richard M. hedges1 at llnl.gov
Fri Sep 10 11:16:53 PDT 2010


It will continue downward as the number of files in the directory increases.
Interestingly, GPFS stat performance increased as the number of files
increased.  My tests were on 128 nodes * 8 processes/node * 10 - 500 files
per process.
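
For anyone who wants to reproduce the trend discussed in this thread, here is a minimal single-process sketch of a stat-rate test. It is not bonnie++ and not the test harness used for the numbers above; the function name stat_rate and the directory sizes are hypothetical, chosen only for illustration. It creates empty files in one directory, times os.stat() on each, and cleans up.

```python
import os
import tempfile
import time


def stat_rate(directory, num_files):
    """Create num_files empty files in directory, time os.stat() on each,
    remove them again, and return the measured stats per second."""
    paths = [os.path.join(directory, f"f{i:07d}") for i in range(num_files)]
    for path in paths:
        with open(path, "w"):
            pass  # empty file: only metadata operations are of interest

    start = time.perf_counter()
    for path in paths:
        os.stat(path)
    elapsed = time.perf_counter() - start

    for path in paths:
        os.remove(path)
    return num_files / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Sizes are illustrative only. A temporary directory is used here so the
    # sketch runs anywhere; to test a real file system, pass a directory on
    # that file system instead.
    for n in (2000, 10000, 20000):
        with tempfile.TemporaryDirectory() as d:
            print(n, "files:", round(stat_rate(d, n)), "stats/sec")
```

Because the same process creates and then stats the files, client-side caching will likely flatter the result; a more faithful test would stat from a different client than the one that created the files.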

- Richard


On 9/10/10 11:11 AM, "Michael Robbert" <mrobbert at mines.edu> wrote:

> We have been struggling with our Lustre performance for some time now
> especially with large directories. I recently did some informal benchmarking
> (on a live system so I know results are not scientifically valid) and noticed
> a huge drop in performance of reads (stat operations) past 20k files in a
> single directory. I'm using bonnie++, disabling IO testing (-s 0) and just
> creating, reading, and deleting 40kb files in a single directory. I've done
> this for directory sizes of 2,000 to 40,000 files. Create performance is a
> flat line of ~150 files/sec across the board. Delete performance is all over
> the place, but no higher than 3,000 files/sec. The really interesting data
> point is read performance, which for these tests is just a stat of the file
> not reading data. Starting with the smaller directories it is relatively
> consistent at just below 2,500 files/sec, but when I jump from 20,000 files to
> 30,000 files the performance drops to around 100 files/sec. We were assuming
> this was somewhat expected behavior and are in the process of trying to get our
> users to change their code. Then yesterday I was browsing the Lustre
> Operations Manual and found section 33.8 that says Lustre is tested with
> directories as large as 10 million files in a single directory and still gets
> lookups at a rate of 5,000 files/sec. That leaves me wondering 2 things. How
> can we get 5,000 files/sec for anything and why is our performance dropping
> off so suddenly after 20k files?
> 
> Here is our setup:
> All IO servers are Dell PowerEdge 2950s. 2 8-core sockets with X5355 @
> 2.66GHz and 16GB of RAM.
> The data is on DDN S2A 9550s with 8+2 RAID configuration connected directly
> with 4Gb Fibre channel.
> They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp
> 
> As a side note, the users' code is Parflow, developed at LLNL. The files are
> SILO files. We have as many as 1.4 million files in a single directory and we
> now have half a billion files that we need to deal with in one way or another.
> The code has already been modified to split the files on newer runs into
> multiple subdirectories, but we're still dealing with 10s of thousands of
> files in a single directory. The users have been able to run these data sets
> on Lustre systems at LLNL 3 orders of magnitude faster.
> 
> Thanks,
> Mike Robbert
> HPC & Networking Engineer
> Colorado School of Mines
> 
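
The quoted message mentions splitting files across multiple subdirectories to keep any single directory small. One common way to do that is sketched below, purely as an illustration: it is not Parflow's actual scheme, and the function name bucketed_path, the bucket count, and the example paths are all hypothetical. Each file name is hashed to pick one of a fixed number of subdirectories, so the same name always maps to the same place.

```python
import os
import zlib


def bucketed_path(root, filename, num_buckets=256):
    """Map filename to root/<bucket>/filename, where the bucket is chosen by
    a CRC32 hash of the name. The mapping is deterministic, so a file can be
    located later without listing any directory."""
    bucket = zlib.crc32(filename.encode("utf-8")) % num_buckets
    return os.path.join(root, f"{bucket:03d}", filename)


def ensure_parent(path):
    """Create the bucket directory for path if it does not exist yet."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
```

With 256 buckets, a set of 1.4 million files would average roughly 5,500 files per directory, assuming the hash spreads the names evenly. A writer would call ensure_parent(bucketed_path(root, name)) before opening the file; a reader only needs bucketed_path.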


====================================================

Richard Hedges
Customer Support and Test - File Systems Project
Development Environment Group - Livermore Computing
Lawrence Livermore National Laboratory
7000 East Avenue, MS L-557
Livermore, CA    94551

v:    (925) 423-2699
f:    (925) 423-6961
E:    richard-hedges at llnl.gov



