[Lustre-discuss] Large directory performance

Sat Sep 11 13:41:07 PDT 2010

On Sep 10, 2010, at 5:32 PM, Bernd Schubert wrote:

> On Saturday, September 11, 2010, Andreas Dilger wrote:
>> On 2010-09-10, at 12:11, Michael Robbert wrote:
>>> Create performance is a flat line of ~150 files/sec across the board.
>>> Delete performance is all over the place, but no higher than 3,000
>>> files/sec... Then yesterday I was browsing the Lustre Operations Manual
>>> and found section 33.8 that says Lustre is tested with directories as
>>> large as 10 million files in a single directory and still get lookups at
>>> a rate of 5,000 files/sec. That leaves me wondering 2 things. How can we
>>> get 5,000 files/sec for anything and why is our performance dropping off
>>> so suddenly after 20k files?
>>> 
>>> Here is our setup:
>>> All IO servers are Dell PowerEdge 2950s. 2 8-core sockets with X5355 @
>>> 2.66GHz and 16GB of RAM. The data is on DDN S2A 9550s with 8+2 RAID
>>> configuration connected directly with 4Gb Fibre channel.
>> 
>> Are you using the DDN 9550s for the MDT?  That would be a bad
>> configuration, because they can only be configured with RAID-6, and would
>> explain why you are seeing such bad performance.  For the MDT you always
> 
> Unfortunately, we have so far failed to copy the scratch MDT in a reasonable
> time. Copying several hundred million files turned out to take ages ;) But I
> guess Mike did the benchmarks for the other filesystem with an EF3010.

The benchmarks listed above are for our scratch filesystem, whose MDT is on the 9550. I should have mentioned that I also ran benchmarks on our home filesystem, whose MDT was recently moved to the EF3010 with RAID 1+0 on 6 SAS disks. The other 6 disks in the EF3010 are waiting for when we can move the scratch MDT there. Anyway, the benchmarks on home were actually worse: create performance was about the same, but read performance was in the low hundreds of files/sec. The command line was:
./bonnie++ -d $dir -s 0 -n $size:40000:40000:1
where $dir was a directory on the filesystem being tested and $size was the number of files in thousands (5, 10, 20, 30).
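
For anyone trying to reproduce this, the runs can be wrapped in a small loop over the four sizes. This is only a sketch, not the exact script that was used: the DIR default is a hypothetical path, the DRY_RUN switch and the function names are made up for illustration, and bonnie++ is assumed to sit in the current directory as in the command line above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run the bonnie++ small-file test once per directory size.
# DIR is a placeholder path (assumption); DRY_RUN=1 only prints the commands.
DIR=${DIR:-/mnt/lustre/benchdir}
SIZES=${SIZES:-"5 10 20 30"}
DRY_RUN=${DRY_RUN:-1}

# Print the bonnie++ command line for one run.
# $1 = target directory, $2 = number of files in thousands
bonnie_cmd() {
    echo "./bonnie++ -d $1 -s 0 -n $2:40000:40000:1"
}

# Loop over all sizes; print the command in dry-run mode, otherwise run it.
run_all() {
    for size in $SIZES; do
        if [ "$DRY_RUN" = 1 ]; then
            bonnie_cmd "$DIR" "$size"
        else
            ./bonnie++ -d "$DIR" -s 0 -n "$size:40000:40000:1"
        fi
    done
}

run_all
```

Setting DRY_RUN=0 and DIR to a directory on the filesystem under test runs the four benchmarks back to back.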

A dd of the MDT wasn't possible because the original LUN was nearly 5TB (only 35GB used), while the new LUN is just over 1TB.

> 
>>> We have as many as 1.4 million files in a single directory and we now
>>> have half a billion files that we need to deal with in one way or
>>> another.
> 
> Mike, is there a chance you could check what rate acp reports?
> 
> http://oss.oracle.com/~mason/acp/
> 
> Also, could you please send me your exact bonnie line or script? We could try
> to reproduce it on an idle test 9550 with a 6620 for metadata (the 6620 is
> slower for that than the EF3010).

I have downloaded and compiled acp, and started a copy of one of the 1.6 million file directories. After 1 hour it is still reading files from a top-level directory with only 122k files and hasn't written anything. The only option used on the command line was -v, so I could see what it was doing.

Thanks,
Mike