[Lustre-discuss] Large directory performance

Michael Robbert mrobbert at mines.edu
Mon Sep 13 15:48:33 PDT 2010


On Sep 11, 2010, at 2:41 PM, Michael Robbert wrote:
> 
>> Mike, is there a chance you can try which rate acp reports?
>> 
>> http://oss.oracle.com/~mason/acp/
>> 
>> Also could you please send me your exact bonnie line or script? We could try 
>> to reproduce it on an idle test 9550 with a 6620 for metadata (the 6620 is 
>> slower for that than the ef3010).
> 
> I have downloaded and compiled acp. I have started a copy of one of the 1.6 million file directories. After 1 hour it is still reading files from a top-level directory with only 122k files and hasn't written anything. The only option used on the command line was -v, so I could see what it was doing.
> 
> 
What exactly is it that we're trying to get out of acp? Yesterday one of my "tar pipe" copies finished earlier than expected. It happened while acp was running on a different directory, which I know should have nothing to do with the one being copied. I then started another copy yesterday and it finished by this morning (it should have taken 2 days). At some point in this process I realized that the write portion of acp appears not to be implemented, so all it does is read data. I am wondering if it is causing data to be cached at a faster rate than tar can read it, and is therefore speeding up my copies. On the other hand, processes that I've started today appear to be going just as slowly as before (maybe a little faster, at 300-500 files per minute).

I'm also beginning to wonder how much the work of other users is affecting this. If it is a factor, I can bring some of it to a halt, since some of it comes from the same users who own this large data set and are attempting to clean up their old data. I would like to know how I can monitor that. In the past I've seen the load average of the MDS go up to 20 or 30. It is only at about 5 right now. How high does it have to go before overall performance is affected? Or is that even an indicator I should be looking at?

I'm trying to read as much Lustre documentation as I can, mostly the Lustre Operations Manual and old mailing list entries, but most of it is about OSS/OST performance and our problem seems to be only with the MDS/MDT. Any pointers to where I can learn more about what happens on the MDS, especially anything about how it caches data, would be appreciated.
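
In case it helps anyone reproduce the "files per minute" numbers above, here is a rough sketch of how such a rate could be sampled on a copy's destination directory. It is only an illustration: the function names are made up for this example, it counts entries in a single directory (not a whole tree), and the load average it reports is that of the host running the script, so it says something about the MDS only if it is run there. Listing a very large directory also puts some load on the metadata server, so the sampling interval should not be too short.

```python
import os
import time


def count_entries(path):
    """Count the entries in one directory without stat()-ing each of them."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)


def files_per_minute(count_before, count_after, elapsed_seconds):
    """Turn two entry counts taken elapsed_seconds apart into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_after - count_before) * 60.0 / elapsed_seconds


def sample(dest_dir, interval_seconds=60):
    """Sample the growth rate of dest_dir once, plus the local 1-minute load."""
    before = count_entries(dest_dir)
    start = time.monotonic()
    time.sleep(interval_seconds)
    elapsed = time.monotonic() - start
    after = count_entries(dest_dir)
    # Load average of the host running this script (Unix only),
    # NOT of the MDS unless the script runs on the MDS itself.
    load1, _, _ = os.getloadavg()
    return files_per_minute(before, after, elapsed), load1
```

Calling sample() in a loop on the destination of a tar pipe copy and logging both numbers with a timestamp would at least show whether the slow periods line up with periods of high load.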

Thanks,
Mike



