[Lustre-devel] LustreFS performance (update)

Vitaly Fertman Vitaly.Fertman at Sun.COM
Fri Mar 20 06:15:56 PDT 2009


Hi Andrew,

thanks for your feedback. Indeed, this still looks more like a raw test list
than a document ready for publishing, but it is a work in progress and I am
still working on it, so I will try to address your suggestions.

On Mar 19, 2009, at 11:16 PM, Andrew C. Uselton wrote:

> Howdy Vitaly,
>  I like this.  It is quite comprehensive and detailed.  I'd like to  
> offer a few constructive criticisms in the hope that you will better
> achieve your goals.  Mostly I'll stick them in-line where they seem  
> relevant, but I'll start with:
> 1)  Your write up is quite dense and terse.  I could follow the  
> overall structure, but found it pretty tough going to understand any  
> specific detail.  It really helps to work with someone who will  
> write up the same information, but in a form with whole sentences  
> and a minimum of acronyms or special symbols.  Define the acronyms  
> you do use in a clear way in one place that I can refer back to.
>
>
> Vitaly Fertman wrote:
>> ****************************************************
>> 	LustreFS benchmarking methodology.
>> ****************************************************
>> The document aims to describe a benchmarking methodology which helps to
>> understand LustreFS performance and to reveal LustreFS bottlenecks in
>> different configurations on different hardware, and to ensure the next
>> LustreFS release does not regress compared with the previous one. In
>> other words:
>> 	Goal1. Understand the HEAD performance.
>> 	Goal2. Compare HEAD and b1_6 (b1_8) performance.
>> To achieve Goal1, the methodology suggests testing the different layers
>> of software in the bottom-up direction, i.e. the underlying back-end, the
>> target server sitting on this back-end, the network connected to this
>> target and how the target performs through this network, and so on up to
>> the whole cluster.
>
> I like this approach.  My own efforts tend to be at-scale testing at  
> the whole-cluster end of the range, often in the presence of other  
> cluster activity.  It is good to have the details of the underlying  
> components documented.
>
> ...
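
To make the bottom-up ordering above concrete, the sequence I have in mind
looks roughly like the sketch below. All host names, NIDs, devices and
parameter values are placeholders, and the option names are quoted from
memory, so they should be checked against each tool's usage before running:

  # 1. Raw back-end: measure the bare disks/RAID under the target,
  #    before any Lustre code is involved (lustre-iokit).
  size=8192 crghi=16 thrhi=16 scsidevs="/dev/sg0" sh sgpdd-survey

  # 2. Target on that back-end: drive obdfilter directly through the
  #    echo client, still with no network or real clients in the path.
  nobjhi=2 thrhi=64 size=1024 case=disk targets="lustre-OST0000" \
      sh obdfilter-survey

  # 3. Network alone: LNET self-test between clients and server, no disk.
  #    (abbreviated; a real lst session needs proper stat/stop handling)
  export LST_SESSION=$$
  lst new_session netperf
  lst add_group servers 192.168.1.100@o2ib       # placeholder NIDs
  lst add_group clients 192.168.1.[1-8]@o2ib
  lst add_batch bulk
  lst add_test --batch bulk --from clients --to servers brw write size=1M
  lst run bulk
  lst stat servers          # sample for a while, then interrupt
  lst stop bulk
  lst end_session

  # 4. Target through the network: the same survey run over the wire.
  nobjhi=2 thrhi=64 size=1024 case=network targets="oss1" sh obdfilter-survey

  # 5. Whole cluster: file-system level I/O from all clients, e.g. IOR.
  mpirun -np 8 IOR -w -r -t 1m -b 4g -F -o /mnt/lustre/ior.testfile
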
>> Obviously, it is not possible to perform all the thousands of tests in
>> all the configurations, running all the special-purpose tests, etc., so
>> the document tries to prepare:
>> 1) all the essential and sufficient tests to see how the system performs
>> in general;
>> 2) a minimal set of essential tests to see how the system scales under
>> different conditions.
>
> In some cases it's obvious, but in many it is not clear what exactly  
> you mean to be testing.  It is a good extension to your methodology  
> to state clearly not only the mechanics of the test itself, but what  
> you think you are testing with the given experiment.  Spend a little  
> time and describe the system under examination, how it
> responds or should respond to the proposed test, and what tunables  
> and parameters you think might be relevant.  For instance, if the  
> test is supposed to saturate the target server, then how much I/O do  
> you expect will be required and why?  What timeout or other tunable  
> may determine the observed saturation point?  Your goal should be to
> have, not only a test, but a real expectation about its results even  
> before you run the test.  Once you have that expectation then you  
> can evaluate the results.  The bottom up approach helps with this,  
> since you can use the performance of the individual pieces to help  
> establish your expectation about the larger assemblies.
>
> ...
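
That is a fair point. As a trivial example of such an expectation, the
ceiling for a single OST server can be pre-computed from the lower-level
results before the client-level test is run; the numbers below are purely
illustrative and would come from the surveys mentioned above:

  DISK_MBS=400      # measured back-end bandwidth, MB/s
  NET_MBS=900       # measured LNET bandwidth to this server, MB/s
  NCLIENTS=8
  CLIENT_MBS=110    # what a single GigE client can push, MB/s

  # The server cannot go faster than the slower of its two paths ...
  OSS_CEILING=$(( DISK_MBS < NET_MBS ? DISK_MBS : NET_MBS ))
  # ... and the clients cannot deliver more than their aggregate.
  AGGR=$(( NCLIENTS * CLIENT_MBS ))
  EXPECTED=$(( AGGR < OSS_CEILING ? AGGR : OSS_CEILING ))
  echo "expected saturation point: ~${EXPECTED} MB/s"

If the measured result falls well below such a figure, a timeout or some
other tunable is the likely suspect rather than the hardware.
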
>> **** Hardware Requirements. ****
>> The test plan implies that we change only 1 parameter (CPU, disk or
>> network) at each step. The HW requirements are:
>> -- at least 1 node with:
>>  CPU: 32;
>>  RAM: enough to hold a ramdisk for the MDS;
>>  DISK: enough disks for raid6 or raid1+0 (as this node could be MDS or OST);
>> 	  an extra disk for an external journal;
>>  NET: both GigE and IB installed.
>> -- at least 1 other node with:
>>  DISK: enough disks for raid6 or raid1+0 (as this node could be MDS or OST);
>> 	  an extra disk for an external journal;
>> -- besides that: 8 clients, 3 other servers.
>> -- the other servers include:
>>  DISK: raid6;
>>  NET: IB installed.
>> -- each client includes:
>>  NET: both GigE and IB installed.
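
On the external journal disk: to make the intended layout explicit, the
setup I have in mind follows the usual ldiskfs external-journal recipe,
roughly as below (device names and the MGS NID are placeholders):

  # /dev/md0 is the raid6/raid1+0 array holding the target,
  # /dev/sdj is the extra disk carrying the external journal.
  mke2fs -b 4096 -O journal_dev /dev/sdj
  mkfs.lustre --ost --fsname=testfs --mgsnode=192.168.1.100@o2ib \
      --mkfsoptions="-j -J device=/dev/sdj" /dev/md0
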
>> **** Software requirements ****
> You might provide links to these tests for those not familiar with  
> them.
>> 1. Short term.
>> 1.1. mdsrate
>> to be completed to test all the operations listed in MDST3 (see below).
>> 1.2. mdsrate-**.sh
>> to be fixed/written to run mdsrate properly and to test all the
>> operations listed in MDST3 (see below).
>> 1.3. fake disk
>> implement a FAIL flag so that obdfilter reports 'done' without doing
>> anything, to get a low-latency disk.
>> 1.4. MT.
>> add more tests here and implement them.
>> 2. Long term.
>> 2.1. mdtstack-survey
>> - an echo client-server is to be written for the MDS, similar to the OST one.
>> - a test script similar to obdfilter-survey.sh is to be written.
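
As suggested above, for readers not familiar with the tools: mdsrate is an
MPI program from lustre/tests, normally driven by the mdsrate-*.sh wrappers.
A bare invocation looks roughly like the following; the option names are
quoted from memory and should be checked against the mdsrate usage output,
and the paths and counts are placeholders:

  # create, stat and unlink N files from 8 MPI ranks to exercise the
  # MDS operation rates (hypothetical example only).
  mpirun -np 8 mdsrate --create --dir /mnt/lustre/mdsrate \
         --nfiles 100000 --filefmt 'f%d'
  mpirun -np 8 mdsrate --stat   --dir /mnt/lustre/mdsrate \
         --nfiles 100000 --filefmt 'f%d'
  mpirun -np 8 mdsrate --unlink --dir /mnt/lustre/mdsrate \
         --nfiles 100000 --filefmt 'f%d'
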
>> **** Different configurations ****
> ...
>
> I'll cut it short here, but in general, I think you might be surprised:
> if you organize this document so that anyone else could come along behind
> you and perform all the same tests in the same way, you might get a lot
> of others doing these experiments alongside you.  That would make your
> job a lot easier and increase the likelihood that bugs and regressions
> would be caught quickly.
>
>> --
>> Vitaly
>
> Cheers,
> Andrew

--
Vitaly



