[Lustre-discuss] Performance Expectations of Lustre

Nick Jennings nick at creativemotiondesign.com
Mon Jan 26 11:36:02 PST 2009


Thank you very much for this feedback, Balagopal; it's extremely useful.
I will look into the MD1000 and revise my plan.
-Nick

Balagopal Pillai wrote:
> The MD3000 series doesn't seem to have RAID 6 support, which could be
> very useful with lots of SATA drives. The MD3000i also doesn't specify
> LACP support for the dual or quad Ethernet ports on the enclosure. A
> PE1950 + PERC 6 with an MD1000, on the other hand, does have RAID 6
> support, and the OSS can benefit from the good Ethernet bonding support
> in Linux. I have a setup with eight MD1000s on two PERC 5s across two
> OSSes.
> 
> 
> Balagopal
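
For reference, a minimal sketch of the Linux bonding setup mentioned above,
assuming a RHEL/CentOS-era OSS with two GigE ports (eth0/eth1), an
LACP-capable switch, and placeholder addresses -- exact file locations and
option names vary by distribution:

    # /etc/modprobe.conf -- load the bonding driver in 802.3ad (LACP) mode
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

The switch ports also have to be grouped into an LACP channel for 802.3ad
mode to negotiate; balance-alb (mode 6) is an alternative that needs no
switch support.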
> 
> Nick Jennings wrote:
>> Hi Brian! Thanks for the reply, comments below
>>
>> Brian J. Murrell wrote:
>>   
>>>>   Instead of just adding another 1TB server, I need to plan for a more 
>>>> scalable solution. Immediately Lustre came to mind, but I'm wondering 
>>>> about the performance. Basically our company does niche web-hosting for 
>>>> "Creative Professionals" so we need fast access to the data in order to 
>>>> have snappy web services for our clients. Typically these are smaller 
>>>> files (2MB pictures, 50MB videos, .swf files, etc.).
>>>>       
>>> Well, I'm not sure those files would fall within our general
>>> classification of "small files" (where we know we don't perform very
>>> well).  Our small-file issues are usually characterized by "kernel
>>> builds" and home-directory (~) use, where files are usually much
>>> smaller than 1MB.
>>>     
>>   Aha, OK, that's good to know. There's also some kind of read-ahead
>> and client-side caching, right? So files that are accessed frequently
>> will be faster to serve.
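
Lustre clients do perform read-ahead and cache file data in the page cache;
the knobs live under the llite tree on the client. A rough look at them,
assuming a 1.6.x-era client (parameter names vary by release, and on older
ones they appear under /proc/fs/lustre/llite/ instead):

    # Inspect the client read-ahead and cache limits
    lctl get_param llite.*.max_read_ahead_mb
    lctl get_param llite.*.max_read_ahead_per_file_mb
    lctl get_param llite.*.max_cached_mb

    # Example: enlarge the per-file read-ahead window (value in MB)
    lctl set_param llite.*.max_read_ahead_per_file_mb=64

Cached data is kept consistent by Lustre's distributed locks, so frequently
read files can be served from client memory as long as no other node is
writing them.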
>>
>>
>>   
>>>>   Also I'm wondering about the best way to set this up in terms of
>>>> speed and ease of growth. I want the web-servers and the storage pool
>>>> to be independent of each other, so I can add web-servers as the web
>>>> traffic increases and add more storage as our storage needs grow.
>>>>       
>>> Well, your web-servers would be Lustre clients.  There is no required
>>> relationship between the number of clients and the number of servers:
>>> you use as many servers as your client load demands.  At one end of
>>> the spectrum, relatively few busy clients could tax quite a few
>>> servers; at the other, a lot of clients with modest demand might
>>> require only a few servers.
>>>
>>>     
>>>>   I was thinking initially we could start with 2 servers, both
>>>> attached to the storage array, set up as OSSs and functioning as
>>>> (load-balanced) web-servers as well.
>>>>       
>>> Sounds like you are describing 2 storage servers, which would require at
>>> least 3 servers total.  Don't forget about the MDS.  Also don't forget
>>> about HA if that's a concern for you.  You could make the 2 OSSes
>>> failover partners for each other if you are willing to accept a
>>> performance hit while one of the OSSes is down.
>>>
>>> If HA is important to you however, you need to address an MDS failover
>>> with a second server to pick up the MDT should the active MDS fail.
>>>     
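
A minimal sketch of that MDS failover arrangement, assuming the MDT lives on
shared storage visible to both metadata nodes and using hypothetical host
names mds1/mds2 and filesystem name webfs (an HA framework such as Heartbeat
would normally drive the actual mount/unmount):

    # On mds1: format a combined MGS/MDT once, recording the standby's NID
    # (/dev/mdtvol is a placeholder for the shared LUN)
    mkfs.lustre --fsname=webfs --mgs --mdt --failnode=mds2@tcp0 /dev/mdtvol

    # Normal operation: mds1 mounts the target
    mount -t lustre /dev/mdtvol /mnt/mdt

    # If mds1 fails: mds2 mounts the same device and clients recover to it
    mount -t lustre /dev/mdtvol /mnt/mdt

Only one node may mount the MDT at a time, which is why a fencing-capable HA
manager is recommended rather than mounting by hand.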
>> HA is definitely critical: if the storage pool becomes inaccessible we
>> lose clients (and all fingers point at me!). However, I need to find a
>> reasonable balance between cost, scalability and performance. The idea
>> would be to start small, with the simplest configuration, but allow for
>> a lot of growth. In a year's time, if we are using 5TB of data, we will
>> be in a very good position financially and can afford a systems expansion.
>>
>> So for starters, what can I get away with here? 1 OSS, 1 MDS & 1 client
>> node? Is it smart to have the MDS and OSS share the same storage target
>> (just a separate partition for the MDS)? What kind of system specs are
>> advisable for each node type (MDS, OSS & client) as far as RAM, CPU,
>> disk configuration, etc.? Also, is it possible to add more OSSes to
>> take over existing OSTs that another OSS was previously managing? I.e.
>> if I have the MD3000i split into 5x1TB volumes (5 OSTs), and the OSS is
>> getting hammered, can I set up another OSS, hand off 2 or 3 OSTs from
>> the old OSS to the new one, and set it up as failover for the remaining
>> OSTs? Do-able?
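
Handing OSTs between OSSes is workable in principle: clients locate each OST
through the MGS, not through a particular OSS, so an OST on shared storage
can be unmounted on one server and mounted on another whose NID is recorded
for that target. A rough sketch, reusing the hypothetical webfs/mds1 names
above plus placeholder hosts oss1/oss2 and devices such as /dev/ost0:

    # On oss1: format each OST, pointing at the MGS and naming the future
    # partner as the failover NID
    mkfs.lustre --fsname=webfs --ost --mgsnode=mds1@tcp0 \
                --failnode=oss2@tcp0 /dev/ost0
    mount -t lustre /dev/ost0 /mnt/ost0

    # Later, to hand this OST to the new OSS:
    umount /mnt/ost0                        # on oss1
    mount -t lustre /dev/ost0 /mnt/ost0     # on oss2

    # Failover NIDs can also be added after the fact, e.g.
    tunefs.lustre --failnode=oss1@tcp0 /dev/ost0

The caveat is that both OSSes must be able to see the OST LUNs (which the
array's multiple host ports allow), and the exact tunefs.lustre options are
worth checking against the manual for the Lustre release in use.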
>>
>>
>>
>>   
>>> As for OSSes being web-servers, that would require the OSS/web-servers
>>> also to be clients, and that is an unsupported configuration because
>>> of the risk of deadlock under memory pressure.  The recommended
>>> architecture would be to make the web-servers Lustre clients.
>>>     
>> I see, so from the get-go I'm going to need an internal GigE network
>> for OSS/client communication.
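
On the network side, Lustre's LNET layer is told which interface to use in
modprobe.conf, and the web-servers then mount the filesystem as ordinary
Lustre clients over that network. A minimal sketch, assuming the internal
GigE interface is eth1 and reusing the hypothetical mds1/webfs names:

    # /etc/modprobe.conf on servers and clients: run Lustre traffic on eth1
    options lnet networks=tcp0(eth1)

    # On each web-server (Lustre client): mount the filesystem
    mkdir -p /srv/webdata
    mount -t lustre mds1@tcp0:/webfs /srv/webdata

If the bonded interface from the earlier sketch is used instead, the
networks line would reference bond0 rather than eth1.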
>>
>>
>>   
>>>> What kind of performance can I expect? Am I out of touch to expect
>>>> something similar to a directly attached RAID array?
>>>>       
>>> I think the numbers we generally quote are on the order of 80% of the
>>> raw storage bandwidth (assuming a capable network and so on).  Maybe
>>> somebody who is closer to the benchmarking we do constantly can
>>> comment further on how close to raw disk we are achieving lately.
>>>     
>> Is it safe to say my bottleneck is going to be the OSS & not the 
>> network? Is there some documentation I can read about typical setups, 
>> usage cases & methods for optimal performance?
>>
>> Thanks!
>> -Nick


