[Lustre-discuss] is Luster ready for prime time?

Lind, Bobbie J bobbie.j.lind at intel.com
Mon Jan 21 11:25:17 PST 2013


I would be very interested to see what tuning parameters you have set to
tune Lustre and the storage for small files.  I have had similar setups in
the past and been stumped by the small-file performance.

Bobbie Lind

>Date: Mon, 21 Jan 2013 11:24:32 -0500
>From: greg whynott <greg.whynott at gmail.com>
>Subject: Re: [Lustre-discuss] is Luster ready for prime time?
>To: Indivar Nair <indivar.nair at techterra.in>
>Cc: "lustre-discuss at lists.lustre.org"
>	<lustre-discuss at lists.lustre.org>
>Thanks very much Indivar,  informative read.    It is good to see others in
>our sector are using the technology, and you have some good points.
>Have a great day,
>On Sat, Jan 19, 2013 at 6:52 AM, Indivar Nair
><indivar.nair at techterra.in> wrote:
>>  Hi Greg,
>> One of our customers had a similar requirement and we deployed Lustre
>> for them. This was in July 2011. Though there were a lot of
>> problems initially, all of them were sorted out over time. They are
>> happy with it now.
>> *Environment:*
>> It's a 150-artist studio with around 60 render nodes. The studio mainly
>> uses Mocha, After Effects, Silhouette, SynthEyes, Maya, and Nuke among
>> others. They mainly work on 3D effects and stereoscopy conversions.
>> Around 45% of artists and render nodes are on Linux and use the native
>> client. All others access it through Samba.
>> *Lustre Setup:*
>> It consists of 2 x Dell R610 as MDS Nodes, and 4 x Dell R710 as OSS
>> 2 x Dell MD3200 with 12x1TB SAS Nearline Disks are used for storage.
>> Dell MD3200s are shared among 2 OSS nodes for H/A.
>> Since the original plan (which didn't happen) was to move to a 100%
>> environment, we didn't allocate separate Samba Gateways and use the OSS
>> nodes with CTDB for it. Thankfully, we haven't had any issues with that
>> *Performance:*
>> We get good THROUGHPUT of 800 - 1000MB/s with Lustre caching. The
>> itself provide much lower speeds. But that is fine, as caching is in
>> effect most of the time.
>> *Challenge:*
>> The challenge for us was to tune the storage for small files of 10 - 50MB,
>> totalling tens of GBs. An average shot would consist of 2000 - 4000
>> images. Some scenes / shots also had millions of <1MB Maya cache files.
>> This did tax the storage, especially the MDS. We fixed it to an extent by
>> adding more RAM to the MDS.
>> *Suggestions:*
>> 1. Get the real number of small files (I mean <1MB ones) created / used by
>> all software. These are the ones that could give you the most trouble. Do
>> not assume anything.
>> 2. Get the file sizes, numbers and access patterns absolutely correct.
>> This is the key.
>>     It's easier to design and tune Lustre for large files and I/O.
>> 3. Network tuning is as important as storage tuning. Tune switches,
>> workstations, render nodes, Samba / NFS gateways, OSS nodes, MDS nodes,
>> everything.
>> 4. Similarly, do not underestimate the Samba / NFS gateways. Size and tune them
>> correctly too.
>> 5. Use high-speed switching like QDR InfiniBand or 40GigE, especially for
>> backend connectivity between Samba/NFS gateways and Lustre MDS/OSS nodes.
>> 6. As far as possible, have a fixed directory pattern for all projects.
>> Separate working files (Maya, Nuke, etc.) from the data, i.e. frames /
>> images, videos, etc. at the top directory level itself. This will help
>> tune / manage the storage better. Use a different directory tree for different
>> file sizes or file access types.
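
[Editor's sketch, not part of the original message.] Suggestions 1 and 2
above ask for the real count and size distribution of files per project.
One rough way to collect that is a directory walk that buckets regular
files by size. The following is a minimal sketch, assuming Python 3 is
available on a client that mounts the project tree; the function names
and all bucket edges other than the 1MB and 50MB figures mentioned in the
thread are illustrative choices, not anything the posters used.

```python
import os
import stat
import sys
from collections import Counter

MB = 1024 * 1024

# (label, exclusive upper bound in bytes). Only the 1 MB and 50 MB edges
# come from the thread; the rest are arbitrary illustrative choices.
BUCKETS = [
    ("<64KB", 64 * 1024),
    ("64KB-1MB", 1 * MB),
    ("1MB-10MB", 10 * MB),
    ("10MB-50MB", 50 * MB),
    (">=50MB", float("inf")),
]


def bucket_for(size):
    """Return the label of the first bucket whose upper bound exceeds size."""
    for label, upper in BUCKETS:
        if size < upper:
            return label
    return BUCKETS[-1][0]


def size_histogram(root):
    """Walk root and return (file counts, total bytes) keyed by bucket label."""
    counts = Counter()
    total_bytes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # count regular files only
            label = bucket_for(st.st_size)
            counts[label] += 1
            total_bytes[label] += st.st_size
    return counts, total_bytes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts, total_bytes = size_histogram(root)
    for label, _upper in BUCKETS:
        print(f"{label:>10}  files={counts[label]:>10}  "
              f"bytes={total_bytes[label]:>16}")
```

Run once per top-level project directory, this gives a per-bucket file
count that can be compared across shots. Note that it walks the whole
tree through the metadata server, so on a tree with millions of files it
is itself a metadata-heavy job and is better run off-hours.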
>> If designed and tuned right, I think Lustre is the best storage currently
>> available for your kind of work.
>> Hope this helps.
>> Regards,
>> Indivar Nair
>> On Fri, Jan 18, 2013 at 1:51 AM, greg whynott
>> <greg.whynott at gmail.com> wrote:
>>> Hi Charles,
>>>   I received a few challenging off-list email messages along with a few
>>> fishing ones,  but it's all good.   It's interesting how a post asking a
>>> question can make someone appear angry.  8)
>>> Our IO profiles from the different segments of our business do vary
>>> greatly.   The HPC is more or less the typical load you would expect to
>>> see,  depending on which software is in use for the job being
>>>       We have hundreds of artists and administrative staff who use the
>>> system in a variety of ways.   Some examples would include but not limited
>>> to:  saving out multiple revisions of Photoshop documents (typically in the
>>> hundreds of megs to +1gig range),   video editing (stereoscopic 2k and
>>> images (again from 10's 100's to gigs in size) including uncompressed
>>> video,  Excel, Word and similar files,  thousands of project files
>>> software such as Maya,  Nuke and similar)  these also vary largely in
>>> from 1 to thousands of megs in size.
>>> The intention is to keep our databases and VM requirements on the
>>> file system which is comprised of about 100 10k SAS drives,  it works
>>> We did consider GPFS but that consideration went out the door once I
>>> started talking to them and hammering some numbers into their online
>>> calculator.  Things got a bit crazy quickly.   They have different
>>> for the different types and speeds of Intel CPUs.  I got the feeling they
>>> were trying to squeeze every penny out of customers they could.  It felt
>>> Brocade-ish and left a bad taste with us.   It wouldn't have been much of a
>>> problem at some other shops I've worked at,  but here we do have a
>>> budget to work within.
>>> The NAS vendors could all be considered scale out I suspect.   All 3
>>> scale out the storage and front end.  NA C-mode can have up to 24
>>> Blue Arc goes up to 4 or 8 depending on the class,  Isilon can go up to 24
>>> nodes or more as well if memory serves me correctly,  and they all have a
>>> single name space solution in place.   They each have their limits,
>>> for our use case they are really subjective.   We will not hit the
>>> of their scalability before we are considering a fork lift refresh.  In our
>>> view,  for what they offer it is pretty much a wash for them - any would
>>> meet our needs.  NetApp still has a silly agg/vol size limit,  at least it
>>> is up to 90TB now (from 9 in the past (formatted fs use))..  in April it is
>>> supposed to go much higher.
>>>  The block storage idea in the mix - since all our HPC is Linux,  they
>>> all would become Lustre clients.   To provide a gateway into the Lustre
>>> storage for non-Linux/Lustre hosts the thinking was a clustered pair of
>>> Linux boxes running Samba/NFS which were also Lustre clients.    It's
>>> an idea being bounced around at this point.  The data serving
>>> of the non-HPC parts of the business are much less.   The video editors
>>> most likely would stay on our existing storage solution as that is working
>>> out very well for them, but even if we did put them onto the Lustre FS,  I
>>> think they would be fine.  Based on that, it didn't seem so crazy to
>>> consider block access in this method.   That said,  I think we would be one
>>> of the first in M&E to do so,  pioneers if you will...
>>> diversify - we will end up in the same boat for the same reasons.
>>> thanks Charles,
>>> greg
>>> On Thu, Jan 17, 2013 at 2:20 PM, Hammitt, Charles Allen <
>>> chammitt at email.unc.edu> wrote:
>>>> Somewhat surprised that no one has responded yet; although it's likely
>>>> that the responses would be rather subjective, including mine, of
>>>> course.
>>>> Generally I would say that it would be interesting to know more about
>>>> your datasets and intended workload; however, you mention this is to be
>>>> used as your day-to-day main business storage, so I imagine those
>>>> characteristics would greatly vary... mine certainly do; that much is for
>>>> sure!
>>>> I don't really think uptime would be as much an issue here; there are
>>>> lots of redundancies, recovery mechanisms, and plenty of stable branches to
>>>> choose from. The question becomes what are the feature-set needs,
>>>> performance usability for different file types and workloads, and
>>>> comfort level with greater complexity and somewhat fewer resources.  That
>>>> said, I'd personally be a bit wary of using it as a general filesystem for
>>>> *all* your needs.
>>>> I do find it interesting that your short list is a wide range mix of
>>>> storage and filesystem types; traditional NAS, scale-out NAS, and then some
>>>> block storage with a parallel filesystem in Lustre.  Why no GPFS on the list
>>>> for comparison?
>>>> I currently manage, or have used in the past *[bluearc]*, all the
>>>> storage / filesystems and more from your list.  The reason is that
>>>> different storage and filesystem components have some things they are good
>>>> at... while other things they might not be as good at doing.  So I
>>>> by putting different storage/filesystem component pieces in the areas
>>>> they excel at best...
>>>> Regards,
>>>> Charles
>>>> From: lustre-discuss-bounces at lists.lustre.org [mailto:
>>>> lustre-discuss-bounces at lists.lustre.org] On Behalf Of greg whynott
>>>> Sent: Thursday, January 17, 2013 12:18 PM
>>>> To: lustre-discuss at lists.lustre.org
>>>> Subject: [Lustre-discuss] is Luster ready for prime time?
>>>> Hello,
>>>> just signed up today, please forgive me if this question has been
>>>> covered recently.  - in a bit of a rush to get an answer on this as we need
>>>> to make a decision soon,  the idea of using Lustre was thrown into the mix
>>>> very late in the decision making process.
>>>>  We are looking to procure a new storage solution which will
>>>> predominantly be used for HPC output but will also be used as our main
>>>> business centric storage for day to day use.  Meaning the file system
>>>> to be available 24/7/365.    The last time I was involved in
>>>> Lustre was about 6 years ago and it was at that time being considered
>>>> scratch space for HPC usage only.
>>>> Our VMs and databases would remain on non-Lustre storage as we already
>>>> have that in place and it works well.    The Lustre file system
>>>> would have everything else.  Projects we work on typically take up to
>>>> years to complete and during that time we would want all assets to
>>>> on the file system.
>>>> Some of the vendors on our short list include HDS (Blue Arc), Isilon and
>>>> NetApp.    Last week we started bouncing the idea of using Lustre
>>>> I'd love to use it if it is considered stable enough to do so.
>>>> Your thoughts and/or comments would be greatly appreciated.  Thanks for
>>>> your time.
>>>> greg
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>End of Lustre-discuss Digest, Vol 84, Issue 12
