[Lustre-discuss] is Luster ready for prime time?

Sat Jan 19 03:52:04 PST 2013

 Hi Greg,

One of our customers had a similar requirement and we deployed Lustre
2.0.0.1 for them. This was in July 2011. Though there were a lot of
problems initially, all of them were sorted out over time. They are quite
happy with it now.

*Environment:*
It's a 150-artist studio with around 60 Render Nodes. The studio mainly uses
Mocha, After Effects, Silhouette, SynthEyes, Maya, and Nuke, among others.
They mainly work on 3D Effects and Stereoscopic Conversions.
Around 45% of the Artists and Render Nodes are on Linux and use the native
Lustre Client. All others access it through Samba.

*Lustre Setup:*
It consists of 2 x Dell R610 as MDS Nodes, and 4 x Dell R710 as OSS Nodes.
2 x Dell MD3200 with 12x1TB Nearline SAS Disks are used for storage. Each
MD3200 is shared between 2 OSS Nodes for H/A.

Since the original plan (which didn't happen) was to move to a 100% Linux
environment, we didn't allocate separate Samba Gateways; instead, we run
Samba with CTDB on the OSS Nodes. Thankfully, we haven't had any issues
with that yet.

*Performance:*
We get good throughput of 800 - 1000MB/s with Lustre Caching. The disks
themselves provide much lower speeds, but that is fine, as caching is in
effect most of the time.

*Challenge:*
The challenge for us was to tune the storage for small files of 10 - 50MB
each, totalling 10s of GBs. An average shot would consist of 2000 - 4000
.dpx images. Some Scenes / Shots also had millions of <1MB Maya Cache
files. This did tax the storage, especially the MDS. We fixed it to an
extent by adding more RAM to the MDS.
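
As a rough cross-check of the figures above, the per-shot volume is just
frame count times frame size. The short Python sketch below is only an
illustration and not part of the deployment described here; the function
name is made up, and decimal units (1 GB = 1000 MB) are assumed.

```python
def shot_size_gb(frames, mb_per_frame):
    """Approximate size of one shot in GB, assuming 1 GB = 1000 MB."""
    return frames * mb_per_frame / 1000


# Figures quoted above: 2000 - 4000 .dpx frames per shot, 10 - 50MB per frame.
smallest = shot_size_gb(2000, 10)
largest = shot_size_gb(4000, 50)
print(f"{smallest:.0f} GB to {largest:.0f} GB per shot")
```

Multiplying the extremes gives the outer bounds for a single shot, 20 GB to
200 GB, so typical shots presumably sit toward the lower end of both ranges.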

*Suggestions:*

1. Get the real number of small files (I mean <1MB ones) created / used by
all software. These are the ones that could give you the most trouble. Do
not assume anything.

2. Get the file sizes, numbers and access patterns absolutely correct.
This is the key. It's easier to design and tune Lustre for large files
and I/O.

3. Network tuning is as important as storage tuning. Tune Switches, each
Workstation, Render Nodes, Samba / NFS Gateways, OSS Nodes, MDS Nodes,
everything.

4. Similarly, do not underestimate the Samba / NFS Gateways. Size and tune
them correctly too.

5. Use High Speed Switching like QDR InfiniBand or 40GigE, especially for
backend connectivity between Samba/NFS Gateway and Lustre MDS/OSS Nodes.

6. As far as possible, have a fixed directory pattern for all projects.
Separate working files (Maya, Nuke, etc.) from the data, i.e. frames /
images, videos, etc., at the top directory level itself. This will help you
tune / manage the storage better. Use a different directory tree for
different file sizes or file access types.
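
For suggestions 1 and 2, one way to get real numbers rather than assumptions
is to walk an existing project tree and count files per size bucket. The
following is a minimal Python sketch, not something taken from the setup
described above: the bucket boundaries simply mirror the sizes mentioned in
this mail (<1MB, 10 - 50MB), and the names census and bucket_for are made up
for illustration. A walk like this is itself metadata-heavy on a large tree,
so it is probably best run off-hours.

```python
import os
from collections import Counter

MB = 1024 ** 2

# (exclusive upper bound in bytes, label); boundaries are illustrative only.
BUCKETS = [
    (1 * MB, "<1MB"),
    (10 * MB, "1-10MB"),
    (50 * MB, "10-50MB"),
]


def bucket_for(size):
    """Return the label of the first bucket whose upper bound exceeds size."""
    for limit, label in BUCKETS:
        if size < limit:
            return label
    return ">=50MB"


def census(root):
    """Count files under root per size bucket, without following symlinks."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports a symlink's own size rather than its target's.
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            counts[bucket_for(size)] += 1
    return counts
```

Calling census() on a project root (a hypothetical /mnt/lustre/projects, for
example) returns a Counter keyed by bucket label, which gives a first idea of
how many <1MB files the MDS will have to handle.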

If designed and tuned right, I think Lustre is the best storage currently
available for your kind of work.

Hope this helps.

Regards,

Indivar Nair

On Fri, Jan 18, 2013 at 1:51 AM, greg whynott <greg.whynott at gmail.com> wrote:

> Hi Charles,
>
>   I received a few challenging off-list email messages along with a few
> fishing ones, but it's all good.  It's interesting how a post asking a
> question can make someone appear angry.  8)
>
> Our IO profiles from the different segments of our business vary
> greatly.  The HPC is more or less the typical load you would expect to
> see, depending on which software is in use for the job being run.
> We have hundreds of artists and administrative staff who use the file
> system in a variety of ways.  Some examples include, but are not limited
> to: saving out multiple revisions of Photoshop documents (typically in the
> hundreds of megs to 1+ gig range), video editing (stereoscopic 2k and 4k
> images, again from 10s and 100s of megs to gigs in size, including
> uncompressed video), Excel, Word and similar files, and thousands of
> project files (from software such as Maya, Nuke and similar), which also
> vary widely in size, from 1 to thousands of megs.
>
> The intention is to keep our databases and VM requirements on the existing
> file system, which is comprised of about 100 10k SAS drives; it works well.
>
> We did consider GPFS but that consideration went out the door once I
> started talking to them and hammering some numbers into their online
> calculator.  Things got a bit crazy quickly.  They have different pricing
> for the different types and speeds of Intel CPUs.  I got the feeling they
> were trying to squeeze every penny out of customers they could.  It felt
> very Brocade-ish and left a bad taste with us.  It wouldn't have been much
> of a problem at some other shops I've worked at, but here we do have a
> finite budget to work within.
>
> The NAS vendors could all be considered scale-out, I suspect.  All 3 can
> scale out the storage and front end.  NA C-mode can have up to 24 heads,
> BlueArc goes up to 4 or 8 depending on the class, and Isilon can go up to
> 24 nodes or more as well if memory serves me correctly, and they all have
> a single name space solution in place.  They each have their limits, but
> for our use case they are really subjective.  We will not hit the limits
> of their scalability before we are considering a forklift refresh.  In our
> view, for what they offer it is pretty much a wash for them - any would
> meet our needs.  NetApp still has a silly agg/vol size limit; at least it
> is up to 90TB now (from 9 in the past, formatted fs use).  In April it is
> supposed to go much higher.
>
> The block storage idea in the mix - since all our HPC is Linux, they all
> would become Lustre clients.  To provide a gateway into the Lustre storage
> for non-Linux/Lustre hosts, the thinking was a clustered pair of Linux
> boxes running Samba/NFS which were also Lustre clients.  It's just an idea
> being bounced around at this point.  The data serving requirements of the
> non-HPC parts of the business are much less.  The video editors most
> likely would stay on our existing storage solution as that is working out
> very well for them, but even if we did put them onto the Lustre FS, I
> think they would be fine.  Based on that, it didn't seem so crazy to
> consider block access in this method.  That said, I think we would be one
> of the first in M&E to do so, pioneers if you will...
>
>
> diversify - we will end up in the same boat for the same reasons.
>
>
> thanks Charles,
> greg
>
> On Thu, Jan 17, 2013 at 2:20 PM, Hammitt, Charles Allen <
> chammitt at email.unc.edu> wrote:
>
>> Somewhat surprised that no one has responded yet; although it’s likely
>> that the responses would be rather subjective…including mine, of course!
>>
>> Generally I would say that it would be interesting to know more about
>> your datasets and intended workload; however, you mention this is to be
>> used as your day-to-day main business storage…so I imagine those
>> characteristics would greatly vary… mine certainly do; that much is for
>> sure!
>>
>> I don’t really think uptime would be as much of an issue here; there are
>> lots of redundancies, recovery mechanisms, and plenty of stable branches to
>> choose from…the question becomes what are the feature-set needs,
>> performance usability for different file types and workloads, and general
>> comfort level with greater complexity and somewhat fewer resources.  That
>> said, I’d personally be a bit wary of using it as a general filesystem for
>> *all* your needs.
>>
>> I do find it interesting that your short list is a wide-ranging mix of
>> storage and filesystem types; traditional NAS, scale-out NAS, and then some
>> block storage with a parallel filesystem in Lustre.  Why no GPFS on the list
>> for comparison?
>>
>> I currently manage, or have used in the past *[bluearc]*, all the
>> storage / filesystems and more from your list.  The reason is that
>> different storage and filesystem components have some things they are good
>> at… while other things they might not be as good at doing.  So I diversify
>> by putting different storage/filesystem component pieces in the areas where
>> they excel…
>>
>> Regards,
>>
>> Charles
>>
>> *From:* lustre-discuss-bounces at lists.lustre.org [mailto:
>> lustre-discuss-bounces at lists.lustre.org] *On Behalf Of *greg whynott
>> *Sent:* Thursday, January 17, 2013 12:18 PM
>> *To:* lustre-discuss at lists.lustre.org
>>
>> *Subject:* [Lustre-discuss] is Luster ready for prime time?
>>
>> Hello,
>>
>>
>> I just signed up today, so please forgive me if this question has been
>> covered recently.  I'm in a bit of a rush to get an answer on this as we
>> need to make a decision soon; the idea of using Lustre was thrown into
>> the mix very late in the decision-making process.
>>
>> We are looking to procure a new storage solution which will
>> predominantly be used for HPC output but will also be used as our main
>> business-centric storage for day-to-day use, meaning the file system needs
>> to be available 24/7/365.  The last time I was involved in considering
>> Lustre was about 6 years ago and it was at that time being considered for
>> scratch space for HPC usage only.
>>
>> Our VMs and databases would remain on non-Lustre storage as we already
>> have that in place and it works well.  The Lustre file system potentially
>> would have everything else.  Projects we work on typically take up to 2
>> years to complete and during that time we would want all assets to remain
>> on the file system.
>>
>> Some of the vendors on our short list include HDS (BlueArc), Isilon and
>> NetApp.  Last week we started bouncing the idea of using Lustre around.
>> I'd love to use it if it is considered stable enough to do so.
>>
>> Your thoughts and/or comments would be greatly appreciated.  Thanks for
>> your time.
>>
>> greg