Hi Bobbie,<br><br>Small file performance is an issue. <br>Caching is what balances it out. Because of the nature of the work, all nodes in a given pool always ask for the same set of files, so the initial response to a request may be slow, but the subsequent ones are fine. <br>


<br>As I mentioned earlier, we also had problems listing large directories. We worked around it with a cron job on the Samba Gateway that stats the files in large directories at regular intervals, which keeps the OSS VFS cache primed at all times.<br>
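<br>A cron job along those lines could look roughly like the sketch below. It assumes GNU find is available on the gateway; the function name, script path, directory and interval are illustrative assumptions, not our exact setup -<br>

```shell
#!/bin/sh
# Sketch of a cache-priming helper (hypothetical name: prime_dirs).
# It walks each directory given and stats every entry so the metadata
# stays cached. Directories are passed as arguments.
prime_dirs() {
    for dir in "$@"; do
        # Skip anything that is not a directory (e.g. Lustre not mounted)
        [ -d "$dir" ] || continue
        # find -ls stats every entry; only the count is kept, for logging
        count=$(find "$dir" -xdev -ls 2>/dev/null | wc -l)
        echo "primed $dir: $count entries"
    done
}

prime_dirs "$@"

# Example crontab line (script path, directory and interval are assumptions):
# */10 * * * * /usr/local/bin/prime_lustre_cache.sh /mnt/lustre/projects/shots
```

<br>How often to run it depends on how quickly the cache gets evicted on your nodes, so the interval is something to experiment with.<br>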


<br>Play around with these parameters on the MDS, OSS and Gateway nodes. The best values work out differently for everyone -<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>


sysctl -w vm.vfs_cache_pressure=2<br>sysctl -w vm.dirty_ratio=15<br>sysctl -w vm.swappiness=90                 #Swapping out regularly makes more space for caches<br>sysctl -w vm.dirty_background_ratio=4<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
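<br>Values set with sysctl -w are lost on reboot. One way to keep them is to write them to a sysctl configuration file, as in this sketch. The helper name and the target file are assumptions (the usual place is /etc/sysctl.conf or a file under /etc/sysctl.d, depending on the distribution); the values are simply the ones listed above -<br>

```shell
#!/bin/sh
# Sketch: write the VM tunables listed above to a sysctl config file.
# write_vm_tuning is a hypothetical helper; the target path is passed in.
write_vm_tuning() {
    conf="$1"
    cat > "$conf" <<'EOF'
# Cache-friendly VM settings for Lustre MDS / OSS / Gateway nodes
vm.vfs_cache_pressure = 2
vm.dirty_ratio = 15
vm.swappiness = 90
vm.dirty_background_ratio = 4
EOF
}

# Example, run as root (the file name is an assumption):
#   write_vm_tuning /etc/sysctl.d/90-lustre-cache.conf
#   sysctl -p /etc/sysctl.d/90-lustre-cache.conf
```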


<br>On the Gateways / Clients, run the following each time after you mount Lustre -<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>


pushd /proc/fs/lustre/osc<br>for ost in *-OST*<br> do<br>  echo 32 > ${ost}/max_rpcs_in_flight<br> done<br>popd<br><br>lctl set_param osc.*.max_dirty_mb=512<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
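<br>Since this has to be repeated after every mount, it can be handy to wrap the loop in a small function. A sketch, assuming the same /proc layout as above; set_max_rpcs is a made-up helper name, and the base directory is a parameter only so the loop can be tried against a scratch directory first -<br>

```shell
#!/bin/sh
# Sketch: set max_rpcs_in_flight for every OSC under a base directory.
# set_max_rpcs is a hypothetical helper; usage: set_max_rpcs VALUE [BASE]
set_max_rpcs() {
    value="$1"
    base="${2:-/proc/fs/lustre/osc}"
    for ost in "$base"/*-OST*; do
        # Skip the unexpanded pattern when nothing matches (Lustre not mounted)
        [ -e "$ost/max_rpcs_in_flight" ] || continue
        echo "$value" > "$ost/max_rpcs_in_flight"
    done
}

# Example, run as root after mounting Lustre:
#   set_max_rpcs 32
#   lctl set_param osc.*.max_dirty_mb=512
```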


<br>Client-side read-ahead and statahead settings -<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>/proc/fs/lustre/llite/&lt;fsname&gt;-&lt;uid&gt;/max_read_ahead_mb            # left at the default of 40MB, as most of our files are in the 10MB range<br>


<br>/proc/fs/lustre/llite/&lt;fsname&gt;-&lt;uid&gt;/max_read_ahead_whole_mb  # set to 10MB<br>/proc/fs/lustre/llite/*/statahead_max                                            # set to 8192<br>--------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
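<br>The same three settings can also be applied with lctl set_param instead of writing to /proc directly. A sketch using the values listed above; apply_llite_tuning and the LCTL override variable are illustrative names, the latter only there so the commands can be previewed with echo before running them for real -<br>

```shell
#!/bin/sh
# Sketch: apply the client read-ahead / statahead values listed above.
# LCTL can be overridden (e.g. LCTL=echo) to preview the commands.
LCTL="${LCTL:-lctl}"

apply_llite_tuning() {
    # Patterns are quoted so the shell does not expand the '*'
    "$LCTL" set_param 'llite.*.max_read_ahead_mb=40' &&
    "$LCTL" set_param 'llite.*.max_read_ahead_whole_mb=10' &&
    "$LCTL" set_param 'llite.*.statahead_max=8192'
}

# Example, run as root on a client after mounting Lustre:
#   apply_llite_tuning
```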


<br>Regards,<br><br><br>Indivar Nair<br><br><br><br><br><br><br><br><div class="gmail_quote">On Tue, Jan 22, 2013 at 12:55 AM, Lind, Bobbie J <span dir="ltr"><<a href="mailto:bobbie.j.lind@intel.com" target="_blank">bobbie.j.lind@intel.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Indivar,<br>

<br>

I would be very interested to see what tuning parameters you have set to<br>

tune lustre and the storage for small files.  I have had similar setups in<br>

the past and been stumped by the small file performance.<br>

<br>

--<br>

Bobbie Lind<br>

<br>

<br>

<br>

>Date: Mon, 21 Jan 2013 11:24:32 -0500<br>

>From: greg whynott <<a href="mailto:greg.whynott@gmail.com">greg.whynott@gmail.com</a>><br>

>Subject: Re: [Lustre-discuss] is Luster ready for prime time?<br>

>To: Indivar Nair <<a href="mailto:indivar.nair@techterra.in">indivar.nair@techterra.in</a>><br>

>Cc: "<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>"<br>

>       <<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>><br>

>Message-ID:<br>

>       <CAKuzA1G4-W122LQrf3VKqADd=<a href="mailto:WrDgcAVx5hyAGJfZwwR8KKG2g@mail.gmail.com">WrDgcAVx5hyAGJfZwwR8KKG2g@mail.gmail.com</a>><br>

>Content-Type: text/plain; charset="utf-8"<br>

<div class="im">><br>

>Thanks very much Indivar,  informative read.    it is good to see others<br>

>in<br>

>our sector are using the technology and you have some good points.<br>

><br>

>have a great day,<br>

>greg<br>

><br>

><br>

><br>

>On Sat, Jan 19, 2013 at 6:52 AM, Indivar Nair<br>

><<a href="mailto:indivar.nair@techterra.in">indivar.nair@techterra.in</a>>wrote:<br>

><br>

>>  Hi Greg,<br>

>><br>

>> One of our customers had a similar requirement and we deployed Lustre<br>

>> 2.0.0.1 for them. This was in July 2011. Though there were a lots of<br>

>> problems initially, all of them were sorted out over time. They are<br>

>>quite<br>

>> happy with it now.<br>

>><br>

</div>>> *Environment:*<br>

<div class="im">>> Its a 150 Artist studio with around 60 Render nodes. The studio mainly<br>

>> uses Mocha, After Effects, Silhouette, Synth Eye, Maya, and Nuke among<br>

>> others. They mainly work on 3D Effects and Stereoscopy Conversions.<br>

>> Around 45% of Artists and Render Nodes are on Linux and use native<br>

>>Lustre<br>

>> Client. All others access it through Samba.<br>

>><br>

</div>>> *Lustre Setup:*<br>

<div class="im">>> It consists of 2 x Dell R610 as MDS Nodes, and 4 x Dell R710 as OSS<br>

>>Nodes.<br>

>> 2 x Dell MD3200 with 12x1TB SAS Nearline Disks are used for storage.<br>

>>Each<br>

>> Dell MD3200s are shared among 2 OSS nodes for H/A.<br>

>><br>

>> Since the original plan (which didn't happen) was to move to a 100%<br>

>>Linux<br>

>> environment, we didn't allocate separate Samba Gateways and use the OSS<br>

>> nodes with CTDB for it. Thankfully, we haven't had any issues with that<br>

>>yet.<br>

>><br>

</div>>> *Performance:*<br>

<div class="im">>> We get a good THROUGHPUT of 800 - 1000MB/s with Lustre Caching. The<br>

>>disks<br>

>> it self provide much lesser speeds. But that is fine, as caching is in<br>

>> effect most of the time.<br>

>><br>

</div>>> *Challenge:*<br>

<div class="im">>> The challenge for us was to tune the storage for small files 10 - 50MB<br>

>> totalling to 10s of GBs. An average shot would consist of 2000 - 4000<br>

>>.dpx<br>

>> images. Some Scenes / Shots also had millions of <1MB Maya Cache files.<br>

>> This did tax the storage, especially the MDS. Fixed it to an extent by<br>

>> adding more RAM to MDS.<br>

>><br>

</div>>> *Suggestions:*<br>

<div><div class="h5">>><br>

>> 1. Get the real number of small files (I mean <1MB ones) created / used<br>

>>by<br>

>> all software. These are the ones that could give you the most trouble.<br>

>>Do<br>

>> not assume anything.<br>

>><br>

>> 2. Get the file - sizes, numbers and access patterns absolutely correct.<br>

>> This is the key.<br>

>>     Its easier to design and tune Lustre for large files and I/O.<br>

>><br>

>> 3. Network tuning is as important and storage tuning. Tune Switches,<br>

>>each<br>

>> Workstation, Render Nodes, Samba / NFS Gateways, OSS Nodes, MDS Nodes,<br>

>> everything.<br>

>><br>

>> 4. Similarly do not undermine Samba / NFS Gateway. Size and tune them<br>

>> correctly too.<br>

>><br>

>> 5. Use High Speed Switching like QDR Infiniband or 40GigE, especially<br>

>>for<br>

>> backend connectivity between Samba/NFS Gateway and Lustre MDS/OSS Nodes.<br>

>><br>

>> 6. As far as possible, have fixed directory pattern for all projects.<br>

>> Separate working files (Maya, Nuke, etc.) from the data, i.e. frames /<br>

>> images, videos, etc. at the top directory level it self. This will help<br>

>>you<br>

>> tune / manage the storage better. Different directory tree for different<br>

>> file sizes or file access types.<br>

>><br>

>> If designed and tuned right, I think Lustre is best storage currently<br>

>> available for your kind of work.<br>

>><br>

>> Hope this helps.<br>

>><br>

>> Regards,<br>

>><br>

>><br>

>> Indivar Nair<br>

>><br>

>><br>

>> On Fri, Jan 18, 2013 at 1:51 AM, greg whynott<br>

>><<a href="mailto:greg.whynott@gmail.com">greg.whynott@gmail.com</a>>wrote:<br>

>><br>

>>> Hi Charles,<br>

>>><br>

>>>   I received a few off list challenging email messages along with a few<br>

>>> fishing ones,  but its all good.   its interesting how a post asking a<br>

>>> question can make someone appear angry.  8)<br>

>>><br>

>>> Our IO profiles from the different segments of our business do vary<br>

>>> greatly.   The HPC is more or less the typical load you would expect to<br>

>>> see,  depending on which software is in use for the for the job being<br>

>>>ran.<br>

>>>       We have hundreds of artists and administrative staff who use the<br>

>>>file<br>

>>> system in a variety of ways.   Some examples would include but not<br>

>>>limited<br>

>>> to:  saving out multiple revisions of photoshop documents (typically<br>

>>>in the<br>

>>> hundreds of megs to +1gig range),   video editing (stereoscopic 2k and<br>

>>>4k<br>

>>> images(again from 10's 100's to gigs in size) including uncompressed<br>

>>> video,  excel, word and similar files,  thousands of project files<br>

>>>(from<br>

>>> software such as Maya,  Nuke and similar)  these also vary largely in<br>

>>>size,<br>

>>> from 1 to thousands of megs in size.<br>

>>><br>

>>> The intention is keep our data bases and VM requirements on the<br>

>>>existing<br>

>>> file system which is comprised of about 100 10k SAS drives,  it works<br>

>>>well.<br>

>>><br>

>>> We did consider GPFS but that consideration went out the door once I<br>

>>> started talking to them and hammering in some numbers into their online<br>

>>> calculator.  Things got a bit crazy quickly.   They have different<br>

>>>pricing<br>

>>> for the different types and speeds of Intel CPUs.  I got the feeling<br>

>>>they<br>

>>> were trying to squeeze every penny out of customers they could.  felt<br>

>>>very<br>

>>> Brocade-ish and left a bad taste with us.   wouldn't of been much of a<br>

>>> problem as some other shops I've worked at,  but here we do have a<br>

>>>finite<br>

>>> budget to work within.<br>

>>><br>

>>> The NAS vendors could all be considered scale out I suspect.   All 3<br>

>>>can<br>

>>> scale out the storage and front end.  NA C-mode can have up to 24<br>

>>>heads,<br>

>>> Blue Arc goes up to 4 or 8 depending on the class,  Isilon can go up<br>

>>>to 24<br>

>>> nodes or more as well if memory serves me correctly,  and they all<br>

>>>have a<br>

>>> single name space solution in place.   They each have their limits,<br>

>>>but<br>

>>> for our use case they are really subjective.   We will not hit the<br>

>>>limits<br>

>>> of their scalability before we are considering a fork lift refresh.<br>

>>>In our<br>

>>> view,  for what they offer it is perty much a wash for them - any would<br>

>>> meet our needs.  NetApp still has a silly agg/vol size limit,  at<br>

>>>least it<br>

>>> is up to 90TB now (from 9 in the past(formatted fs use))..  in April<br>

>>>it is<br>

>>> suppose to go much higher.<br>

>>><br>

>>>  The block storage idea in the mix - since all our HPC is linux,  they<br>

>>> all would become luster clients.   To provide a gateway into the luster<br>

>>> storage for none linux/luster hosts the thinking was a clustered pair<br>

>>>of<br>

>>> linux boxes running SAMBA/NFS which were also Luster clients.    Its<br>

>>>just<br>

>>> an idea being bounced around at this point.  The data serving<br>

>>>requirements<br>

>>> of the non HPC parts of the business are much less.   The video editors<br>

>>> most likely would stay on our existing storage solution as that is<br>

>>>working<br>

>>> out very well for them, but even if we did put them onto the Luster<br>

>>>FS,  I<br>

>>> think they would be fine.  based on that, it didn't seem so crazy to<br>

>>> consider block access in this method.   that said,  I think we would<br>

>>>be one<br>

>>> of the first in M&E to do so,  pioneers if you will...<br>

>>><br>

>>><br>

>>> diversify - we will end up in the same boat for the same reasons.<br>

>>><br>

>>><br>

>>> thanks Charles,<br>

>>> greg<br>

>>><br>

>>><br>

>>><br>

>>><br>

>>><br>

>>><br>

>>> On Thu, Jan 17, 2013 at 2:20 PM, Hammitt, Charles Allen <<br>

>>> <a href="mailto:chammitt@email.unc.edu">chammitt@email.unc.edu</a>> wrote:<br>

>>><br>

</div></div>>>>>  ** **<br>

>>>><br>

>>>> Somewhat surprised that no one has responded yet; although it's likely<br>

>>>> that the responses would be rather subjective, including mine, of<br>

>>>>course!<br>

>>>><br>

>>>><br>

>>>> ** **<br>

<div class="im">>>>><br>

>>>> Generally I would say that it would be interesting to know more about<br>

>>>> your datasets and intended workload; however, you mention this is to<br>

>>>>be<br>

</div>>>>> used as your day-to-day main business storage, so I imagine those<br>

>>>> characteristics would greatly vary... mine certainly do; that much is<br>

>>>>for<br>

>>>> sure!<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> I don't really think uptime would be as much an issue here; there are<br>

<div class="im">>>>> lots of redundancies, recovery mechanisms, and plenty of stable<br>

>>>>branches to<br>

</div>>>>> choose from - the question becomes what are the feature-set needs,<br>

<div class="im">>>>> performance usability for different file types and workloads, and<br>

>>>>general<br>

>>>> comfort level with greater complexity and somewhat less resources.<br>

>>>>That<br>

</div>>>>> said, I'd personally be a bit wary of using it as a general<br>

>>>>filesystem for<br>

>>>> *all* your needs.<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> ** **<br>

<div class="im">>>>><br>

>>>> I do find it interesting that your short list is a wide range mix of<br>

>>>> storage and filesystem types; traditional NAS, scale-out NAS, and<br>

>>>>then some<br>

>>>> block storage with a parallel filesystem in Lustre.  Why no GPFS on<br>

>>>>the list<br>

</div>>>>> for comparison?<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> I currently manage, or have used in the past *[bluearc]*, all the<br>

<div class="im">>>>> storage / filesystems and more from your list.  The reason being is<br>

>>>>that<br>

>>>> different storage and filesystems components have some things they<br>

>>>>are good<br>

</div>>>>> at... while other things they might not be as good at doing.  So I<br>

<div class="im">>>>>diversify<br>

>>>> by putting different storage/filesystem component pieces in the areas<br>

>>>>where<br>

</div>>>>> they excel at best...<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> Regards,****<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> Charles****<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> ** **<br>

>>>><br>

>>>> *From:* <a href="mailto:lustre-discuss-bounces@lists.lustre.org">lustre-discuss-bounces@lists.lustre.org</a> [mailto:<br>

>>>> <a href="mailto:lustre-discuss-bounces@lists.lustre.org">lustre-discuss-bounces@lists.lustre.org</a>] *On Behalf Of *greg whynott<br>

>>>> *Sent:* Thursday, January 17, 2013 12:18 PM<br>

>>>> *To:* <a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>

>>>><br>

>>>> *Subject:* [Lustre-discuss] is Luster ready for prime time?****<br>

>>>><br>

>>>>  ** **<br>

<div class="im">>>>><br>

>>>> Hello,<br>

>>>><br>

>>>><br>

>>>> just signed up today, please forgive me if this question has been<br>

>>>> covered recently.  - in a bit of a rush to get an answer on this as<br>

>>>>we need<br>

>>>> to make a decision soon,  the idea of using luster was thrown into<br>

>>>>the mix<br>

>>>> very late in the decision making process.<br>

>>>><br>

</div>>>>> ****<br>

<div class="im">>>>><br>

>>>>  We are looking to procure a new storage solution which will<br>

>>>> predominately be used for HPC output but will also be used as our main<br>

>>>> business centric storage for day to day use.  Meaning the file system<br>

>>>>needs<br>

>>>> to be available 24/7/365.    The last time I was involved in<br>

>>>>considering<br>

>>>> Luster was about 6 years ago and it was at that time being considered<br>

>>>>for<br>

</div>>>>> scratch space for HPC usage only. ****<br>

<div class="im">>>>><br>

>>>> Our VMs and databases would remain on non-luster storage as we already<br>

>>>> have that in place and it works well.    The luster file system<br>

>>>>potentially<br>

>>>> would have everything else.  Projects we work on typically take up to<br>

>>>>2<br>

>>>> years to complete and during that time we would want all assets to<br>

>>>>remain<br>

</div>>>>> on the file system.****<br>

<div class="im">>>>><br>

>>>> Some of the vendors on our short list include HDS(Blue Arc), Isilon<br>

>>>>and<br>

>>>> NetApp.    Last week we started bouncing the idea of using Luster<br>

>>>>around.<br>

>>>> I'd love to use it if it is considered stable enough to do so.<br>

>>>><br>

>>>> your thoughts and/or comments would be greatly appreciated.  thanks<br>

>>>>for<br>

>>>> your time.<br>

>>>><br>

>>>> greg<br>

>>>><br>

>>>><br>

</div>>>>> ****<br>

<div class="im">>>>><br>

>>><br>

>>><br>

>>> _______________________________________________<br>

>>> Lustre-discuss mailing list<br>

>>> <a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a><br>

>>> <a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

>>><br>

>>><br>

>><br>

</div>

><br>

>------------------------------<br>

<div class="im">><br>


><br>

><br>

</div>>End of Lustre-discuss Digest, Vol 84, Issue 12<br>

>**********************************************<br>

<div class="HOEnZb"><div class="h5"><br>


</div></div></blockquote></div><br>