[Lustre-discuss] Lustre buffer cache causes large system overhead.

Roy Dragseth roy.dragseth at uit.no
Tue Sep 3 03:57:23 PDT 2013


An admin at another site sent me this info (thanks, Hans):

kernel component, BZ#770545
In Red Hat Enterprise Linux 6.2 and Red Hat Enterprise Linux 6.3, the 
default value for sysctl vm.zone_reclaim_mode is now 0, whereas in Red Hat 
Enterprise Linux 6.1 it was 1.
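
A quick way to check and change this on a running box (plain sysctl
usage, nothing Lustre-specific):

  # read the current value
  sysctl vm.zone_reclaim_mode
  cat /proc/sys/vm/zone_reclaim_mode

  # set it at runtime (takes effect immediately, lost on reboot)
  sysctl -w vm.zone_reclaim_mode=1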


Just a heads up for anyone planning an upgrade...

r.



On Saturday, August 24, 2013 08:08:06 Dragseth Roy Einar wrote:
> The kernel docs for zone_reclaim_mode indicate that a value of 0 makes
> sense on dedicated file servers like the MDS/OSS, as fetching cached
> data from another NUMA domain is much faster than going all the way to
> the disk.  For clients that need the memory for computations, a value
> of 1 seems to be the way to go as (I guess) it reduces the cross-domain
> traffic.
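> 
> For reference, Documentation/sysctl/vm.txt describes the value as a
> bitmask: 1 enables zone reclaim, 2 additionally writes out dirty pages
> during reclaim, and 4 additionally allows swapping during reclaim.  To
> see how many NUMA domains a node actually has (assuming numactl is
> installed):
> 
>   numactl --hardware    # lists nodes, sizes and inter-node distances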
> 
> r.
> 
> On Friday 23. August 2013 13.59.44 Patrick Shopbell wrote:
> > Hi all -
> > I have watched this thread with much interest, and now I am even
> > more interested/confused.  :-)
> > 
> > Several months back, we had a very substantial slowdown on our
> > MDS box. Interactive use of the box was very sluggish, even
> > though the load was quite low. This was eventually solved by
> > setting the opposite value for the variable in question:
> > 
> > vm.zone_reclaim_mode = 0
> > 
> > The change was equally dramatic in solving our problem - the MDS
> > started responding normally immediately afterwards. We went ahead
> > and set the value to zero on all of our NUMA machines. (We are
> > running Lustre 2.3.)
> > 
> > Clearly, I need to do some reading on Lustre and its various caching
> > issues. This has been a quite interesting discussion.
> > 
> > Thanks everyone for such a great list.
> > --
> > Patrick
> > 
> > *--------------------------------------------------------------------*
> > 
> > | Patrick Shopbell               Department of Astronomy             |
> > | pls at astro.caltech.edu          Mail Code 249-17                    |
> > | (626) 395-4097                 California Institute of Technology  |
> > | (626) 568-9352  (FAX)          Pasadena, CA 91125                 |
> > | WWW: http://www.astro.caltech.edu/~pls/                            |
> > 
> > *--------------------------------------------------------------------*
> > 
> > On 8/23/13 1:08 PM, Dragseth Roy Einar wrote:
> > > Thanks for the suggestion!  It didn't help, but as I read the
> > > documentation on vfs_cache_pressure in the kernel docs I noticed the
> > > next parameter, zone_reclaim_mode, which looked like it might be
> > > worth fiddling with.  And what do you know, changing it from 0 to 1
> > > made the system overhead vanish immediately!
> > > 
> > > I must admit I do not completely understand why this helps, but it
> > > seems to do the trick in my case.  We'll put
> > > 
> > > vm.zone_reclaim_mode = 1
> > > 
> > > into /etc/sysctl.conf from now on.
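> > > 
> > > Roughly this, assuming the stock sysctl tools:
> > > 
> > >   sysctl -w vm.zone_reclaim_mode=1    # apply immediately
> > >   echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf
> > >   sysctl -p                           # re-read sysctl.conf to confirm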
> > > 
> > > Thanks to all for the hints and comments on this.
> > > 
> > > A nice weekend to everyone, mine for sure is going to be...
> > > r.
> > > 
> > > On Friday 23. August 2013 09.36.34 Scott Nolin wrote:
> > >> You might also try increasing the vfs_cache_pressure.
> > >> 
> > >> This will reclaim inode and dentry caches faster. Maybe that's the
> > >> problem, not page caches.
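> > >> 
> > >> Something along these lines - the default is 100, and values above
> > >> that make the kernel prefer reclaiming dentries and inodes over
> > >> page cache:
> > >> 
> > >>   sysctl -w vm.vfs_cache_pressure=200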
> > >> 
> > >> To be clear - I have no deep insight into Lustre's use of the client
> > >> cache, but you said you have lots of small files, which, if Lustre
> > >> uses the cache system like other filesystems, means it may be
> > >> inodes/dentries.
> > >> Filling up the page cache with files like you did in your other tests
> > >> wouldn't have the same effect. Just my guess here.
> > >> 
> > >> We had some experience years ago with the opposite sort of problem.
> > >> We have a big FTP server, and we want to *keep* inode/dentry data in
> > >> the Linux cache, as there are often stupid numbers of files in
> > >> directories. Files were always flowing through the server, so the
> > >> page cache would force out the inode cache. I was surprised to find
> > >> that with Linux there's no way to set a fixed inode cache size - the
> > >> best you can do is "suggest" with the cache pressure tunable.
> > >> 
> > >> Scott
> > >> 
> > >> On 8/23/2013 6:29 AM, Dragseth Roy Einar wrote:
> > >>> I tried to change swappiness from 0 to 95 but it did not have any
> > >>> impact on the system overhead.
> > >>> 
> > >>> r.
> > >>> 
> > >>>> On Thursday 22. August 2013 15.38.37 Dragseth Roy Einar wrote:
> > >>>> No, I cannot detect any swap activity on the system.
> > >>>> 
> > >>>> r.
> > >>>> 
> > >>>> On Thursday 22. August 2013 09.21.33 you wrote:
> > >>>>> Is this slowdown due to increased swap activity?  If "yes", then
> > >>>>> try lowering the "swappiness" value.  This will sacrifice buffer
> > >>>>> cache space to lower swap activity.
> > >>>>> 
> > >>>>> Take a look at http://en.wikipedia.org/wiki/Swappiness.
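> > >>>>> 
> > >>>>> E.g., dropping it from the usual default of 60 to something small:
> > >>>>> 
> > >>>>>   sysctl -w vm.swappiness=10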
> > >>>>> 
> > >>>>> Roger S.
> > >>>>> 
> > >>>>> On 08/22/2013 08:51 AM, Roy Dragseth wrote:
> > >>>>>> We have just discovered that a large buffer cache generated from
> > >>>>>> traversing a lustre file system will cause a significant system
> > >>>>>> overhead [...]