[Lustre-discuss] Lustre buffer cache causes large system overhead.

Dragseth Roy Einar roy.dragseth at uit.no
Fri Aug 23 04:29:23 PDT 2013


I tried to change swappiness from 0 to 95, but it did not have any impact on
the system overhead.
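
For reference, the knob in question is the standard vm.swappiness sysctl,
which I adjusted along these lines (95 being the value mentioned above):

  sysctl -w vm.swappiness=95
  # or, equivalently:
  echo 95 > /proc/sys/vm/swappiness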

r.


On Thursday 22. August 2013 15.38.37 Dragseth Roy Einar wrote:
> No, I cannot detect any swap activity on the system.
> 
> r.
> 
> On Thursday 22. August 2013 09.21.33 you wrote:
> > Is this slowdown due to increased swap activity?  If "yes", then try
> > lowering the "swappiness" value.  This will sacrifice buffer cache space
> > to lower swap activity.
> > 
> > Take a look at http://en.wikipedia.org/wiki/Swappiness.
> > 
> > Roger S.
> > 
> > On 08/22/2013 08:51 AM, Roy Dragseth wrote:
> > > We have just discovered that a large buffer cache generated from
> > > traversing a Lustre file system causes a significant system overhead
> > > for applications with high memory demands.  We have seen a 50% slowdown
> > > or worse for applications.  Even High Performance Linpack, which has no
> > > file IO whatsoever, is affected.  The only remedy seems to be to empty
> > > the buffer cache by running "echo 3 > /proc/sys/vm/drop_caches".
> > > 
> > > Any hints on how to improve the situation are greatly appreciated.
> > > 
> > > 
> > > System setup:
> > > Client: dual-socket Sandy Bridge with 32GB RAM and an InfiniBand
> > > connection to the Lustre servers.  CentOS 6.4 with kernel
> > > 2.6.32-358.11.1.el6.x86_64 and Lustre v2.1.6 rpms downloaded from the
> > > Whamcloud download site.
> > > 
> > > Lustre servers: 1 MDS and 4 OSSes running Lustre 2.1.3 (also from the
> > > Whamcloud site).  Each OSS has 12 OSTs, 1.1 PB of storage in total.
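> > > 
> > > (For reference, the OST layout and total capacity can be confirmed from
> > > any client with the standard query
> > > 
> > >   lfs df -h
> > > 
> > > which lists every OST and the aggregate file system size.)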
> > > 
> > > How to reproduce:
> > > 
> > > Traverse the Lustre file system until the buffer cache is large enough.
> > > In our case we run
> > > 
> > >   find . -type f -print0 | xargs -0 cat > /dev/null
> > > 
> > > on the client until the buffer cache reaches ~15-20GB.  (The Lustre file
> > > system has lots of small files, so this takes up to an hour.)
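> > > 
> > > (The growth of the cache can be followed with the usual counters in
> > > /proc/meminfo, e.g. something along these lines, the interval being
> > > arbitrary:
> > > 
> > >   watch -n 30 'grep -E "^(Cached|Buffers):" /proc/meminfo'
> > > 
> > > or simply "free -g" now and then.)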
> > > 
> > > Kill the find process and start a single-node parallel application; we
> > > use HPL (High Performance Linpack).  We run on all 16 cores of the
> > > system with 1GB RAM per core (a normal run completes in approx. 150
> > > seconds).  System monitoring shows a 10-20% system CPU overhead and the
> > > HPL run takes more than 200 seconds.  After running "echo 3 >
> > > /proc/sys/vm/drop_caches" the system performance goes back to normal,
> > > with a run time of 150 seconds.
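> > > 
> > > For completeness, the flush is just the usual recipe: writing 3 drops
> > > the page cache plus dentries and inodes, and a sync beforehand ensures
> > > that as much of the cache as possible is clean and droppable.
> > > 
> > >   sync
> > >   echo 3 > /proc/sys/vm/drop_caches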
> > > 
> > > I've created an infographic from our ganglia graphs for the above
> > > scenario.
> > > 
> > > https://dl.dropboxusercontent.com/u/23468442/misc/lustre_bc_overhead.png
> > > 
> > > Attached is an excerpt from perf top indicating that the kernel routine
> > > taking the most time is _spin_lock_irqsave, if that means anything to
> > > anyone.
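> > > 
> > > The profile is just the stock perf tooling, nothing more elaborate than
> > > something like the following (the 30 second window is arbitrary):
> > > 
> > >   perf top
> > >   # or, for an offline report with call chains:
> > >   perf record -a -g -- sleep 30
> > >   perf report --sort symbol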
> > > 
> > > 
> > > Things tested:
> > > 
> > > It does not seem to matter whether we mount Lustre over InfiniBand or
> > > Ethernet.
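> > > 
> > > The two cases are just the ordinary client mounts over the two LNET
> > > networks, i.e. something like the following (hostname and fsname are
> > > placeholders):
> > > 
> > >   mount -t lustre mgs@o2ib0:/lustre /mnt/lustre
> > >   mount -t lustre mgs@tcp0:/lustre /mnt/lustre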
> > > 
> > > Filling the buffer cache with files from an NFS filesystem does not
> > > degrade performance.
> > > 
> > > Filling the buffer cache with one large file does not give degraded
> > > performance (tested with iozone).
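> > > 
> > > A single-large-file test of that kind can be reproduced with a plain
> > > iozone invocation roughly like the one below (file name and sizes are
> > > only examples):
> > > 
> > >   iozone -i 0 -i 1 -s 20g -r 1m -f /lustre/scratch/iozone.tmp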
> > > 
> > > 
> > > Again, any hints on how to proceed are greatly appreciated.
> > > 
> > > 
> > > Best regards,
> > > Roy.
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
	      phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
	 Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no


