[Lustre-discuss] Lustre buffer cache causes large system overhead.

Dragseth Roy Einar roy.dragseth at uit.no
Thu Aug 22 08:38:37 PDT 2013


No, I cannot detect any swap activity on the system.
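
(For anyone who wants to double-check: swap activity can be watched with e.g.

  vmstat 5

which reports swap-in and swap-out per second in the si and so columns,
and both stay at zero here.)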

r.


On Thursday 22. August 2013 09.21.33 you wrote:
> Is this slowdown due to increased swap activity?  If "yes", then try
> lowering the "swappiness" value.  This will sacrifice buffer cache space to
> lower swap activity.
> 
> Take a look at http://en.wikipedia.org/wiki/Swappiness.
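> 
> If you want to experiment, something like this shows the current value
> and lowers it (the RHEL/CentOS 6 default is 60; add vm.swappiness=10 to
> /etc/sysctl.conf to make it persistent across reboots):
> 
>   cat /proc/sys/vm/swappiness
>   sysctl -w vm.swappiness=10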
> 
> Roger S.
> 
> On 08/22/2013 08:51 AM, Roy Dragseth wrote:
> > We have just discovered that a large buffer cache generated by
> > traversing a Lustre file system causes significant system overhead
> > for applications with high memory demands.  We have seen a 50%
> > slowdown or worse.  Even High Performance Linpack, which does no
> > file IO whatsoever, is affected.  The only remedy seems to be to
> > empty the buffer cache by running "echo 3 > /proc/sys/vm/drop_caches".
> > 
> > Any hints on how to improve the situation are greatly appreciated.
> > 
> > 
> > System setup:
> > Client: dual-socket Sandy Bridge with 32 GB RAM and an InfiniBand
> > connection to the Lustre servers.  CentOS 6.4 with kernel
> > 2.6.32-358.11.1.el6.x86_64 and Lustre v2.1.6 RPMs downloaded from the
> > Whamcloud download site.
> > 
> > Lustre: 1 MDS and 4 OSSes running Lustre 2.1.3 (also from the Whamcloud
> > site).  Each OSS has 12 OSTs, 1.1 PB of storage in total.
> > 
> > How to reproduce:
> > 
> > Traverse the Lustre file system until the buffer cache is large enough.
> > In our case we run
> > 
> >   find . -type f -print0 | xargs -0 cat > /dev/null
> > 
> > on the client until the buffer cache reaches ~15-20GB.  (The Lustre file
> > system has lots of small files, so this takes up to an hour.)
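> > 
> > The growth of the cache can be followed with something along these
> > lines (the Buffers and Cached lines in /proc/meminfo are in kB):
> > 
> >   watch -n 30 "grep -E '^(Buffers|Cached):' /proc/meminfo"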
> > 
> > Kill the find process and start a single-node parallel application; we
> > use HPL (High Performance Linpack).  We run on all 16 cores of the
> > system with 1 GB RAM per core (a normal run completes in approx. 150
> > seconds).  System monitoring shows a 10-20% system CPU overhead and the
> > HPL run takes more than 200 seconds.  After running "echo 3 >
> > /proc/sys/vm/drop_caches" performance goes back to normal, with a run
> > time of 150 seconds.
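> > 
> > The exact HPL invocation is of course site specific, but the test cycle
> > is essentially this:
> > 
> >   mpirun -np 16 ./xhpl                # >200 secs with a full buffer cache
> >   echo 3 > /proc/sys/vm/drop_caches
> >   mpirun -np 16 ./xhpl                # ~150 secs after dropping the cache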
> > 
> > I've created an infographic from our Ganglia graphs for the above
> > scenario.
> > 
> > https://dl.dropboxusercontent.com/u/23468442/misc/lustre_bc_overhead.png
> > 
> > Attached is an excerpt from perf top indicating that the kernel routine
> > taking the most time is _spin_lock_irqsave, if that means anything to
> > anyone.
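> > 
> > If anyone wants more detail than the perf top excerpt, a system-wide
> > profile can be recorded and inspected offline with something like
> > 
> >   perf record -a -g -- sleep 30
> >   perf report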
> > 
> > 
> > Things tested:
> > 
> > It does not seem to matter whether we mount Lustre over InfiniBand or
> > Ethernet.
> > 
> > Filling the buffer cache with files from an NFS file system does not
> > degrade performance.
> > 
> > Filling the buffer cache with one large file does not degrade
> > performance either (tested with iozone).
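> > 
> > For reference, a single-large-file run could look something like the
> > following; the iozone options and target path are just an illustration,
> > the point is one file of roughly 20GB written and read back:
> > 
> >   iozone -i 0 -i 1 -s 20g -r 1m -f /lustre/scratch/iozone.tmp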
> > 
> > 
> > Again, any hints on how to proceed are greatly appreciated.
> > 
> > 
> > Best regards,
> > Roy.
> > 
> > 
> > 
-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
	      phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
	 Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no


