[Lustre-discuss] Lustre buffer cache causes large system overhead.

Roger Sersted rs1 at aps.anl.gov
Thu Aug 22 07:21:33 PDT 2013




Is this slowdown due to increased swap activity?  If "yes", try lowering the 
"swappiness" value.  This tells the kernel to reclaim buffer cache pages before 
swapping out application memory, i.e. it sacrifices buffer cache space to lower 
swap activity.

Take a look at http://en.wikipedia.org/wiki/Swappiness.
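
For example, something along these lines (the value 10 is only an illustration; 
the default is usually 60):

  # check the current value
  cat /proc/sys/vm/swappiness

  # lower it for the running system
  sysctl -w vm.swappiness=10

  # make the setting persistent across reboots
  echo "vm.swappiness = 10" >> /etc/sysctl.conf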

Roger S.


On 08/22/2013 08:51 AM, Roy Dragseth wrote:
> We have just discovered that a large buffer cache generated from traversing a
> lustre file system will cause significant system overhead for applications
> with high memory demands.  We have seen a 50% slowdown or worse for such
> applications.  Even High Performance Linpack, which does no file IO whatsoever,
> is affected.  The only remedy seems to be to empty the buffer cache by running
> "echo 3 > /proc/sys/vm/drop_caches".
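>
> Roughly, as root (a "sync" first helps, since drop_caches only frees clean
> pages):
>
>   sync
>   echo 3 > /proc/sys/vm/drop_caches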
>
> Any hints on how to improve the situation are greatly appreciated.
>
>
> System setup:
> Client: Dual socket Sandy Bridge with 32GB ram and an infiniband connection to
> the lustre servers.  CentOS 6.4, with kernel 2.6.32-358.11.1.el6.x86_64 and
> lustre v2.1.6 rpms downloaded from the whamcloud download site.
>
> Lustre: 1 MDS and 4 OSSs running Lustre 2.1.3 (also from the whamcloud site).
> Each OSS has 12 OSTs, for a total of 1.1 PB of storage.
>
> How to reproduce:
>
> Traverse the lustre file system until the buffer cache is large enough.  In our
> case we run
>
>   find . -type f -print0 | xargs -0 cat > /dev/null
>
> on the client until the buffer cache reaches ~15-20GB.  (The lustre file system
> has lots of small files so this takes up to an hour.)
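>
> The page cache size can be watched from a second terminal while the
> traversal runs, e.g.:
>
>   watch -n 10 "grep -E '^(Cached|MemFree):' /proc/meminfo"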
>
> Kill the find process and start a single-node parallel application; we use HPL
> (High Performance Linpack).  We run on all 16 cores of the system with 1GB of
> ram per core (a normal run should complete in approx. 150 seconds).  The system
> monitoring shows a 10-20% system cpu overhead and the HPL run takes more than
> 200 secs.  After running "echo 3 > /proc/sys/vm/drop_caches" the system
> performance goes back to normal with a run time of 150 secs.
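>
> In shorthand, the comparison looks like this (the xhpl invocation is
> site-specific and only shown as an illustration):
>
>   time mpirun -np 16 ./xhpl                  # large lustre buffer cache: >200s
>   sync; echo 3 > /proc/sys/vm/drop_caches
>   time mpirun -np 16 ./xhpl                  # cache dropped: ~150s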
>
> I've created an infographic from our ganglia graphs for the above scenario.
>
> https://dl.dropboxusercontent.com/u/23468442/misc/lustre_bc_overhead.png
>
> Attached is an excerpt from perf top indicating that the kernel routine taking
> the most time is _spin_lock_irqsave, if that means anything to anyone.
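>
> A system-wide call-graph profile might help narrow down where the spinlock
> is taken; something along the lines of (using the stock CentOS 6 perf):
>
>   perf record -a -g sleep 30
>   perf report --sort symbol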
>
>
> Things tested:
>
> It does not seem to matter whether we mount lustre over infiniband or ethernet.
>
> Filling the buffer cache with files from an NFS filesystem does not degrade
> performance.
>
> Filling the buffer cache with one large file does not degrade performance
> (tested with iozone; a rough example of the run is shown below).
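>
> The iozone run was roughly of this form (file name and sizes here are just
> examples):
>
>   iozone -i 0 -i 1 -s 16g -r 1m -f /lustre/scratch/iozone.tmp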
>
>
> Again, any hints on how to proceed are greatly appreciated.
>
>
> Best regards,
> Roy.
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>


