[Lustre-discuss] Lustre buffer cache causes large system overhead.
Dragseth Roy Einar
roy.dragseth at uit.no
Thu Aug 22 08:38:37 PDT 2013
No, I cannot detect any swap activity on the system.
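
For anyone who wants to check the same thing on their own client, here is a minimal sketch of one way to look for swap activity. It assumes a Linux /proc/vmstat that exposes the pswpin and pswpout counters; the function names vmstat_counter and swap_delta are only illustrative, not part of any tool.

```shell
# Print one named counter from a vmstat-format file (default: /proc/vmstat).
vmstat_counter() {
    awk -v key="$1" '$1 == key { print $2 }' "${2:-/proc/vmstat}"
}

# Print how many pages were swapped in and out during an interval.
# $1 = interval in seconds (default 5), $2 = vmstat-format file (default /proc/vmstat).
swap_delta() {
    sd_interval="${1:-5}"
    sd_file="${2:-/proc/vmstat}"
    sd_in0=$(vmstat_counter pswpin "$sd_file")
    sd_out0=$(vmstat_counter pswpout "$sd_file")
    sleep "$sd_interval"
    sd_in1=$(vmstat_counter pswpin "$sd_file")
    sd_out1=$(vmstat_counter pswpout "$sd_file")
    echo "pswpin +$((sd_in1 - sd_in0)) pswpout +$((sd_out1 - sd_out0))"
}
```

Running "swap_delta 10" while the application is active should print two zero deltas if nothing is being swapped. The current swappiness setting can be read with "cat /proc/sys/vm/swappiness".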
r.
On Thursday 22. August 2013 09.21.33 you wrote:
> Is this slowdown due to increased swap activity? If "yes", then try
> lowering the "swappiness" value. This will sacrifice buffer cache space to
> lower swap activity.
>
> Take a look at http://en.wikipedia.org/wiki/Swappiness.
>
> Roger S.
>
> On 08/22/2013 08:51 AM, Roy Dragseth wrote:
> > We have just discovered that a large buffer cache generated by
> > traversing a Lustre file system causes significant system overhead
> > for applications with high memory demands. We have seen a 50% slowdown
> > or worse for applications. Even High Performance Linpack, which has no
> > file I/O whatsoever, is affected. The only remedy seems to be to empty the
> > buffer cache from memory by running "echo 3 > /proc/sys/vm/drop_caches".
> >
> > Any hints on how to improve the situation are greatly appreciated.
> >
> >
> > System setup:
> > Client: Dual-socket Sandy Bridge, with 32 GB RAM and an InfiniBand
> > connection to the Lustre server. CentOS 6.4, with kernel
> > 2.6.32-358.11.1.el6.x86_64 and Lustre v2.1.6 RPMs downloaded from the
> > Whamcloud download site.
> >
> > Lustre: 1 MDS and 4 OSSs running Lustre 2.1.3 (also from the Whamcloud
> > site). Each OSS has 12 OSTs, total 1.1 PB storage.
> >
> > How to reproduce:
> >
> > Traverse the lustre file system until the buffer cache is large enough.
> > In our case we run
> >
> > find . -type f -print0 | xargs -0 cat > /dev/null
> >
> > on the client until the buffer cache reaches ~15-20GB. (The lustre file
> > system has lots of small files so this takes up to an hour.)
> >
> > Kill the find process and start a single-node parallel application; we use
> > HPL (High Performance Linpack). We run on all 16 cores on the system
> > with 1 GB RAM per core (a normal run should complete in approximately 150
> > seconds). The system monitoring shows a 10-20% system CPU overhead and
> > the HPL run takes more than 200 seconds. After running "echo 3 >
> > /proc/sys/vm/drop_caches" the system performance goes back to normal,
> > with a run time of 150 seconds.
> >
> > I've created an infographic from our ganglia graphs for the above
> > scenario.
> >
> > https://dl.dropboxusercontent.com/u/23468442/misc/lustre_bc_overhead.png
> >
> > Attached is an excerpt from perf top indicating that the kernel routine
> > taking the most time is _spin_lock_irqsave if that means anything to
> > anyone.
> >
> >
> > Things tested:
> >
> > It does not seem to matter if we mount lustre over infiniband or ethernet.
> >
> > Filling the buffer cache with files from an NFS filesystem does not
> > degrade performance.
> >
> > Filling the buffer cache with one large file does not give degraded
> > performance. (tested with iozone)
> >
> >
> > Again, any hints on how to proceed are greatly appreciated.
> >
> >
> > Best regards,
> > Roy.
> >
> >
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
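
The cache check and the workaround from the quoted reproduction steps can be sketched as below. This is only a rough sketch: it assumes the "Cached:" line of /proc/meminfo is a fair measure of the buffer cache being discussed, and the function names cached_mib and drop_caches are illustrative. Writing 3 to drop_caches frees the page cache as well as dentries and inodes, and requires root.

```shell
# Print the page-cache size in MiB from a meminfo-format file
# (default: /proc/meminfo, which reports the value in kB).
cached_mib() {
    awk '$1 == "Cached:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Flush dirty pages to storage, then drop the page cache, dentries and inodes.
# Must be run as root.
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}
```

Calling cached_mib before and after drop_caches shows how much memory the cache held, for example while the find/cat traversal is filling it towards the 15-20 GB range.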
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no