[Lustre-discuss] Memory (?) problem with 1.8.1

Lundgren, Andrew Andrew.Lundgren at Level3.com
Tue Oct 13 07:54:52 PDT 2009


This sounds very much like a problem we saw before we changed lru_size from dynamic to a fixed size.
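
For reference, a rough sketch of what that change looks like (parameter
paths as in the 1.8 manual; the lock count of 1200 is just an example
value, tune it for your site):

    # check the current per-namespace LRU sizes on a client
    lctl get_param ldlm.namespaces.*.lru_size

    # pin the OSC namespaces to a fixed number of locks instead of
    # letting the LRU grow dynamically (a value of 0 means dynamic)
    lctl set_param ldlm.namespaces.*osc*.lru_size=1200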

--
Andrew

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of David Simas
Sent: Monday, October 12, 2009 6:07 PM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Memory (?) problem with 1.8.1


Hello,

We have a Lustre 1.8.1 file system about 60 TB in size running on
RHEL 5 x86_64.  (I can provide hardware details if anyone thinks
they'd be relevant.)  We are seeing memory problems after several
days of sustained I/O into that file system.  We are writing from
a small number of clients (4 - 5) at an average rate of 50 MB/s, with
peaks of 350 MB/s.  We read all the data at least twice before deleting
them.  During this operation, we notice the value of "buffers"
reported in '/proc/meminfo' on the OSSs involved increasing monotonically
until it apparently takes up all the system's memory - 32 GB.  Then 'kswapd'
starts consuming a large amount of CPU, the load increases (100+), and the
system, including Lustre, slows to a crawl and becomes quite useless.  If we
stop Lustre I/O at this point, 'kswapd' and the system load calm down, but
the "buffers" value does not decrease.  Any I/O on the system will then
(dd if=/dev/urandom of=/tmp/test ...) will cause 'kswapd' to run away
again.  We have observed the monotonically increasing "buffers" condition
with non-Lustre I/O on systems running the Lustre 1.8.1 kernel
(2.6.18-128.1.14.el5_lustre.1.8.1), but we haven't gotten them to the point
where 'kswapd' goes wild.
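
(For what it's worth, we track this with nothing fancier than something
like the following on each OSS, sampling every minute or so:

    watch -n 60 'grep -E "^(MemFree|Buffers|Cached)" /proc/meminfo'

the "Buffers" line is the one that only ever grows.)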

Has anybody else seen anything like this?

David Simas
SLAC

