[Lustre-discuss] Memory (?) problem with 1.8.1

David Simas dgs at slac.stanford.edu
Mon Oct 12 17:06:43 PDT 2009


Hello,

We have a Lustre 1.8.1 file system about 60 TB in size running on
RHEL 5 x86_64.  (I can provide hardware details if anyone thinks
they'd be relevant.)  We are seeing memory problems after several
days of sustained I/O into that file system.  We are writing from
a small number of clients (4 - 5) at an average rate of 50 MB/s, with
peaks of 350 MB/s.  We read all the data at least twice before deleting
them.  During this operation, we notice the value of "buffers"
reported in '/proc/meminfo' on the OSSs involved increasing monotonically
until it apparently takes up all the system's memory - 32 GB.  Then 'kswapd'
starts consuming a large amount of CPU, the load increases (100+), and the
system, including Lustre, slows to a crawl and becomes quite useless.  If we
stop Lustre I/O at this point, 'kswapd' and the system load calm down, but
the "buffers" value does not decrease.  Any further I/O on the system
(e.g. dd if=/dev/urandom of=/tmp/test ...) will then cause 'kswapd' to run away
again.  We have observed the monotonically increasing "buffers" condition
with non-Lustre I/O on systems running the Lustre 1.8.1 kernel
(2.6.18-128.1.14.el5_lustre.1.8.1), but we haven't gotten them to the point
where 'kswapd' goes wild.
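
For reference, here is a rough sketch of how one can watch the buffer
growth and exercise the disk on one of these OSSs; the 10-second polling
interval and the dd block size/count are only illustrative (the real dd
arguments were elided above), and the drop_caches step at the end is just
one way to check whether the "buffers" memory is reclaimable cache:

  # Watch "Buffers" in /proc/meminfo while I/O runs;
  # the 10-second interval is arbitrary.
  while true; do grep Buffers /proc/meminfo; sleep 10; done

  # Non-Lustre I/O that also makes "buffers" grow on this kernel;
  # the block size and count here are only examples.
  dd if=/dev/urandom of=/tmp/test bs=1M count=4096

  # To see whether the buffers are ordinary reclaimable page cache,
  # flush dirty data and ask the kernel to drop clean caches:
  sync
  echo 3 > /proc/sys/vm/drop_caches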

Has anybody else seen anything like this?

David Simas
SLAC
