[Lustre-discuss] Memory (?) problem with 1.8.1
David Simas
dgs at slac.stanford.edu
Mon Oct 12 17:06:43 PDT 2009
Hello,
We have a Lustre 1.8.1 file system about 60 TB in size running on
RHEL 5 x86_64. (I can provide hardware details if anyone thinks
they'd be relevant.) We are seeing memory problems after several
days of sustained I/O into that file system. We are writing from
a small number of clients (4 - 5) at an average rate of 50 MB/s, with
peaks of 350 MB/s. We read all the data at least twice before deleting
them. During this operation, we notice the value of "buffers"
reported in '/proc/meminfo' on the OSSs involved increasing monotonically
until it apparently takes up all of the system's memory - 32 GB. Then 'kswapd'
starts consuming a large amount of CPU, the load increases (100+), and the
system, including Lustre, slows to a crawl and becomes quite useless. If we
stop Lustre I/O at this point, 'kswapd' and the system load calm down, but
the "buffers" value does not decrease. Any I/O on the system
(dd if=/dev/urandom of=/tmp/test ...) will then cause 'kswapd' to run away
again. We have observed the monotonically increasing "buffers" condition
with non-Lustre I/O on systems running the Lustre 1.8.1 kernel
(2.6.18-128.1.14.el5_lustre.1.8.1), but we haven't gotten them to the point
where 'kswapd' goes wild.
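(For anyone wanting to watch the same counter, this is the sort of loop
we mean - a minimal sketch, not taken from any Lustre tooling; it just
samples the "Buffers:" line of /proc/meminfo so a monotonic climb is
easy to see:

```shell
# Sample the "Buffers:" field from /proc/meminfo a few times.
# On an affected OSS this value climbs steadily and never comes back down.
# Adjust the sample count and interval as needed.
for i in 1 2 3; do
    awk '/^Buffers:/ {print strftime("%T"), $2, $3}' /proc/meminfo
    sleep 1
done
```
)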
Has anybody else seen anything like this?
David Simas
SLAC