[Lustre-discuss] 2.0-alpha2 MDS out of memory problem

Isaac Huang He.Huang at Sun.COM
Tue Jun 9 07:25:09 PDT 2009


On Tue, Jun 09, 2009 at 02:36:37PM +0200, Arne Wiebalck wrote:
> Dear all,
>
> I set up a 2.0-alpha2 system and planned to populate it with
> 100 million files. While populating it however, the MDS ran
> out of memory, the OOM kicked in, killed some processes, and
> all ended in a kernel panic.
>
> So I reset the MDS and remounted the MDT. After around
> 30 seconds (no client access yet), the memory gets eaten up
> again, reproducing the very same scenario mentioned above.
>
> If I unmount the MDT 'in time', the memory gets freed up (so
> I am pretty sure it's Lustre and not something else).
>
> I had seen this with 2.0-alpha1 already, hence I upgraded
> to 2.0-alpha2. When using 2.0-alpha1, the system had around
> 10 million files and was not accessed at all when this
> behavior showed up.
>
> The system I am using for my tests has 1 MDS, 1 client and 3
> OSSs. The MDS has only 2 GB of memory, but this should only
> impact performance, not stability, right?
>
> Any comments welcome, I am also happy to provide more details.

Please show us the contents of:
/proc/meminfo /proc/slabinfo
/proc/sys/lnet/memused /proc/sys/lustre/memused* /proc/sys/lustre/pagesused*

Preferably captured as close to the OOM as possible.
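
A minimal sketch of how those files could be gathered in one go. The
helper name snapshot_mem and the output directory layout are made up
for illustration and are not part of Lustre; entries that do not exist
on a given system are simply skipped.

```shell
#!/bin/sh
# Sketch: copy the requested memory statistics into one directory.
# snapshot_mem is a hypothetical helper, not a Lustre tool.
snapshot_mem() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # Skip entries that are missing or unreadable on this system
        # (this also covers globs that matched nothing).
        [ -r "$f" ] || continue
        # Flatten the path into a file name: /proc/meminfo -> proc_meminfo
        name=$(printf '%s' "$f" | sed 's|^/||; s|/|_|g')
        cat "$f" > "$outdir/$name"
    done
}

# Example call on the MDS (output path is an assumption):
# snapshot_mem "/tmp/mds-mem.$(date +%s)" /proc/meminfo /proc/slabinfo \
#     /proc/sys/lnet/memused /proc/sys/lustre/memused* \
#     /proc/sys/lustre/pagesused*
```

Calling it repeatedly, say every few seconds after mounting the MDT,
should leave a snapshot from shortly before the OOM on disk.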

It'd also be helpful to get a debug dump of memory allocations:
1. echo malloc > /proc/sys/lnet/debug
2. at around the OOM, lctl dk > malloc.dk
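
The two steps above could be wrapped roughly as follows. This is only a
sketch: the function names and the DEBUG_FILE/LCTL variables are
hypothetical, and restoring the previous debug mask by writing the
saved contents back is an assumption on my part, not something the
steps above require.

```shell
#!/bin/sh
# Sketch: enable malloc debugging, then dump the debug log later.
# DEBUG_FILE and LCTL are overridable so the sketch can be tried
# without touching a live MDS; both names are hypothetical.
DEBUG_FILE=${DEBUG_FILE:-/proc/sys/lnet/debug}
LCTL=${LCTL:-lctl}

enable_malloc_debug() {
    # $1 = file in which to keep the current debug mask
    cat "$DEBUG_FILE" > "$1" || return 1
    # Step 1 from above: switch the mask to malloc.
    echo malloc > "$DEBUG_FILE"
}

dump_malloc_debug() {
    # $1 = saved mask file, $2 = dump output (e.g. malloc.dk)
    # Step 2 from above: dump the kernel debug log.
    "$LCTL" dk > "$2" || return 1
    # Assumption: writing the saved mask back restores the old setting.
    cat "$1" > "$DEBUG_FILE"
}

# Example use:
# enable_malloc_debug /tmp/debug.saved
# ... wait until memory is nearly exhausted ...
# dump_malloc_debug /tmp/debug.saved malloc.dk
```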

How many clients were there? How were they connected to the MDS?

Isaac


