[Lustre-discuss] fast traverse

Mon Sep 15 17:18:59 PDT 2008

While doing a scan of a "large" Lustre filesystem (10TB), I
noticed the client hung the host.

I did a simple 'find /oagre/lustre/fs' and it naturally took 36 hours
since there are many small files. But we noticed the host crashed with
ll_socket<pid of find> and the 'find' process taking up 100% CPU. No
commands were working, but I was able to ssh into the box. We are
using Lustre 1.6.5.1. Is this a known issue? Could this be a statahead
issue mentioned in the previous threads?
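
For what it's worth, the kind of scan I mean can be sketched as below.
This is only a rough illustration, not what 'find' does internally: it
walks the tree with Python's os.scandir() and counts entries, relying on
the entry type reported by the directory read so that it avoids a
separate stat() per file where the filesystem supplies that type. The
function name count_entries is made up for this example, and I have not
measured whether this behaves any differently from 'find' on Lustre.

```python
import os
import sys


def count_entries(root):
    """Walk root iteratively and return (n_dirs, n_files).

    entry.is_dir() uses the type returned by the directory read when the
    filesystem provides it, and only falls back to a stat call otherwise.
    """
    n_dirs = 0
    n_files = 0
    stack = [root]  # explicit stack instead of recursion for deep trees
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        n_dirs += 1
                        stack.append(entry.path)
                    else:
                        n_files += 1
        except OSError:
            # Unreadable or vanished directory: skip it and keep going.
            continue
    return n_dirs, n_files


if __name__ == "__main__":
    # Pass the directory to scan as the first argument.
    dirs, files = count_entries(sys.argv[1])
    print(f"{dirs} directories, {files} files")
```

Counting rather than printing every path keeps the output small, which
makes it easier to tell whether the time goes into the traversal itself.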

Sorry if this is redundant.

TIA

On Thu, Sep 11, 2008 at 7:50 PM, Mag Gam <magawake at gmail.com> wrote:
> I have 32GB on the MDS. So, where do I start? :-)
>
>
>
> On Thu, Sep 11, 2008 at 6:05 PM, Andreas Dilger <adilger at sun.com> wrote:
>> On Sep 11, 2008  06:28 -0400, Mag Gam wrote:
>>> I have a filesystem with over 1M directories which are filled with
>>> hourly temperatures of a controller environment for years. They are
>>> being hosted on our Lustre filesystem, and I constantly do an fstat()
>>> and fstat64() to get the directory's create time. I was wondering if
>>> there is a way to speed this operation? Is it possible for me to
>>> increase the mds cache? Are there any tricks I can perform to speed
>>> this operation up?
>>
>> To cache 1M directory entries would need in the range of 6GB of RAM.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>>
>