[Lustre-discuss] Thread might be hung, Heavy IO Load messages

Andreas Dilger adilger at whamcloud.com
Thu Feb 2 10:07:56 PST 2012


On 2012-02-02, at 8:54 AM, David Noriega wrote:
> We have two OSSs, each with two quad core AMD Opterons and 8GB of ram
> and two OSTs each(4.4T and 3.5T). Backend storage is a pair of Sun
> StorageTek 2540 connected with 8Gb fiber.

Running 32-64 threads per OST is optimal, based on previous experience.
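
For your two-OST OSSs that works out to roughly 64-128 ost_io threads in
total. A minimal sketch of capping this persistently (assuming the 1.8
"ost" module option; the on-the-fly lctl method is in Carlos's reply
below):

  # /etc/modprobe.conf on each OSS
  options ost oss_num_threads=128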

> What about tweaking max_dirty_mb on the client side?

Probably unrelated.
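
If you want to rule it out, it can be checked and adjusted per OSC on the
client, e.g. (a sketch; the right value depends on your RAM and workload):

  lctl get_param osc.*.max_dirty_mb
  lctl set_param osc.*.max_dirty_mb=32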

> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
>> David,
>> 
>> The number of OSS service threads is a function of your RAM size and CPUs.
>> It's difficult to say what a good upper limit would be without knowing the
>> size of your OSS, # of clients, storage back-end and workload. The good
>> thing is that you can experiment on the fly via the lctl set_param command.
>> 
>> Assuming you are running Lustre 1.8, here is a good explanation of how to do it:
>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
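>> 
>> For example (a sketch based on the tunables described at that link; adjust
>> the value for your hardware):
>> 
>>   lctl set_param ost.OSS.ost_io.threads_max=128
>>   cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>> 
>> Note that, if I remember the manual correctly, lowering threads_max does
>> not stop threads that are already running; threads_started only drops
>> back after the OSS is restarted.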
>> 
>> Some remarks:
>> - Reducing the number of OSS threads may impact performance, depending on
>> your workload.
>> - Unfortunately you will need to try it and see what happens. I would go
>> for 128, analyze the behavior of your OSSs (via the log files), and keep
>> an eye on your workload. 300 seems a bit too high to me (but again, I
>> don't know what you have in your storage back-end or OSS configuration).
>> 
>> 
>> I can't tell you much about lru_size, but as far as I understand the
>> values are dynamic and there's not much to do other than clear the least
>> recently used queue or disable dynamic LRU sizing. I can't help much
>> beyond pointing you to the explanation for it (see section 31.2.11):
>> 
>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
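>> 
>> For example, on a client (a sketch, assuming the lru_size behavior
>> described in that section):
>> 
>>   lctl set_param ldlm.namespaces.*.lru_size=600    # fixed cap, disables dynamic sizing
>>   lctl set_param ldlm.namespaces.*.lru_size=clear  # drop unused locks right away
>>   lctl set_param ldlm.namespaces.*.lru_size=0      # return to dynamic LRU sizing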
>> 
>> 
>> Regards,
>> Carlos
>> 
>> 
>> 
>> 
>> --
>> Carlos Thomaz | HPC Systems Architect
>> Mobile: +1 (303) 519-0578
>> cthomaz at ddn.com | Skype ID: carlosthomaz
>> DataDirect Networks, Inc.
>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>> 
>> 
>> 
>> 
>> 
>> On 2/1/12 2:11 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>> 
>>> zone_reclaim_mode is 0 on all clients/servers
>>> 
>>> When changing the number of service threads or the lru_size, can these
>>> be changed on the fly, or do they require a reboot of either the client
>>> or the server?
>>> For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>> gives about 300 (300, 359), so I'm thinking of trying half of that and
>>> seeing how it goes?
>>> 
>>> Also checking lru_size, I get different numbers from the clients. cat
>>> /proc/fs/lustre/ldlm/namespaces/*/lru_size
>>> 
>>> Client: MDT0 OST0 OST1 OST2 OST3 MGC
>>> head node: 0 22 22 22 22 400 (only a few users logged in)
>>> busy node: 1 501 504 503 505 400 (Fully loaded with jobs)
>>> samba/nfs server: 4 440070 44370 44348 26282 1600
>>> 
>>> So my understanding is that lru_size is set to auto by default, hence
>>> the varying values, but setting it manually effectively sets a maximum?
>>> Also, what does it mean to have a lower value (especially in the case
>>> of the samba/nfs server)?
>>> 
>>> On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>>>> 
>>>> You may also want to check and, if necessary, limit the lru_size on
>>>> your clients. I believe there are guidelines in the ops manual.
>>>> We have ~750 clients and limit ours to 600 per OST. That, combined
>>>> with setting zone_reclaim_mode=0, should make a big difference.
>>>> 
>>>> Regards,
>>>> 
>>>> Charlie Taylor
>>>> UF HPC Center
>>>> 
>>>> 
>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>> 
>>>>> Hi David,
>>>>> 
>>>>> You may be facing the same issue discussed in previous threads, namely
>>>>> the zone_reclaim_mode issue.
>>>>> 
>>>>> Take a look at the previous thread where Kevin and I replied to
>>>>> Vijesh Ek.
>>>>> 
>>>>> If you don't have access to the previous emails, look at your kernel
>>>>> setting for zone reclaim:
>>>>> 
>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>> 
>>>>> It should be set to 0.
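>>>>> 
>>>>> For example, to set it to 0 now and keep it across reboots (a sketch;
>>>>> plain sysctl usage, nothing Lustre-specific):
>>>>> 
>>>>>   sysctl -w vm.zone_reclaim_mode=0
>>>>>   echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf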
>>>>> 
>>>>> Also, look at the number of Lustre OSS service threads. It may be set
>>>>> too high...
>>>>> 
>>>>> Rgds.
>>>>> Carlos.
>>>>> 
>>>>> 
>>>>> --
>>>>> Carlos Thomaz | HPC Systems Architect
>>>>> Mobile: +1 (303) 519-0578
>>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>>> DataDirect Networks, Inc.
>>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>> 
>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>> 
>>>>> 
>>>> 
>>>> Charles A. Taylor, Ph.D.
>>>> Associate Director,
>>>> UF HPC Center
>>>> (352) 392-4036
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> David Noriega
>>> System Administrator
>>> Computational Biology Initiative
>>> High Performance Computing Center
>>> University of Texas at San Antonio
>>> One UTSA Circle
>>> San Antonio, TX 78249
>>> Office: BSE 3.112
>>> Phone: 210-458-7100
>>> http://www.cbi.utsa.edu
>> 
> 
> 
> 
> -- 
> David Noriega
> System Administrator
> Computational Biology Initiative
> High Performance Computing Center
> University of Texas at San Antonio
> One UTSA Circle
> San Antonio, TX 78249
> Office: BSE 3.112
> Phone: 210-458-7100
> http://www.cbi.utsa.edu


Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Engineer                   http://www.whamcloud.com/