[Lustre-discuss] Thread might be hung, Heavy IO Load messages

David Noriega tsk133 at my.utsa.edu
Thu Feb 2 07:54:20 PST 2012


We have two OSSs, each with two quad-core AMD Opterons, 8GB of RAM, and two
OSTs (4.4T and 3.5T). Backend storage is a pair of Sun StorageTek 2540s
connected via 8Gb fiber.

What about tweaking max_dirty_mb on the client side?
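
Something like this is what I had in mind (just a sketch; 64 below is an
arbitrary example value, not a recommendation):

  lctl get_param osc.*.max_dirty_mb       # current per-OSC limit on dirty client cache (MB)
  lctl set_param osc.*.max_dirty_mb=64    # example value only; revert if it hurts throughput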

On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
> David,
>
> The number of OSS service threads is a function of your RAM size and CPUs.
> It's difficult to say what a good upper limit would be without knowing the
> size of your OSS, the # of clients, the storage back-end, and the workload.
> But the good thing is that you can try it on the fly via the lctl set_param
> command.
>
> Assuming you are running Lustre 1.8, here is a good explanation of how to
> do it:
> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
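>
> For example (1.8-style tunable names, sketched from the manual page above;
> verify against your version before applying anything):
>
>   lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max
>   lctl set_param ost.OSS.ost_io.threads_max=128
>
> To make it stick across reboots you would normally use the module option
> instead (e.g. "options ost oss_num_threads=128" in /etc/modprobe.conf), and
> note that lowering threads_max may not reduce threads that are already
> running until the OSS is restarted.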
>
> Some remarks:
> - Reducing the number of OSS threads may impact performance, depending
> on your workload.
> - Unfortunately, I guess you will need to try it and see what happens. I
> would go for 128, analyze the behavior of your OSSs (via the log files), and
> keep an eye on your workload. It seems to me that 300 is a bit too high
> (but again, I don't know what you have in your storage back-end or OSS
> configuration).
>
>
> I can't tell you much about lru_size, but as far as I understand, the
> values are dynamic and there isn't much to do other than clearing the least
> recently used (LRU) queue or disabling LRU sizing. I can't help much on this
> beyond pointing you to the explanation for it (see 31.2.11):
>
> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
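>
> If you do end up capping it, the 1.8-style commands are roughly as follows
> (per the manual section above; 600 is an example value only):
>
>   lctl set_param ldlm.namespaces.*osc*.lru_size=600    # fixed cap, disables dynamic sizing
>   lctl set_param ldlm.namespaces.*osc*.lru_size=clear  # flush the LRU
>   lctl set_param ldlm.namespaces.*osc*.lru_size=0      # back to dynamic (auto) sizing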
>
>
> Regards,
> Carlos
>
>
>
>
> --
> Carlos Thomaz | HPC Systems Architect
> Mobile: +1 (303) 519-0578
> cthomaz at ddn.com | Skype ID: carlosthomaz
> DataDirect Networks, Inc.
> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>
>
>
>
>
> On 2/1/12 2:11 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>
>>zone_reclaim_mode is 0 on all clients/servers
>>
>>When changing the number of service threads or the lru_size, can these be
>>changed on the fly, or do they require a reboot of either the client or the
>>server?
>>For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>gives about 300 (300, 359), so I'm thinking of trying half of that and
>>seeing how it goes?
>>
>>Also, checking lru_size, I get different numbers from the clients with
>>cat /proc/fs/lustre/ldlm/namespaces/*/lru_size:
>>
>>Client: MDT0 OST0 OST1 OST2 OST3 MGC
>>head node: 0 22 22 22 22 400 (only a few users logged in)
>>busy node: 1 501 504 503 505 400 (Fully loaded with jobs)
>>samba/nfs server: 4 440070 44370 44348 26282 1600
>>
>>So my understanding is that lru_size is set to auto by default, hence the
>>varying values, but setting it manually effectively sets a max value? Also,
>>what does it mean to have a lower value (especially in the case of the
>>samba/nfs server)?
>>
>>On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>>>
>>> You may also want to check and, if necessary, limit the lru_size on
>>>your clients.   I believe there are guidelines in the ops manual.
>>>We have ~750 clients and limit ours to 600 per OST.   That, combined
>>>with setting zone_reclaim_mode=0, should make a big difference.
>>>
>>> Regards,
>>>
>>> Charlie Taylor
>>> UF HPC Center
>>>
>>>
>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>
>>>> Hi David,
>>>>
>>>> You may be facing the same issue discussed in previous threads, which
>>>> is the issue regarding zone_reclaim_mode.
>>>>
>>>> Take a look at the previous thread where Kevin and I replied to
>>>> Vijesh Ek.
>>>>
>>>> If you don't have access to the previous emails, look at your kernel
>>>> setting for zone reclaim:
>>>>
>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>
>>>> It should be set to 0.
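>>>>
>>>> A quick way to check and fix it on the fly (persisting it via sysctl.conf
>>>> is the usual approach; this is just a sketch):
>>>>
>>>>   sysctl vm.zone_reclaim_mode          # should report 0
>>>>   sysctl -w vm.zone_reclaim_mode=0     # set it immediately
>>>>   # add "vm.zone_reclaim_mode = 0" to /etc/sysctl.conf to keep it after reboots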
>>>>
>>>> Also, look at the number of Lustre OSS service threads. It may be set
>>>> too high...
>>>>
>>>> Rgds.
>>>> Carlos.
>>>>
>>>>
>>>> --
>>>> Carlos Thomaz | HPC Systems Architect
>>>> Mobile: +1 (303) 519-0578
>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>> DataDirect Networks, Inc.
>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>
>>>>> indicates the system was overloaded (too many service threads, or
>>>>>
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>> Charles A. Taylor, Ph.D.
>>> Associate Director,
>>> UF HPC Center
>>> (352) 392-4036
>>>
>>>
>>>
>>
>>
>>
>>--
>>David Noriega
>>System Administrator
>>Computational Biology Initiative
>>High Performance Computing Center
>>University of Texas at San Antonio
>>One UTSA Circle
>>San Antonio, TX 78249
>>Office: BSE 3.112
>>Phone: 210-458-7100
>>http://www.cbi.utsa.edu
>>_______________________________________________
>>Lustre-discuss mailing list
>>Lustre-discuss at lists.lustre.org
>>http://lists.lustre.org/mailman/listinfo/lustre-discuss
>



-- 
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.utsa.edu


