[Lustre-discuss] Thread might be hung, Heavy IO Load messages

David Noriega tsk133 at my.utsa.edu
Thu Feb 2 16:05:08 PST 2012


I found the thread "Lustre clients getting evicted," as I've also seen
the "ost_connect operation failed with -16" message. That thread
recommends increasing the timeout, but it was about 1.6, and as I've
read, 1.8 uses a different (adaptive) timeout system. Given that, would
increasing at_min (currently 0) or at_max (currently 600) be best?
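For reference, the 1.8 adaptive-timeout knobs can be inspected and changed at
runtime. This is only a sketch: it assumes the tunables are exposed under
/sys/module/ptlrpc/parameters (as on typical 1.8 servers) and uses "lustre"
as a placeholder filesystem name; adjust both for your installation.

```shell
# Inspect the current adaptive-timeout floor and ceiling (seconds).
cat /sys/module/ptlrpc/parameters/at_min
cat /sys/module/ptlrpc/parameters/at_max

# Raise at_min on the fly; this is not persistent across reboots.
echo 40 > /sys/module/ptlrpc/parameters/at_min

# To persist a setting filesystem-wide, run on the MGS node
# ("lustre" is a placeholder fsname):
lctl conf_param lustre.sys.at_min=40
```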

On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
> On 2012-02-02, at 8:54 AM, David Noriega wrote:
>> We have two OSSs, each with two quad-core AMD Opterons and 8 GB of RAM,
>> and two OSTs each (4.4T and 3.5T). The back-end storage is a pair of Sun
>> StorageTek 2540s connected with 8Gb fiber.
>
> Running 32-64 threads per OST is the optimum number, based on previous
> experience.
>
>> What about tweaking max_dirty_mb on the client side?
>
> Probably unrelated.
>
>> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
>>> David,
>>>
>>> The number of OSS service threads is a function of your RAM size and
>>> CPU count. It's difficult to say what a good upper limit would be without
>>> knowing the size of your OSS, the number of clients, the storage back-end,
>>> and the workload. But the good news is that you can try values on the fly
>>> via the lctl set_param command.
>>>
>>> Assuming you are running Lustre 1.8, here is a good explanation of how to
>>> do it:
>>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
>>>
>>> Some remarks:
>>> - Reducing the number of OSS threads may impact performance, depending
>>> on your workload.
>>> - Unfortunately, I guess you will need to experiment and see what
>>> happens. I would go for 128, analyze the behavior of your OSSs (via log
>>> files), and keep an eye on your workload. 300 seems a bit too high to me
>>> (but again, I don't know your storage back-end or OSS configuration).
>>>
>>>
>>> I can't tell you much about lru_size, but as far as I understand the
>>> values are dynamic, and there's not much to do other than clear the
>>> least-recently-used queue or disable LRU sizing. I can't help much on
>>> this beyond pointing you to the explanation (see 31.2.11):
>>>
>>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
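For what it's worth, the lock LRU operations that manual section describes
can be driven from a client with lctl, along these lines (a sketch; the 600
figure is the per-OST limit mentioned elsewhere in this thread, not a
universal value):

```shell
# On a client: show the current per-namespace LRU sizes
# (0 under 1.8 generally means dynamic/auto sizing).
lctl get_param ldlm.namespaces.*.lru_size

# Pin a fixed limit of 600 locks per OSC namespace
# (this disables dynamic sizing on those namespaces):
lctl set_param ldlm.namespaces.*osc*.lru_size=600

# Drop unused locks from all LRUs without changing the sizing mode:
lctl set_param ldlm.namespaces.*.lru_size=clear
```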
>>>
>>>
>>> Regards,
>>> Carlos
>>>
>>>
>>>
>>>
>>> --
>>> Carlos Thomaz | HPC Systems Architect
>>> Mobile: +1 (303) 519-0578
>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>> DataDirect Networks, Inc.
>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>
>>>
>>>
>>>
>>>
>>> On 2/1/12 2:11 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>
>>>> zone_reclaim_mode is 0 on all clients/servers
>>>>
>>>> When changing the number of service threads or the lru_size, can these
>>>> be done on the fly, or do they require a reboot of the client or
>>>> server?
>>>> For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>>> gives about 300 (300, 359), so I'm thinking of trying half of that and
>>>> seeing how it goes.
>>>>
>>>> Also, checking lru_size, I get different numbers from the clients: cat
>>>> /proc/fs/lustre/ldlm/namespaces/*/lru_size
>>>>
>>>> Client: MDT0 OST0 OST1 OST2 OST3 MGC
>>>> head node: 0 22 22 22 22 400 (only a few users logged in)
>>>> busy node: 1 501 504 503 505 400 (fully loaded with jobs)
>>>> samba/nfs server: 4 440070 44370 44348 26282 1600
>>>>
>>>> So my understanding is that lru_size is set to auto by default, hence
>>>> the varying values, but setting it manually effectively sets a maximum
>>>> value? Also, what does it mean to have a lower value (especially in the
>>>> case of the samba/nfs server)?
>>>>
>>>> On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>>>>>
>>>>> You may also want to check and, if necessary, limit the lru_size on
>>>>> your clients.   I believe there are guidelines in the ops manual.
>>>>> We have ~750 clients and limit ours to 600 per OST.   That, combined
>>>>> with setting zone_reclaim_mode=0, should make a big difference.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Charlie Taylor
>>>>> UF HPC Center
>>>>>
>>>>>
>>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> You may be facing the same issue discussed in previous threads:
>>>>>> the zone_reclaim_mode issue.
>>>>>>
>>>>>> Take a look at the previous thread where Kevin and I replied to
>>>>>> Vijesh Ek.
>>>>>>
>>>>>> If you don't have access to the previous emails, look at your kernel
>>>>>> settings for the zone reclaim:
>>>>>>
>>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>>>
>>>>>> It should be set to 0.
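Checking and pinning this on each node is plain sysctl usage, nothing
Lustre-specific; a minimal sketch (run as root):

```shell
# Check the current setting; 0 disables NUMA zone reclaim.
cat /proc/sys/vm/zone_reclaim_mode

# Disable it at runtime:
sysctl -w vm.zone_reclaim_mode=0

# Persist it across reboots:
echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf
```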
>>>>>>
>>>>>> Also, look at the number of Lustre OSS service threads. It may be
>>>>>> set too high...
>>>>>>
>>>>>> Rgds.
>>>>>> Carlos.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>>>
>>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Lustre-discuss mailing list
>>>>>> Lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>
>>>>> Charles A. Taylor, Ph.D.
>>>>> Associate Director,
>>>>> UF HPC Center
>>>>> (352) 392-4036
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Noriega
>>>> System Administrator
>>>> Computational Biology Initiative
>>>> High Performance Computing Center
>>>> University of Texas at San Antonio
>>>> One UTSA Circle
>>>> San Antonio, TX 78249
>>>> Office: BSE 3.112
>>>> Phone: 210-458-7100
>>>> http://www.cbi.utsa.edu
>>>
>>
>>
>>
>
>
> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Engineer                   http://www.whamcloud.com/





