[Lustre-discuss] Thread might be hung, Heavy IO Load messages

Carlos Thomaz cthomaz at ddn.com
Thu Feb 2 17:37:22 PST 2012


I can't comment much on this (I don't have much experience tuning it), but
Lustre 1.8 has a completely different timeout architecture (adaptive
timeouts). I suggest you take a closer look at that first.
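
For example, you can check the current adaptive timeout bounds and the
per-service estimates with something along these lines (paths quoted from
memory, so please verify them against the 1.8 manual):

  cat /proc/sys/lustre/at_min /proc/sys/lustre/at_max   # global bounds
  lctl get_param -n ost.OSS.ost_io.timeouts             # estimates on an OSS

and adjust them on the fly with, e.g., lctl set_param at_min=40.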

--
Carlos Thomaz | HPC Systems Architect
Mobile: +1 (303) 519-0578
cthomaz at ddn.com | Skype ID: carlosthomaz
DataDirect Networks, Inc.
9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
<http://twitter.com/ddn_limitless> | 1.800.TERABYTE





On 2/2/12 5:05 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:

>I found the thread "Luster clients getting evicted", as I've also seen
>the "ost_connect operation failed with -16" message, and there they
>recommend increasing the timeout. That was for 1.6, though, and as I've
>read, 1.8 has a different timeout system. Given that, would increasing
>at_min (currently 0) or at_max (currently 600) be best?
>
>On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger <adilger at whamcloud.com>
>wrote:
>> On 2012-02-02, at 8:54 AM, David Noriega wrote:
>>> We have two OSSs, each with two quad-core AMD Opterons, 8GB of RAM,
>>> and two OSTs (4.4T and 3.5T). Backend storage is a pair of Sun
>>> StorageTek 2540s connected via 8Gb fiber.
>>
>> Running 32-64 threads per OST is the optimum number, based on previous
>> experience.
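>>
>> For example, on each OSS something like the following should work
>> (1.8 tunable names, untested here):
>>
>>   lctl set_param ost.OSS.ost_io.threads_max=64
>>
>> plus "options ost oss_num_threads=64" in /etc/modprobe.conf to make it
>> persistent. Note that lowering threads_max does not stop threads that
>> are already running; that only takes effect once the OSS is restarted.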
>>
>>> What about tweaking max_dirty_mb on the client side?
>>
>> Probably unrelated.
>>
>>> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
>>>> David,
>>>>
>>>> The number of OSS service threads is a function of your RAM size and
>>>> CPUs. It's difficult to say what would be a good upper limit without
>>>> knowing the size of your OSS, the number of clients, the storage
>>>> back-end, and the workload. The good thing is that you can try it on
>>>> the fly via the lctl set_param command.
>>>>
>>>> Assuming you are running Lustre 1.8, here is a good explanation of how
>>>> to do it:
>>>>
>>>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
>>>>
>>>> Some remarks:
>>>> - Reducing the number of OSS threads may impact performance, depending
>>>> on your workload.
>>>> - Unfortunately, I guess you will need to try it and see what happens.
>>>> I would go for 128, analyze the behavior of your OSSs (via log files),
>>>> and also keep an eye on your workload. 300 seems a bit too high to me
>>>> (but again, I don't know what you have on your storage back-end or OSS
>>>> configuration).
>>>>
>>>>
>>>> I can't tell you much about lru_size, but as far as I understand the
>>>> values are dynamic and there's not much to do other than clear the
>>>> least recently used queue or disable LRU sizing. I can't help much on
>>>> this other than pointing you to the explanation for it (see 31.2.11):
>>>>
>>>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
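>>>>
>>>> For reference, the usual knobs are roughly (please double-check the
>>>> syntax on your version):
>>>>
>>>>   lctl set_param ldlm.namespaces.*.lru_size=clear   # drop cached locks
>>>>   lctl set_param ldlm.namespaces.*.lru_size=0       # back to dynamic sizing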
>>>>
>>>>
>>>> Regards,
>>>> Carlos
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Carlos Thomaz | HPC Systems Architect
>>>> Mobile: +1 (303) 519-0578
>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>> DataDirect Networks, Inc.
>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2/1/12 2:11 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>
>>>>> zone_reclaim_mode is 0 on all clients/servers
>>>>>
>>>>> When changing the number of service threads or the lru_size, can these
>>>>> be done on the fly, or do they require a reboot of either the client
>>>>> or the server?
>>>>> For my two OSSs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>>>> gives roughly 300 (300 and 359), so I'm thinking of trying half of
>>>>> that and seeing how it goes?
>>>>>
>>>>> Also, checking lru_size (cat /proc/fs/lustre/ldlm/namespaces/*/lru_size),
>>>>> I get different numbers from the clients:
>>>>>
>>>>> Client: MDT0 OST0 OST1 OST2 OST3 MGC
>>>>> head node: 0 22 22 22 22 400 (only a few users logged in)
>>>>> busy node: 1 501 504 503 505 400 (Fully loaded with jobs)
>>>>> samba/nfs server: 4 440070 44370 44348 26282 1600
>>>>>
>>>>> So my understanding is that lru_size is set to auto by default, hence
>>>>> the varying values, but setting it manually effectively sets a maximum
>>>>> value? Also, what does it mean to have a lower value (especially in
>>>>> the case of the samba/nfs server)?
>>>>>
>>>>> On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <taylor at hpc.ufl.edu>
>>>>>wrote:
>>>>>>
>>>>>> You may also want to check and, if necessary, limit the lru_size on
>>>>>> your clients. I believe there are guidelines in the ops manual.
>>>>>> We have ~750 clients and limit ours to 600 per OST. That, combined
>>>>>> with setting zone_reclaim_mode=0, should make a big difference.
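>>>>>>
>>>>>> On the clients that is something like (adjust the 600 to taste):
>>>>>>
>>>>>>   lctl set_param ldlm.namespaces.*osc*.lru_size=600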
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Charlie Taylor
>>>>>> UF HPC Center
>>>>>>
>>>>>>
>>>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> You may be facing the same issue discussed in previous threads,
>>>>>>> namely the one regarding zone_reclaim_mode.
>>>>>>>
>>>>>>> Take a look at the previous thread where Kevin and I replied to
>>>>>>> Vijesh Ek.
>>>>>>>
>>>>>>> If you don't have access to the previous emails, look at your kernel
>>>>>>> setting for zone reclaim:
>>>>>>>
>>>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>>>>
>>>>>>> It should be set to 0.
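>>>>>>>
>>>>>>> If it isn't, you can change it on the fly and make it persistent
>>>>>>> with something like:
>>>>>>>
>>>>>>>   echo 0 > /proc/sys/vm/zone_reclaim_mode
>>>>>>>   echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf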
>>>>>>>
>>>>>>> Also, look at the number of Lustre OSS service threads. It may be
>>>>>>> set too high.
>>>>>>>
>>>>>>> Rgds.
>>>>>>> Carlos.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Carlos Thomaz | HPC Systems Architect
>>>>>>> Mobile: +1 (303) 519-0578
>>>>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>>>>> DataDirect Networks, Inc.
>>>>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>>>>
>>>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lustre-discuss mailing list
>>>>>>> Lustre-discuss at lists.lustre.org
>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>>
>>>>>> Charles A. Taylor, Ph.D.
>>>>>> Associate Director,
>>>>>> UF HPC Center
>>>>>> (352) 392-4036
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> David Noriega
>>>>> System Administrator
>>>>> Computational Biology Initiative
>>>>> High Performance Computing Center
>>>>> University of Texas at San Antonio
>>>>> One UTSA Circle
>>>>> San Antonio, TX 78249
>>>>> Office: BSE 3.112
>>>>> Phone: 210-458-7100
>>>>> http://www.cbi.utsa.edu
>>>>> _______________________________________________
>>>>> Lustre-discuss mailing list
>>>>> Lustre-discuss at lists.lustre.org
>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>>
>>>
>>>
>>> --
>>> David Noriega
>>> System Administrator
>>> Computational Biology Initiative
>>> High Performance Computing Center
>>> University of Texas at San Antonio
>>> One UTSA Circle
>>> San Antonio, TX 78249
>>> Office: BSE 3.112
>>> Phone: 210-458-7100
>>> http://www.cbi.utsa.edu
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger                       Whamcloud, Inc.
>> Principal Engineer                   http://www.whamcloud.com/
>>
>>
>>
>>
>
>
>
>-- 
>David Noriega
>System Administrator
>Computational Biology Initiative
>High Performance Computing Center
>University of Texas at San Antonio
>One UTSA Circle
>San Antonio, TX 78249
>Office: BSE 3.112
>Phone: 210-458-7100
>http://www.cbi.utsa.edu
>_______________________________________________
>Lustre-discuss mailing list
>Lustre-discuss at lists.lustre.org
>http://lists.lustre.org/mailman/listinfo/lustre-discuss



