[Lustre-discuss] Thread might be hung, Heavy IO Load messages

Carlos Thomaz cthomaz at ddn.com
Thu Feb 2 17:38:56 PST 2012


Ooops.. forgot the link. Take a look first at:

http://wiki.lustre.org/index.php/Architecture_-_Adaptive_Timeouts_-_Use_Cases

and google for "adaptive timeouts".
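
For reference, these are the knobs involved. A rough sketch, assuming the
1.8 proc layout (paths can vary between releases, and the "40" below is
only an illustrative value, not a recommendation):

  # current settings (David reports at_min=0, at_max=600)
  cat /proc/sys/lustre/at_min /proc/sys/lustre/at_max /proc/sys/lustre/at_history

  # raise the minimum adaptive timeout on the fly
  lctl set_param at_min=40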

Carlos.

--
Carlos Thomaz | HPC Systems Architect
Mobile: +1 (303) 519-0578
cthomaz at ddn.com | Skype ID: carlosthomaz
DataDirect Networks, Inc.
9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
<http://twitter.com/ddn_limitless> | 1.800.TERABYTE





On 2/2/12 6:37 PM, "Carlos Thomaz" <cthomaz at ddn.com> wrote:

>I can't comment much on this (I don't have much experience tuning it), but
>Lustre 1.8 has a completely different timeout architecture (adaptive
>timeouts).
>I suggest you take a deep look first:
>
>--
>Carlos Thomaz | HPC Systems Architect
>Mobile: +1 (303) 519-0578
>cthomaz at ddn.com | Skype ID: carlosthomaz
>DataDirect Networks, Inc.
>9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
><http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>
>
>
>
>
>On 2/2/12 5:05 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>
>>I found the thread "Lustre clients getting evicted" since I've also seen
>>the "ost_connect operation failed with -16" message, and there they
>>recommend increasing the timeout. That was for 1.6, though, and as I've
>>read, 1.8 has a different timeout system. Given that, would increasing
>>at_min (currently 0) or at_max (currently 600) be best?
>>
>>On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger <adilger at whamcloud.com>
>>wrote:
>>> On 2012-02-02, at 8:54 AM, David Noriega wrote:
>>>> We have two OSSs, each with two quad-core AMD Opterons, 8GB of RAM,
>>>> and two OSTs (4.4T and 3.5T). Backend storage is a pair of Sun
>>>> StorageTek 2540s connected with 8Gb fiber.
>>>
>>> Running 32-64 threads per OST is the optimum number, based on previous
>>> experience.
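>>>
>>> Read as a per-OSS total across your two OSTs, that works out to roughly
>>> 2 x (32-64) = 64-128 ost_io threads per OSS, e.g. (a sketch; same knob
>>> as in the quoted message further down):
>>>
>>> lctl set_param ost.OSS.ost_io.threads_max=128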
>>>
>>>> What about tweaking max_dirty_mb on the client side?
>>>
>>> Probably unrelated.
>>>
>>>> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
>>>>> David,
>>>>>
>>>>> The number of OSS service threads is a function of your RAM size and
>>>>> CPUs. It's difficult to say what a good upper limit would be without
>>>>> knowing the size of your OSS, the number of clients, the storage
>>>>> back-end, and the workload. The good news is that you can try it on
>>>>> the fly via the lctl set_param command.
>>>>>
>>>>> Assuming you are running Lustre 1.8, here is a good explanation of how
>>>>> to do it:
>>>>>
>>>>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
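>>>>>
>>>>> In practice that amounts to something like this (a sketch only; 128 is
>>>>> just the starting point suggested in the remarks below, and note that
>>>>> lowering threads_max will not stop threads that are already running
>>>>> until the OSS is restarted):
>>>>>
>>>>> # on each OSS, compare what is running vs. what is allowed
>>>>> lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max
>>>>> # cap the IO service threads on the fly
>>>>> lctl set_param ost.OSS.ost_io.threads_max=128
>>>>> # to persist across reboots, set the module option on the OSS, e.g. in
>>>>> # /etc/modprobe.d/lustre.conf:  options ost oss_num_threads=128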
>>>>>
>>>>> Some remarks:
>>>>> - Reducing the number of OSS threads may impact performance, depending
>>>>> on your workload.
>>>>> - Unfortunately you will need to try it and see what happens. I would
>>>>> go for 128, analyze the behavior of your OSSs (via the log files), and
>>>>> keep an eye on your workload. 300 seems a bit too high to me (but
>>>>> again, I don't know what you have on your storage back-end or in your
>>>>> OSS configuration).
>>>>>
>>>>>
>>>>> I can't tell you much about lru_size, but as far as I understand the
>>>>> values are dynamic and there's not much to do other than clear the
>>>>> least-recently-used queue or disable LRU sizing. I can't help much
>>>>> here beyond pointing you to the explanation (see 31.2.11):
>>>>>
>>>>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
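>>>>>
>>>>> A rough illustration (syntax as in the manual section above; the 600
>>>>> is only the per-OST cap Charlie mentions further down, not a general
>>>>> recommendation):
>>>>>
>>>>> # on a client: show the current LRU size per lock namespace
>>>>> lctl get_param ldlm.namespaces.*.lru_size
>>>>> # pin the OSC lock LRUs to a fixed maximum (disables dynamic sizing)
>>>>> lctl set_param ldlm.namespaces.*osc*.lru_size=600
>>>>> # or just drop unused cached locks (does not change the sizing policy)
>>>>> lctl set_param ldlm.namespaces.*.lru_size=clear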
>>>>>
>>>>>
>>>>> Regards,
>>>>> Carlos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Carlos Thomaz | HPC Systems Architect
>>>>> Mobile: +1 (303) 519-0578
>>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>>> DataDirect Networks, Inc.
>>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/1/12 2:11 PM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>>
>>>>>> zone_reclaim_mode is 0 on all clients/servers
>>>>>>
>>>>>> When changing the number of service threads or the lru_size, can
>>>>>> these be changed on the fly, or do they require a reboot of either
>>>>>> the client or the server?
>>>>>> For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>>>>> gives about 300 (300, 359), so I'm thinking of trying half of that
>>>>>> and seeing how it goes?
>>>>>>
>>>>>> Also, checking lru_size with
>>>>>> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
>>>>>> I get different numbers from the clients:
>>>>>>
>>>>>> Client:            MDT0  OST0    OST1   OST2   OST3   MGC
>>>>>> head node:         0     22      22     22     22     400   (only a few users logged in)
>>>>>> busy node:         1     501     504    503    505    400   (fully loaded with jobs)
>>>>>> samba/nfs server:  4     440070  44370  44348  26282  1600
>>>>>>
>>>>>> So my understanding is that lru_size is set to auto by default, hence
>>>>>> the varying values, but setting it manually effectively sets a max
>>>>>> value? Also, what does it mean to have a lower value (especially in
>>>>>> the case of the samba/nfs server)?
>>>>>>
>>>>>> On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <taylor at hpc.ufl.edu>
>>>>>>wrote:
>>>>>>>
>>>>>>> You may also want to check and, if necessary, limit the lru_size on
>>>>>>> your clients. I believe there are guidelines in the ops manual.
>>>>>>> We have ~750 clients and limit ours to 600 per OST. That, combined
>>>>>>> with setting zone_reclaim_mode=0, should make a big difference.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Charlie Taylor
>>>>>>> UF HPC Center
>>>>>>>
>>>>>>>
>>>>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> You may be facing the same issue discussed in previous threads,
>>>>>>>> namely the one regarding zone_reclaim_mode.
>>>>>>>>
>>>>>>>> Take a look at the previous thread where Kevin and I replied to
>>>>>>>> Vijesh Ek.
>>>>>>>>
>>>>>>>> If you don't have access to the previous emails, look at your
>>>>>>>> kernel setting for zone reclaim:
>>>>>>>>
>>>>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>>>>>
>>>>>>>> It should be set to 0.
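>>>>>>>>
>>>>>>>> For example (these are standard Linux VM sysctls, nothing
>>>>>>>> Lustre-specific):
>>>>>>>>
>>>>>>>> cat /proc/sys/vm/zone_reclaim_mode        # should print 0
>>>>>>>> sysctl -w vm.zone_reclaim_mode=0          # change it immediately
>>>>>>>> # persist across reboots by adding to /etc/sysctl.conf:
>>>>>>>> #   vm.zone_reclaim_mode = 0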
>>>>>>>>
>>>>>>>> Also, look at the number of Lustre OSS service threads. It may be
>>>>>>>> set too high...
>>>>>>>>
>>>>>>>> Rgds.
>>>>>>>> Carlos.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Carlos Thomaz | HPC Systems Architect
>>>>>>>> Mobile: +1 (303) 519-0578
>>>>>>>> cthomaz at ddn.com | Skype ID: carlosthomaz
>>>>>>>> DataDirect Networks, Inc.
>>>>>>>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>>>>>>>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>>>>>>>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
>>>>>>>>
>>>>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Lustre-discuss mailing list
>>>>>>>> Lustre-discuss at lists.lustre.org
>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>>>
>>>>>>> Charles A. Taylor, Ph.D.
>>>>>>> Associate Director,
>>>>>>> UF HPC Center
>>>>>>> (352) 392-4036
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> David Noriega
>>>>>> System Administrator
>>>>>> Computational Biology Initiative
>>>>>> High Performance Computing Center
>>>>>> University of Texas at San Antonio
>>>>>> One UTSA Circle
>>>>>> San Antonio, TX 78249
>>>>>> Office: BSE 3.112
>>>>>> Phone: 210-458-7100
>>>>>> http://www.cbi.utsa.edu
>>>>>> _______________________________________________
>>>>>> Lustre-discuss mailing list
>>>>>> Lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Noriega
>>>> System Administrator
>>>> Computational Biology Initiative
>>>> High Performance Computing Center
>>>> University of Texas at San Antonio
>>>> One UTSA Circle
>>>> San Antonio, TX 78249
>>>> Office: BSE 3.112
>>>> Phone: 210-458-7100
>>>> http://www.cbi.utsa.edu
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger                       Whamcloud, Inc.
>>> Principal Engineer                   http://www.whamcloud.com/
>>>
>>>
>>>
>>>
>>
>>
>>
>>-- 
>>David Noriega
>>System Administrator
>>Computational Biology Initiative
>>High Performance Computing Center
>>University of Texas at San Antonio
>>One UTSA Circle
>>San Antonio, TX 78249
>>Office: BSE 3.112
>>Phone: 210-458-7100
>>http://www.cbi.utsa.edu
>>_______________________________________________
>>Lustre-discuss mailing list
>>Lustre-discuss at lists.lustre.org
>>http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>_______________________________________________
>Lustre-discuss mailing list
>Lustre-discuss at lists.lustre.org
>http://lists.lustre.org/mailman/listinfo/lustre-discuss



