[Lustre-discuss] Thread might be hung, Heavy IO Load messages

Wed Feb 1 11:27:51 PST 2012

You may also want to check and, if necessary, limit the lru_size on your clients. I believe there are guidelines in the ops manual. We have ~750 clients and limit ours to 600 per OST. That, combined with setting zone_reclaim_mode=0, should make a big difference.
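
For what it's worth, here is a rough sketch of how the two settings mentioned above might be checked and applied on a client. The function names are made up for illustration and are not part of Lustre. The "*osc*" namespace pattern is an assumption you should verify against your Lustre version, and 600 is simply the value we use at our site.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Lustre.

# Report whether zone reclaim is disabled. The optional argument
# overrides the proc path, which is handy for testing.
check_zone_reclaim() {
    file="${1:-/proc/sys/vm/zone_reclaim_mode}"
    if [ ! -r "$file" ]; then
        echo "unreadable: $file"
        return 2
    fi
    mode=$(cat "$file")
    if [ "$mode" = "0" ]; then
        echo "ok: zone_reclaim_mode=0"
    else
        echo "warn: zone_reclaim_mode=$mode (expected 0)"
        return 1
    fi
}

# Disable zone reclaim until the next reboot (needs root).
disable_zone_reclaim() {
    sysctl -w vm.zone_reclaim_mode=0
}

# Cap the client lock LRU for each OSC namespace (needs root).
# Assumption: the OSC lock namespaces match "*osc*" on your clients.
limit_lru_size() {
    lctl set_param "ldlm.namespaces.*osc*.lru_size=$1"
}
```

Run check_zone_reclaim first; if it warns, disable_zone_reclaim followed by something like "limit_lru_size 600" would apply both changes. To keep the sysctl setting across reboots, add "vm.zone_reclaim_mode = 0" to /etc/sysctl.conf.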

Regards,

Charlie Taylor
UF HPC Center

On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:

> Hi David,
> 
> You may be facing the same issue discussed in previous threads, namely
> the zone_reclaim_mode issue.
> 
> Take a look at the previous thread where Kevin and I replied to
> Vijesh Ek.
> 
> If you don't have access to the previous emails, look at your kernel
> settings for the zone reclaim:
> 
> cat /proc/sys/vm/zone_reclaim_mode
> 
> It should be set to 0.
> 
> Also, look at the number of Lustre OSS service threads. It may be set too
> high...
> 
> Rgds.
> Carlos.
> 
> 
> --
> Carlos Thomaz | HPC Systems Architect
> Mobile: +1 (303) 519-0578
> cthomaz at ddn.com | Skype ID: carlosthomaz
> DataDirect Networks, Inc.
> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
> 
> On 2/1/12 11:57 AM, "David Noriega" <tsk133 at my.utsa.edu> wrote:
> 
>> indicates the system was overloaded (too many service threads, or
>> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

Charles A. Taylor, Ph.D.
Associate Director,
UF HPC Center
(352) 392-4036