[Lustre-discuss] Frequent OSS Crashes with heavy load

Mon Nov 10 08:42:36 PST 2008

I already have 512 I/O threads running (the maximum number). Some of them are
in "Dead" status. Is it safe to conclude that the OSS is oversubscribed?

Brian J. Murrell wrote:

> On Mon, 2008-11-10 at 16:18 +0000, Wang lu wrote:
>> I am also unclear about the top result:
>> top - 00:16:19 up 1 day,  3:58,  1 user,  load average: 22.71, 23.27, 23.74
>> Tasks: 851 total,   2 running, 849 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.0% us,  7.0% sy,  0.0% ni, 86.7% id,  0.2% wa,  0.2% hi,  5.9% si
>> Mem:   8307364k total,   894940k used,  7412424k free,   240912k buffers
>> Swap: 16386292k total,        0k used, 16386292k free,    78108k cached
>> 
>> 
>> The CPU and memory are both free, while the load average is quite high. Is
>> it possible for Lustre to cache more data?
> 
> Caching on the OSS is a coming feature but that doesn't alleviate the
> need of the OST to read data not in cache and data that needs to be
> flushed to disk.  IOW, a cache will not alleviate a problem of
> oversubscribed storage.
> 
> b.
>
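
One way to look into the "idle CPU, high load average" pattern in the top
output above: on Linux the load average counts tasks in uninterruptible sleep
(state D, typically waiting on I/O) as well as runnable tasks, so threads
blocked on disk can raise it while the CPU stays idle. The following is a
minimal sketch, not something posted in this thread, that counts processes by
state from /proc. The function names are hypothetical, and it only looks at
the top-level /proc/<pid>/stat entries, so threads of multi-threaded user
processes are not counted individually.

```python
import glob
from collections import Counter


def state_of(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    return rest[0]


def count_states(stat_lines):
    """Count how many stat lines are in each process state."""
    return Counter(state_of(line) for line in stat_lines)


def read_stat_lines():
    """Read the stat line of every process currently listed in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("load average:", " ".join(f.read().split()[:3]))
    states = count_states(read_stat_lines())
    print("running (R):        ", states.get("R", 0))
    print("uninterruptible (D):", states.get("D", 0))
```

If the D count stays close to the load average while R stays near zero, that
would be consistent with threads waiting on the storage rather than on the
CPU, which matches the point made above that a cache would not help
oversubscribed disks.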