[Lustre-discuss] Frequent OSS Crashes with heavy load
Wang lu
wanglu at ihep.ac.cn
Mon Nov 10 08:18:30 PST 2008
I am also unclear about the top result:
top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
Swap: 16386292k total, 0k used, 16386292k free, 78108k cached
The CPU and memory are both mostly idle, yet the load average is quite high. Is it
possible for Lustre to cache more data?
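One thing worth checking: the load average counts threads in uninterruptible sleep (state D, typically blocked on disk I/O), so an OSS can show a high load with an idle CPU. A quick way to see whether that is the case here (a generic sketch, not Lustre-specific):

```shell
# Count threads currently in uninterruptible sleep (state D); these
# contribute to the load average even though they use no CPU.
ps -eLo state | grep -c '^D' || true
```

If that number roughly matches the load average, the load is I/O wait rather than CPU demand.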
Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 15:58 +0000, Wang lu wrote:
>> Thanks, but I am still unclear about:
>>
>> 1. How do I limit the number of OST threads once I find an optimum number?
>
> It's a module option to the oss module. It should be documented in the
> manual.
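For reference, on 1.6-era servers this is typically set in /etc/modprobe.conf (the parameter name below is my recollection; please confirm it against the manual for your version):

```
# /etc/modprobe.conf -- cap the number of OSS service threads.
# 128 is only an example value; tune it to your measured optimum.
options ost oss_num_threads=128
```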
>
>> 2. What is the meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?
>
> The meanings of many of the variables in /proc are also documented in the
> manual. If you find any that are not, you can file a ticket in our bz
> requesting that they be added.
>
>> For example
>> [root at boss01 ~]# cat /proc/sys/lnet/peers
>> nid refs state max rtr min tx min queue
>> 192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536
>>
>> [root at boss01 ~]# cat /proc/sys/lnet/nis
>> nid refs peer max tx min
>> 0 at lo 2 0 0 0 0
>> 192.168.50.33 at tcp 137 8 256 256 -424
>
> I don't know the details of either of these off-hand. One of our LNET
> experts might be able to provide more information.
>
> b.
>
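Regarding the peers output above: as I understand it (this should be checked against the manual), the tx/min columns are send-credit counters, and a negative minimum such as the -19 shown means sends had to queue at some point because credits ran out. A quick check for that condition:

```shell
# Print LNET peer lines containing a negative credit count; a negative
# minimum suggests messages queued waiting for send credits at some point.
grep -- '-[0-9]' /proc/sys/lnet/peers
```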