[Lustre-discuss] Frequent OSS Crashes with heavy load
Wang lu
wanglu at ihep.ac.cn
Mon Nov 10 07:58:10 PST 2008
Thanks, but I am still unclear about:
1. How to limit the OST thread number after I find an optimum number?
2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?
For example:
[root@boss01 ~]# cat /proc/sys/lnet/peers
nid               refs state max rtr min tx min   queue
192.168.52.39@tcp    6  ~rtr   8   8   8  3 -19 1458536
[root@boss01 ~]# cat /proc/sys/lnet/nis
nid               refs peer max  tx  min
0@lo                 2    0   0   0    0
192.168.50.33@tcp  137    8 256 256 -424
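
To make the columns above easier to inspect, here is a minimal Python sketch that turns this kind of output into records. The function names and the renaming of the repeated "min" column (to "rtr_min" and "tx_min", after the column that precedes it) are my own choices, not anything defined by Lustre. The sketch also accepts the " at " spelling that mail archives substitute for "@". Reading a negative "min" as "sends had to queue for credits at some point" is an assumption on my part and is not stated in this thread.

```python
def parse_lnet_table(text):
    """Parse whitespace-separated /proc/sys/lnet/{peers,nis} output into dicts."""
    # Mail archives often rewrite "@" as " at "; undo that so a NID is one token.
    rows = [ln.replace(" at ", "@").split()
            for ln in text.strip().splitlines() if ln.strip()]
    header, data = rows[0], rows[1:]
    # "min" can appear twice; qualify it with the preceding column name.
    names = [f"{header[i - 1]}_min" if col == "min" else col
             for i, col in enumerate(header)]
    records = []
    for fields in data:
        rec = {}
        for name, val in zip(names, fields):
            # Keep NIDs and states as strings, convert plain integers.
            rec[name] = int(val) if val.lstrip("-").isdigit() else val
        records.append(rec)
    return records


def nids_with_negative_min(records):
    """Return NIDs whose tx 'min' value dropped below zero (assumed meaning: queuing)."""
    return [r["nid"] for r in records if r.get("tx_min", 0) < 0]


PEERS = """\
nid refs state max rtr min tx min queue
192.168.52.39@tcp 6 ~rtr 8 8 8 3 -19 1458536
"""

NIS = """\
nid refs peer max tx min
0@lo 2 0 0 0 0
192.168.50.33@tcp 137 8 256 256 -424
"""

if __name__ == "__main__":
    for rec in parse_lnet_table(PEERS) + parse_lnet_table(NIS):
        print(rec)
```

On a live node the same function could be fed the contents of the two /proc files directly instead of the pasted strings.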
Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
>> Thanks, Brian,
>> During a "crash", I can neither SSH to the OSS server nor
>> start a new console on the machine directly. A "df" takes over 10 seconds.
>
> Yeah, sounds like the OSS is quite "backed up".
>
>> Our system has 3 server nodes: 1 for the MDS and 2 for the 2 OSSs.
>> Each OSS has 2 disk arrays attached. The total space is 57TB.
>
> Ahhh. OK. Your description made it sound like you were running all of
> those on a single node and the reality is that Lustre doesn't do
> anything magic. If you only use a single node, you likely won't see any
> better performance than say, just NFS. But I digress.
>
>> The problem may be caused by oversubscription, since the %util and average
>> load are both high. However, I do not know: 1. How to estimate the optimum
>> number of OST threads? Do you have any suggestions?
>
> Use our iokit.
>
>> 2. What is the relationship between the number of OST threads and the
>> number of Lustre client nodes?
>
> Nothing. The relationship is the point of diminishing returns on
> driving your storage as you add more threads. Most storage can benefit
> from having multiple threads from a single machine driving it -- to a
> point of saturation. There is no point in driving the storage beyond
> that point of saturation. The iokit will test your storage, throwing
> more and more threads at it. When you look at the output you will find
> a maximum number of threads beyond which you get no more increase in
> performance. That number is your optimum OST threads number.
>
> Certainly you can achieve this without the iokit by just playing with
> the number of ost threads, adjusting up and down (think about doing a
> binary search for example) until you find your sweet spot. This method
> is of course more time consuming.
>
> b.
>
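
The two approaches Brian describes (reading the saturation point off a thread sweep, or searching for it by adjusting the thread count up and down) can be sketched in Python as below. This is only an illustration under stated assumptions: the throughput figures are made up, `measure` is a hypothetical callable standing in for "set the thread count, run the benchmark, return the throughput", and the 5% tolerance is an arbitrary choice. The binary search additionally assumes throughput does not drop as threads are added within the searched range.

```python
def saturation_point(results, tolerance=0.05):
    """Smallest thread count whose throughput is within `tolerance` of the best.

    `results` maps thread count -> measured throughput (any consistent unit).
    """
    best = max(results.values())
    for threads in sorted(results):
        if results[threads] >= best * (1 - tolerance):
            return threads


def search_sweet_spot(measure, lo, hi, tolerance=0.05):
    """Binary search for the smallest thread count close to measure(hi).

    Assumes throughput is non-decreasing between lo and hi, so that
    "good enough" is false below some point and true from there on.
    """
    target = measure(hi) * (1 - tolerance)
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(mid) >= target:
            hi = mid          # mid is already good enough; try fewer threads
        else:
            lo = mid + 1      # mid is too slow; need more threads
    return lo


# Made-up sweep results (threads -> MB/s), for illustration only.
SWEEP = {1: 60, 2: 115, 4: 210, 8: 380, 16: 520, 32: 540, 64: 538, 128: 530}


def fake_measure(threads):
    """Hypothetical storage that scales linearly and saturates at 48 threads."""
    return min(threads, 48) * 10.0


if __name__ == "__main__":
    print("from sweep:", saturation_point(SWEEP))
    print("from search:", search_sweet_spot(fake_measure, 1, 512))
```

With real storage each call to `measure` is a full benchmark run, which is why the search needs far fewer runs than trying every thread count but is still, as Brian says, the more time-consuming route compared with using the iokit output.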