[Lustre-discuss] Frequent OSS Crashes with heavy load

Wang lu wanglu at ihep.ac.cn
Mon Nov 10 07:58:10 PST 2008


Thanks, but I am still unclear about two points:

1. How do I limit the number of OST threads once I have found an optimum value?
2. What do the contents of /proc/sys/lnet/peers and /proc/sys/lnet/nis mean?
For example:
[root@boss01 ~]# cat /proc/sys/lnet/peers
nid                      refs state   max   rtr   min    tx   min queue
192.168.52.39@tcp           6  ~rtr     8     8     8     3   -19 1458536

[root@boss01 ~]# cat /proc/sys/lnet/nis
nid                      refs peer   max    tx   min
0@lo                        2    0     0     0     0
192.168.50.33@tcp         137    8   256   256  -424
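
To compare these tables over time (for example before and during a "crash"), the output can be read into records keyed by the header names. The following is only a minimal sketch: the function name parse_lnet_table is hypothetical, the sample text is the peers output shown above, and the code assumes nothing about what the columns mean. Because the peers header contains "min" twice, the second occurrence is renamed "min_2" so that neither value is lost.

```python
# Sample copied from the peers output in this message.
SAMPLE_PEERS = """\
nid                      refs state   max   rtr   min    tx   min queue
192.168.52.39@tcp           6  ~rtr     8     8     8     3   -19 1458536
"""


def parse_lnet_table(text):
    """Parse whitespace-separated /proc/sys/lnet output into a list of dicts.

    The first non-blank line is treated as the header. Repeated header
    names get a numeric suffix (min, min_2) so no column is overwritten.
    Numeric fields are converted to int; everything else stays a string.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []

    # Build unique column names from the header line.
    seen = {}
    names = []
    for name in lines[0].split():
        seen[name] = seen.get(name, 0) + 1
        names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")

    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(names):
            continue  # skip lines that do not match the header layout
        row = {}
        for name, value in zip(names, fields):
            try:
                row[name] = int(value)
            except ValueError:
                row[name] = value
        rows.append(row)
    return rows


def read_proc_table(path):
    """Read and parse one of the LNet proc files, e.g. /proc/sys/lnet/peers."""
    with open(path) as handle:
        return parse_lnet_table(handle.read())
```

Calling parse_lnet_table(SAMPLE_PEERS) gives one record per peer, so the two "min" columns can be logged separately as "min" and "min_2".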


Brian J. Murrell wrote:

> On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
>> Thanks, Brian,
>>    During a "crash", I can neither SSH to the OSS server nor
>> start a new console on the machine directly. A "df" takes over 10 seconds.
> 
> Yeah, sounds like the OSS is quite "backed up".
> 
>>    Our system has 3 server nodes: 1 server for the MDS and 2 servers for
>> the 2 OSSs. Each OSS has 2 disk arrays attached. The total space is 57TB.
> 
> Ahhh.  OK.  Your description made it sound like you were running all of
> those on a single node and the reality is that Lustre doesn't do
> anything magic.  If you only use a single node, you likely won't see any
> better performance than say, just NFS.  But I digress.
> 
>>    The problem may be caused by oversubscription, since the %util and average
>> load are both high. However, I do not know: 1. How do I estimate the optimum
>> number of OST threads? Do you have any suggestions?
> 
> Use our iokit.
> 
>> 2. What is the relationship between the number of OST threads and the number
>> of Lustre client nodes?
> 
> None.  What matters is the point of diminishing returns on
> driving your storage as you add more threads.  Most storage can benefit
> from having multiple threads from a single machine driving it -- to a
> point of saturation.  There is no point in driving the storage beyond
> that point of saturation.  The iokit will test your storage, throwing
> more and more threads at it.  When you look at the output you will find
> a maximum number of threads beyond which you get no more increase in
> performance.  That number is your optimum OST threads number.
> 
> Certainly you can achieve this without the iokit by just playing with
> the number of ost threads, adjusting up and down (think about doing a
> binary search for example) until you find your sweet spot.  This method
> is of course more time consuming.
> 
> b.
> 
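
The two approaches Brian describes (reading the saturation point off a sweep of thread counts, or binary-searching for it) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 5% tolerance, and the idea of a measure(threads) callback that returns throughput are all hypothetical and are not part of the iokit. The binary search also assumes that throughput does not decrease as threads are added up to the tested maximum.

```python
def saturation_point(results, tolerance=0.05):
    """Pick the optimum thread count from a completed sweep.

    results maps thread count -> measured throughput (any consistent unit).
    Returns the smallest thread count whose throughput is within
    `tolerance` of the best throughput seen in the sweep.
    """
    if not results:
        raise ValueError("no measurements given")
    threshold = max(results.values()) * (1.0 - tolerance)
    for threads in sorted(results):
        if results[threads] >= threshold:
            return threads


def binary_search_threads(measure, low, high, tolerance=0.05):
    """Binary-search for the smallest thread count that saturates storage.

    measure(threads) is a caller-supplied (hypothetical) function that runs
    a benchmark with that many threads and returns the throughput.
    Assumes throughput is non-decreasing between `low` and `high`.
    """
    target = measure(high) * (1.0 - tolerance)
    while low < high:
        mid = (low + high) // 2
        if measure(mid) >= target:
            high = mid       # mid already saturates; try fewer threads
        else:
            low = mid + 1    # mid is too few; need more threads
    return low
```

With a sweep such as {1: 80, 2: 150, 4: 270, 8: 400, 16: 410, 32: 405} (made-up numbers, not measurements from our system), saturation_point picks 8 threads, since 16 and 32 threads gain less than 5% over it.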


