[Lustre-discuss] Frequent OSS Crashes with heavy load

Wang lu wanglu at ihep.ac.cn
Mon Nov 10 06:49:20 PST 2008


Thanks, Brian,
   During a "crash", I can neither SSH to the OSS server nor
open a new console on the machine directly. A "df" takes over 10 seconds.

   Our system has 3 server nodes: 1 server for the MDS and 2 servers for the
2 OSSs. Each OSS has 2 disk arrays attached. The total space is 57TB.

   The problem may be caused by oversubscription, since both the %util and the
average load are high. However, I do not know:
1. How to estimate the optimum number of OST threads. Do you have any
suggestions?
2. What is the relationship between the number of OST threads and the number of
Lustre client nodes? If the max number of OST threads is X, is the max number of
Lustre clients X/8? (The default number of connections per peer is 8.)
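
As a rough sanity check on the oversubscription idea, here is a minimal sketch that compares aggregate client NIC bandwidth with aggregate OSS NIC bandwidth, using only the figures from the configuration quoted below (50 clients at 1Gbit/s, 2 OSSs at 10Gbit/s). It ignores disk-array throughput, which is not stated here, and the function name is just illustrative, so treat it as a back-of-the-envelope estimate rather than a tuning rule.

```python
# Figures taken from the configuration quoted in this thread.
CLIENT_NODES = 50        # Lustre client nodes
CLIENT_NIC_GBITS = 1.0   # NIC speed per client, Gbit/s
OSS_NODES = 2            # OSS servers
OSS_NIC_GBITS = 10.0     # NIC speed per OSS, Gbit/s


def network_oversubscription(clients, client_gbits, servers, server_gbits):
    """Return (peak client demand, server capacity, demand/capacity ratio).

    Hypothetical helper: it only compares NIC line rates and says nothing
    about what the disk arrays behind each OSS can sustain.
    """
    demand = clients * client_gbits
    capacity = servers * server_gbits
    return demand, capacity, demand / capacity


demand, capacity, ratio = network_oversubscription(
    CLIENT_NODES, CLIENT_NIC_GBITS, OSS_NODES, OSS_NIC_GBITS
)
print(f"peak client demand: {demand:.0f} Gbit/s")   # 50 Gbit/s
print(f"OSS NIC capacity:   {capacity:.0f} Gbit/s")  # 20 Gbit/s
print(f"oversubscription:   {ratio:.1f}x")           # 2.5x
```

So if every client pushed data at line rate, the OSS NICs alone would be oversubscribed by about 2.5x, before the disk arrays are even considered.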
 
Brian J. Murrell wrote:

> On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
>> Dear list, 
>>  
>>      Our Lustre system crashes
> 
> I don't see any evidence of a "crash" in your posting here.  Can you
> define what you mean by "crash"?
> 
>> The configuration of our system
>> OS:Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
>> MDS:1
>> OSS:2 with 10Gbit/s NIC, each attached with 2 disk arrays directly. 
>> Client: 50 nodes (8-core servers), each with a 1Gbit/s NIC
> 
> So your entire Lustre server infrastructure is a single node with all of
> the MDS, MGS and OSS (2x OSTs) on it?  If yes, can I ask why?  Lustre is
> likely not going to perform very well in such a configuration.
> 
> Is your storage oversubscribed?  Did you benchmark your storage system
> with our iokit to find out the optimum number of OST threads you should
> be running?
>  
>> My questions are:
>> 1.What is the signal of the Lustre overload?
> 
> I'm not sure I'm understanding this question.
> 
>> 2. Can Lustre reject too many connections before it is going to
>> crash?  
> 
> Properly tuned, Lustre will not "crash" due to load, but will manage it.
> As long as your OSS is properly tuned for your storage capabilities, you
> can throw as many client loads at it as you want.  Each load will just
> get its appropriate share of the backend resources.  As you continue to
> add more client loads, each load will just get a smaller portion of the
> total resources.
> 
> b.
> 
> 


