[Lustre-discuss] Frequent OSS Crashes with heavy load

Mon Nov 10 06:20:30 PST 2008

On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
>  
> Dear list, 
>  
>      Our Lustre system crashes

I don't see any evidence of a "crash" in your posting here.  Can you
define what you mean by "crash"?

> The configuration of our system
> OS:Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
> MDS:1
> OSS:2 with 10Gbit/s NIC, each attached with 2 disk arrays directly. 
> Client: 50 nodes( 8 core server), each has 1Gbit/s NIC

So your entire Lustre server infrastructure is a single node with all of
the MDS, MGS and OSS (2x OSTs) on it?  If yes, can I ask why?  Lustre is
likely not going to perform very well in such a configuration.

Is your storage oversubscribed?  Did you benchmark your storage system
with our iokit to find out the optimum number of OST threads you should
be running?

> My questions are:
> 1.What is the signal of the Lustre overload?

I'm not sure I'm understanding this question.

> 2. Can Lustre reject too many connections before it is going to
> crash?  

Properly tuned, Lustre will not "crash" due to load, but will manage it.
As long as your OSS is properly tuned for your storage capabilities, you
can throw as many client loads at it as you want.  Each load will just
get its appropriate share of the backend resources.  As you continue to
add more client loads, each load will just get a smaller portion of the
total resources.
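
To put rough numbers on that, here is a minimal sketch of the sharing
behaviour described above.  It is not Lustre code.  It assumes an
idealised, perfectly equal split of backend bandwidth among active client
loads, and the function name, the 1000 MB/s backend figure and the
optional per-client cap are all hypothetical values chosen for
illustration (125 MB/s is simply the raw rate of a 1Gbit/s NIC).

```python
from typing import Optional


def per_client_share(backend_mb_s: float,
                     active_clients: int,
                     client_cap_mb_s: Optional[float] = None) -> float:
    """Idealised per-client bandwidth under an equal split.

    backend_mb_s    -- hypothetical total throughput of the OSS backend
    active_clients  -- number of client loads running concurrently
    client_cap_mb_s -- optional ceiling imposed by the client's own NIC
    """
    if active_clients <= 0:
        raise ValueError("active_clients must be a positive integer")
    share = backend_mb_s / active_clients
    if client_cap_mb_s is not None:
        # A client can never receive more than its own link can carry.
        share = min(share, client_cap_mb_s)
    return share


if __name__ == "__main__":
    # Hypothetical backend of 1000 MB/s, clients capped at 125 MB/s.
    for clients in (1, 10, 50):
        share = per_client_share(1000.0, clients, client_cap_mb_s=125.0)
        print(f"{clients:3d} clients -> {share:.1f} MB/s each")
```

The point is only that the per-client figure shrinks as clients are added
while the total stays bounded by the backend.  Real shares depend on the
workload and on how the OSS is tuned.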

b.
