[Lustre-discuss] strange slow parallel writes

Papp Tamás tompos at martos.bme.hu
Mon Feb 18 13:39:18 PST 2008


Dear All,

I have a strange problem, and I have reached the point where I have no
idea what is happening.
The cluster has 2 metadata servers (meta1 and meta2) and 6 nodes (node1-6).
The meta servers run CentOS 5; the nodes run CentOS 4.
Node1, node5 and node6 run kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp; the
others run 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd.
The nodes are paired as drbd peers, for example node1-node2 and so on.
The nodes have 8 SATA disks on Adaptec 2610S and 2620S RAID adapters, and
3 NICs (main network, lnet, drbd).


These are the symptoms:

Parallel reads are OK: fast and quiet. A single write is OK.

Parallel writes with a few (for example 3-4) clients are slow; above
that, they get stuck.
The load on one or two nodes is high and growing, and the kernel is in
io-wait. Usually these two nodes are node4 and node3 (with file
striping); node4 has a load of, for example, 30-40-50, and node3 has
approximately half of that.

The problem is that this worked fine half a year ago.

Do you have any ideas or tips?


Thanks,

tamas


