[Lustre-discuss] Network problems

Arne Brutschy arne.brutschy at ulb.ac.be
Fri Oct 29 04:42:17 PDT 2010


Hey,

thanks for your reply!

> Are you seeing any "slow" messages on the servers?  There are lots of 
> reasons server threads could be slow.  If /proc/sys/vm/zone_reclaim_mode
> is 1, try setting it to 0.

I'm not sure what you mean by "slow" messages - the servers are
basically idle. I don't have a /proc/sys/vm/zone_reclaim_mode (running
CentOS 5, kernel 2.6.18-164.11.1.el5_lustre.1.8.3).
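(For reference, on kernels that do expose that knob, I understand checking
and disabling it would look roughly like this - just a sketch, since the
file simply isn't there on our kernel:)

	# check whether the knob exists and what it is currently set to
	cat /proc/sys/vm/zone_reclaim_mode
	# if it reports 1, disable zone reclaim
	sysctl -w vm.zone_reclaim_mode=0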

> You might want to try the patch in Bug 23826 which I found useful in 
> tracking how long the server thread was processing the request, rather 
> than just the IO phase.

Ok thanks, I will look into this as soon as I have time to download,
patch and compile Lustre (we are using the binary packages for now).

> Ok, I've seen this before.  This is an interesting driver.  Basically, 
> under heavy load the packets are coming in as fast as the interrupt 
> handler can pull them off the RX queue.  Rather than having the driver 
> bail, you probably need something like this in modprobe.conf:
> options forcedeth max_interrupt_work=100
> But this message means the driver is dropping packets, not the switch.  
> (Note that NAPI was developed for situations like this).
> 
> You probably also want to increase the tx/rx ring sizes on the driver, 
> unless you've already done that.

I upgraded the forcedeth driver on my 3 Lustre servers so that I could
set the ring sizes (this wasn't possible before, see
http://www.centos.org/modules/newbb/print.php?form=1&topic_id=20835&forum=40&order=ASC&start=0).
I am now using the following options:

	options forcedeth max_interrupt_work=100 rx_ring_size=1024 tx_ring_size=1024 rx_flow_control=1 tx_flow_control=1
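
To double-check that the new driver actually accepted these options, I
think something along these lines should work (eth0 is just a placeholder
here for whichever interface sits on the forcedeth NIC):

	# list the parameters this forcedeth build actually supports
	modinfo forcedeth | grep -i parm
	# ring sizes the driver reports after loading with the new options
	ethtool -g eth0
	# pause-frame (flow control) settings as negotiated
	ethtool -a eth0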

Let's wait and hope. 

But I've also seen nodes disconnecting from the switch, and these nodes
do not use the forcedeth driver. Under high load, the switch starts to
show RX errors on *all* interfaces. When that happens, even the nodes
with Broadcom NICs (Tigon3 BCM95780) disconnect.
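
To narrow down which side is actually dropping, I suppose it would help to
compare the error counters on the hosts with what the switch shows, along
these lines (eth0 again as a placeholder; counter names differ between
forcedeth and tg3):

	# driver/NIC-level error and drop counters
	ethtool -S eth0 | grep -iE 'err|drop|miss'
	# kernel-level RX/TX error and drop counters for the same interface
	ip -s link show dev eth0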

Additionally, we recently upgraded our cluster (from Rocks 4.2.1 with
Lustre 1.6.3 to Rocks 5.3 with Lustre 1.8.3). I saw similar behavior on
our old cluster. I guess it's the switch then, but I would like to be
sure. At 2000 Euros, swapping it on a hunch is an expensive test...

Thanks!
Arne
