[Lustre-discuss] EAGAIN / ECONNRESET messages

Alexey Lyashkov Alexey.Lyashkov at Sun.COM
Tue Dec 1 21:02:57 PST 2009


Hi,

This is likely related to TCP stack tuning.
Possibly the OSS node does not have enough free sockets to accept new connections.
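
Before tuning anything, it may help to see how the errors are distributed across servers and times of day. The following is only a rough sketch in Python, assuming the syslog lines have exactly the format quoted below; the function name `tally` and the regular expression are illustrative and not part of Lustre. The errno names are the two quoted in the original message (Linux values).

```python
import re
from collections import Counter

# Error numbers as quoted in the original message (Linux errno values).
ERRNO_NAMES = {11: "EAGAIN", 104: "ECONNRESET"}

# Assumed line format, taken from the syslog excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<month>\w{3}) +(?P<day>\d+) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<host>\S+) LustreError: .*ksocknal_recv_hello\(\)\) "
    r"Error -(?P<err>\d+) reading HELLO from (?P<peer>\S+)"
)


def tally(lines):
    """Count HELLO read errors per (host, errno name) and per hour of day."""
    by_host = Counter()
    by_hour = Counter()
    for line in lines:
        # Drop any leading "> " quoting before matching.
        m = LINE_RE.search(line.lstrip("> ").strip())
        if m is None:
            continue  # not a ksocknal_recv_hello error line
        err = int(m.group("err"))
        name = ERRNO_NAMES.get(err, str(err))
        by_host[(m.group("host"), name)] += 1
        by_hour[m.group("hour")] += 1
    return by_host, by_hour


# Three lines copied from the quoted log, used as sample input.
sample = [
    "Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34",
    "Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34",
    "Nov 23 04:00:53 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114",
]

by_host, by_hour = tally(sample)
for key, count in sorted(by_host.items()):
    print(key, count)
for hour, count in sorted(by_hour.items()):
    print(hour, count)
```

If the counts concentrate on a few hours or a few servers, that would be consistent with the errors following bursts of connection attempts, but the script itself does not prove a cause.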

On Tue, 2009-11-24 at 09:35 +0100, Heiko Schröter wrote:
> Hello,
> 
> On three of eight OSTs I can see sporadic messages like these:
> 
> sadosrd21
> Nov 24 09:11:52 sadosrd21 LustreError: 5518:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.133
> Nov 24 09:12:01 sadosrd21 LustreError: 5516:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
> sadosrd24
> Nov 21 01:42:13 sadosrd24 LustreError: 9097:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
> Nov 21 01:42:13 sadosrd24 LustreError: 9098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
> Nov 22 04:01:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.116
> Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34
> Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34
> Nov 23 01:42:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.116
> sadosrd25
> Nov 22 04:02:06 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
> Nov 23 04:00:53 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
> Nov 23 04:01:01 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.115
> Nov 23 04:01:02 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.109
> Nov 23 09:12:57 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
> Nov 24 01:41:40 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.110
> Nov 24 01:42:57 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
> Nov 24 01:43:03 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.110
> Nov 24 01:43:08 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.100
> Nov 24 01:43:11 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.122
> 
> Error Number:
> /usr/include/asm-generic/errno-base.h:#define   EAGAIN          11      /* Try again */
> /usr/include/asm-generic/errno.h:#define        ECONNRESET      104     /* Connection reset by peer */
> 
> They seem to be related to heavy network traffic to and from this OST.
> The network driver is e1000.