[Lustre-discuss] EAGAIN / ECONNRESET messages

Heiko Schröter schroete at iup.physik.uni-bremen.de
Tue Nov 24 00:35:26 PST 2009


Hello,

On three of eight OSTs I can see sporadic messages like these:

sadosrd21
Nov 24 09:11:52 sadosrd21 LustreError: 5518:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.133
Nov 24 09:12:01 sadosrd21 LustreError: 5516:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
sadosrd24
Nov 21 01:42:13 sadosrd24 LustreError: 9097:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 21 01:42:13 sadosrd24 LustreError: 9098:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 22 04:01:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.116
Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34
Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34
Nov 23 01:42:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.116
sadosrd25
Nov 22 04:02:06 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.19
Nov 23 04:00:53 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.114
Nov 23 04:01:01 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.115
Nov 23 04:01:02 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.109
Nov 23 09:12:57 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:41:40 sadosrd25 LustreError: 5048:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.110
Nov 24 01:42:57 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.111
Nov 24 01:43:03 sadosrd25 LustreError: 5049:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.110
Nov 24 01:43:08 sadosrd25 LustreError: 5051:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.100
Nov 24 01:43:11 sadosrd25 LustreError: 5050:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.122
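
To see at a glance which OSTs and which clients are involved, the messages above can be tallied with a small script. This is only a sketch: the regular expression is an assumption derived from the lines shown here, not an official Lustre log format, and the names PATTERN, tally and sample are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the syslog lines quoted above:
# "<Mon> <day> <HH:MM:SS> <host> LustreError: ... Error <code> reading HELLO from <peer>"
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\sLustreError:.*"
    r"Error (?P<code>-\d+) reading HELLO from (?P<peer>[\d.]+)"
)

def tally(lines):
    """Count HELLO read errors per (OST host, error code) and per client IP."""
    by_host_code = Counter()
    by_peer = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            by_host_code[(m["host"], int(m["code"]))] += 1
            by_peer[m["peer"]] += 1
    return by_host_code, by_peer

# Three of the lines from sadosrd24, copied from the log excerpt.
sample = [
    "Nov 23 01:42:16 sadosrd24 LustreError: 9099:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -11 reading HELLO from 192.168.16.34",
    "Nov 23 01:42:27 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.34",
    "Nov 23 01:42:59 sadosrd24 LustreError: 9096:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.16.116",
]

by_host_code, by_peer = tally(sample)
for (host, code), count in sorted(by_host_code.items()):
    print(host, code, count)
for peer, count in by_peer.most_common():
    print(peer, count)
```

Feeding it the full syslog instead of the sample should show whether the errors cluster on a few clients or are spread evenly.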

Error numbers:
/usr/include/asm-generic/errno-base.h:#define   EAGAIN          11      /* Try again */
/usr/include/asm-generic/errno.h:#define        ECONNRESET      104     /* Connection reset by peer */
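
The same lookup can be done without grepping the headers. A minimal sketch using Python's standard errno module; the kernel reports the codes negated, so the sign is dropped first. The helper name describe is made up for illustration, and the symbolic names returned depend on the platform the script runs on.

```python
import errno
import os

def describe(code):
    """Return (symbolic name, message) for a kernel-style negative error code."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)

# The two codes seen in the LustreError messages.
for code in (-11, -104):
    name, message = describe(code)
    print(code, name, message)
```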

They seem to be related to heavy network traffic to and from the affected OST.
The network driver is e1000.

Lustre 1.6.6
vanilla kernel 2.6.22.19

What triggers such messages?
Is this anything to worry about?

Thanks and regards,
Heiko



Network adapter statistics of the above RAIDs:
sadosrd21 ~ # ethtool -S eth0                                     
NIC statistics:                                                   
     rx_packets: 3476732178                                       
     tx_packets: 8161698729                                       
     rx_bytes: 1261677735249                                      
     tx_bytes: 11684960617899                                     
     rx_broadcast: 96324977                                       
     tx_broadcast: 31080                                          
     rx_multicast: 885                                            
     tx_multicast: 12                                             
     rx_errors: 0                                                 
     tx_errors: 0                                                 
     tx_dropped: 0                                                
     multicast: 885                                               
     collisions: 0                                                
     rx_length_errors: 0                                          
     rx_over_errors: 0                                            
     rx_crc_errors: 0                                             
     rx_frame_errors: 0                                           
     rx_no_buffer_count: 0                                        
     rx_missed_errors: 112425
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 485691240
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 202994789
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 2220028952
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 926991076
     rx_flow_control_xoff: 2476536244
     tx_flow_control_xon: 3754
     tx_flow_control_xoff: 6876
     rx_long_byte_count: 1261677735249
     rx_csum_offload_good: 3415421552
     rx_csum_offload_errors: 1134
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 53162812
     dropped_smbus: 0

sadosrd24 ~ # ethtool -S eth0                                  
NIC statistics:                                                
     rx_packets: 4090343679                                    
     tx_packets: 2636690225                                    
     rx_bytes: 5479498759229                                   
     tx_bytes: 2039673228907                                   
     rx_broadcast: 32078587                                    
     tx_broadcast: 28901                                       
     rx_multicast: 316                                         
     tx_multicast: 6                                           
     rx_errors: 0                                              
     tx_errors: 0                                              
     tx_dropped: 0                                             
     multicast: 316                                            
     collisions: 0                                             
     rx_length_errors: 0                                       
     rx_over_errors: 0                                         
     rx_crc_errors: 0                                          
     rx_frame_errors: 0                                        
     rx_no_buffer_count: 11278                                 
     rx_missed_errors: 78171
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 194098104
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 68502186
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 410577015
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 234761468
     rx_flow_control_xoff: 1632413652
     tx_flow_control_xon: 1516
     tx_flow_control_xoff: 2889
     rx_long_byte_count: 5479498759229
     rx_csum_offload_good: 4067175471
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 20807887
     dropped_smbus: 0

sadosrd25 ~ # ethtool -S eth0                                                 
NIC statistics:                                                               
     rx_packets: 4305347487                                                   
     tx_packets: 3031165604                                                   
     rx_bytes: 5797498509449                                                  
     tx_bytes: 2043989105691                                                  
     rx_broadcast: 37618726                                                   
     tx_broadcast: 28310                                                      
     rx_multicast: 386                                                        
     tx_multicast: 6                                                          
     rx_errors: 0                                                             
     tx_errors: 0                                                             
     tx_dropped: 0                                                            
     multicast: 386                                                           
     collisions: 0                                                            
     rx_length_errors: 0                                                      
     rx_over_errors: 0                                                        
     rx_crc_errors: 0                                                         
     rx_frame_errors: 0                                                       
     rx_no_buffer_count: 4738                                                 
     rx_missed_errors: 223116
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 156915562
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 50086469
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 396787000
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 184756690
     rx_flow_control_xoff: 1346260879
     tx_flow_control_xon: 7451
     tx_flow_control_xoff: 13175
     rx_long_byte_count: 5797498509449
     rx_csum_offload_good: 4277898711
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 24585106
     dropped_smbus: 0
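
To compare the three dumps more easily, the nonzero drop and flow-control counters can be pulled out and put in relation to rx_packets. This is a rough sketch only: the choice of counters in WATCHED is my assumption about which ones are interesting here, the ratio is just a sense of scale, and the function names are made up for illustration.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the 'NIC statistics:' header line
            stats[name.strip()] = int(value)
    return stats

# Counters assumed to be worth watching; adjust as needed.
WATCHED = (
    "rx_missed_errors",
    "rx_no_buffer_count",
    "rx_csum_offload_errors",
    "rx_flow_control_xoff",
    "tx_flow_control_xoff",
)

def summarize(stats):
    """Return {counter: (count, count / rx_packets)} for nonzero watched counters."""
    rx = stats.get("rx_packets", 0)
    report = {}
    for name in WATCHED:
        count = stats.get(name, 0)
        if count:
            report[name] = (count, count / rx if rx else None)
    return report

# Excerpt of the sadosrd24 output shown above.
sample = """NIC statistics:
     rx_packets: 4090343679
     rx_no_buffer_count: 11278
     rx_missed_errors: 78171
     rx_flow_control_xoff: 1632413652
     tx_flow_control_xoff: 2889
     rx_csum_offload_errors: 0
"""

stats = parse_ethtool_stats(sample)
report = summarize(stats)
for name, (count, ratio) in report.items():
    print(f"{name}: {count} ({ratio:.2e} per received packet)")
```

On a live host the text could come from the output of `ethtool -S eth0` instead of the pasted excerpt.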


