[Lustre-discuss] Intermittent routing errors

Deon Borman deon at blackginger.tv
Mon Feb 1 01:47:46 PST 2010


Johann Lombardi wrote:
> On Fri, Jan 29, 2010 at 05:00:10PM +0200, Deon Borman wrote:
>   
>> I have a weird problem on one of my OSSs, though I've seen it once on 
>> the other OSS. Things will be humming along nicely, when suddenly I get 
>> lots of messages like this:
>>
>> Jan 29 15:26:16 venus kernel: Lustre: 
>> 898:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 
>> 12345-192.168.1.26@tcp
>> Jan 29 15:26:16 venus kernel: Lustre: 
>> 1090:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 
>> 12345-192.168.1.26@tcp
>>     
>
> Any errors reported on the router nodes?
>   

There are no router nodes. My LAN is set up as 192.168.0.0/23, and the 
clients and the server see each other directly at the IP level.
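
As a quick sanity check on the addressing, the sketch below uses Python's 
standard ipaddress module to confirm that the peer named in the log lines 
(192.168.1.26) falls inside 192.168.0.0/23, so no LNET router should be 
needed to reach it. The helper name on_same_lan is only illustrative and 
is not part of Lustre or LNET.

```python
import ipaddress

# The /23 network described above; it spans 192.168.0.0 through 192.168.1.255.
lan = ipaddress.ip_network("192.168.0.0/23")

# The peer address taken from the "No usable routes" log lines.
peer = ipaddress.ip_address("192.168.1.26")


def on_same_lan(address, network):
    """Return True if the address belongs to the given network."""
    return address in network


print(on_same_lan(peer, lan))   # True: the peer is directly reachable at the IP level
print(lan.num_addresses)        # 512 addresses in a /23
```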

I did, however, have another failure over the weekend. The client 
machine was pushed beyond RAM and swap, so the OOM killer did its thing, 
killed something important and the machine died. So there was no graceful 
disconnect. Around the same time my management software logged the 
machine as unavailable, the server started throwing those error messages 
again. So maybe a sudden network disconnect is at the root of all this. 
I'm going to try to reproduce this sometime tomorrow when I have more 
time, to see if I can prove or disprove my theory.
>   
>> In the 50 odd minutes before I picked it up, it produced over 10 million 
>> such lines in /var/log/messages.
>>     
>
> That's a known problem. In 1.8.1, neterror is printed to the console by
> default, but those messages are not rate-limited. This is fixed in 1.8.2,
> see bug 20805. In the meantime, you can disable neterror on the console
> as follows:
> # lctl set_param debug=-neterror
> lnet.debug=-neterror
>
> However, this will just avoid flooding the console, but won't address
> the router connection problem.
>   
That's a start. I found the I/O to my OS drive was so severe that it 
degraded most other tasks, like ssh-ing into the box.
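
To see which peers the flood refers to, and how many lines each one 
accounts for, something like the following could be run over the log. It 
is only a rough sketch: the function name count_no_route_peers is made up 
for illustration, and it assumes each message sits on a single line in the 
form shown earlier in this thread ("No usable routes to <NID>").

```python
import re
from collections import Counter

# Matches the peer NID that follows "No usable routes to" in a syslog line.
NO_ROUTE = re.compile(r"No usable routes to\s+(\S+)")


def count_no_route_peers(lines):
    """Count 'No usable routes' messages per peer NID.

    Accepts any iterable of lines, so a large log can be streamed
    without loading it all into memory.
    """
    counts = Counter()
    for line in lines:
        match = NO_ROUTE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle for /var/log/messages streams the log one line 
at a time, which matters when it holds millions of entries. If the flood 
started when the client died, a single NID should dominate the counts.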

Thanks
Deon
> Johann
>
>   


-- 
Deon Borman
IT Supervisor
BlackGinger
--



