[Lustre-discuss] Intermittent routing errors

Johann Lombardi johann at sun.com
Fri Jan 29 10:47:39 PST 2010


On Fri, Jan 29, 2010 at 05:00:10PM +0200, Deon Borman wrote:
> I have a weird problem on one of my OSSs, though I've seen it once on 
> the other OSS. Things will be humming along nicely, when suddenly I get 
> lots of messages like this:
> 
> Jan 29 15:26:16 venus kernel: Lustre: 
> 898:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 
> 12345-192.168.1.26@tcp
> Jan 29 15:26:16 venus kernel: Lustre: 
> 1090:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 
> 12345-192.168.1.26@tcp

Any errors reported on the router nodes?

> In the 50 odd minutes before I picked it up, it produced over 10 million 
> such lines in /var/log/messages.

That's a known problem. In 1.8.1, neterror messages are printed to the
console by default, but they are not rate-limited. This is fixed in 1.8.2
(see bug 20805). In the meantime, you can disable neterror on the console
as follows:
# lctl set_param debug=-neterror
lnet.debug=-neterror
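
Once the node runs 1.8.2, or the routing problem is fixed, the flag can be
turned back on. A minimal sketch, assuming the "+" prefix adds a flag to the
debug mask in the same way the "-" prefix above removes one.
neterror_console is a hypothetical helper name, not part of Lustre:

```shell
# Hypothetical helper: add or remove neterror in the Lustre debug mask.
# Usage: neterror_console on|off
neterror_console() {
    case "$1" in
        on)  flag="+neterror" ;;
        off) flag="-neterror" ;;
        *)   echo "usage: neterror_console on|off" >&2; return 2 ;;
    esac
    # Bail out cleanly on nodes without the Lustre utilities.
    if ! command -v lctl >/dev/null 2>&1; then
        echo "lctl not found; is Lustre installed on this node?" >&2
        return 1
    fi
    # Apply the change, then print the resulting mask for verification.
    lctl set_param debug="$flag" && lctl get_param debug
}
```

Run "neterror_console off" now and "neterror_console on" after the upgrade.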

However, this only stops the console flooding; it won't address
the router connection problem.
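
To see which peers are affected, and how often, the messages already logged
can be summarized per NID. A rough sketch, assuming each message sits on a
single line of the log and ends with the peer NID. count_no_routes is a
hypothetical helper name:

```shell
# Hypothetical helper: count "No usable routes" messages per peer NID.
# $1: log file to scan (defaults to /var/log/messages)
count_no_routes() {
    grep 'No usable routes to' "${1:-/var/log/messages}" |
        awk '{ print $NF }' |   # last field is the peer NID
        sort | uniq -c |        # one count per distinct NID
        sort -rn                # most frequent peer first
}
```

Each output line is a count followed by a NID. The NIDs at the top of the
list are the first ones to check on the routers.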

Johann


