[Lustre-discuss] Intermittent routing errors
Johann Lombardi
johann at sun.com
Fri Jan 29 10:47:39 PST 2010
On Fri, Jan 29, 2010 at 05:00:10PM +0200, Deon Borman wrote:
> I have a weird problem on one of my OSSs, though I've seen it once on
> the other OSS. Things will be humming along nicely, when suddenly I get
> lots of messages like this:
>
> Jan 29 15:26:16 venus kernel: Lustre:
> 898:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to
> 12345-192.168.1.26@tcp
> Jan 29 15:26:16 venus kernel: Lustre:
> 1090:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to
> 12345-192.168.1.26@tcp
Any errors reported on the router nodes?
> In the 50 odd minutes before I picked it up, it produced over 10 million
> such lines in /var/log/messages.
That's a known problem. In 1.8.1, neterror is printed to the console by
default, but those messages are not rate-limited. This is fixed in 1.8.2,
see bug 20805. In the meantime, you can disable neterror on the console
as follows:
# lctl set_param debug=-neterror
lnet.debug=-neterror
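To confirm the change took effect, and to turn the messages back on after
upgrading, something like the sketch below may help. The function names and
the LCTL override are only illustrative (they are not part of Lustre);
get_param and set_param are the regular lctl subcommands and need root.

```shell
# LCTL can be overridden (e.g. LCTL=echo) for a dry run; defaults to the real tool.
LCTL=${LCTL:-lctl}

# Print the current debug mask; neterror should be absent after disabling it.
show_debug_mask() {
    "$LCTL" get_param debug
}

# Add neterror back to the mask, e.g. once running 1.8.2 with rate limiting.
restore_neterror() {
    "$LCTL" set_param debug=+neterror
}
```

Running show_debug_mask before and after the set_param call shows whether
neterror was actually dropped from the mask.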
Note that disabling neterror only stops the console flooding; it does not
address the router connection problem.
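To get an idea of which peers are affected and how often, the messages
already logged can be tallied per NID. This is only a rough sketch: the
function name is made up, and it assumes each message sits on a single
syslog line that ends with the peer NID, as in the excerpt you posted.

```shell
# Count "No usable routes" messages per peer NID in a syslog file.
# Usage: count_no_routes [logfile]   (defaults to /var/log/messages)
count_no_routes() {
    # The last field of a matching line is taken to be the peer NID.
    awk '/No usable routes to/ { n[$NF]++ }
         END { for (p in n) print n[p], p }' "${1:-/var/log/messages}" |
        sort -rn   # busiest peer first
}
```

Each output line is a count followed by the NID. If a single NID accounts
for nearly all the messages, that peer is the place to start looking.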
Johann