[Lustre-discuss] "up" a router that is marked "down"
shuey at purdue.edu
Tue Jan 25 05:45:13 PST 2011
You'll want to add the "dead_router_check_interval" LNet module
parameter as soon as you are able. As near as I can tell, without it
nothing automatically checks whether a router marked "down" has come
back to life.
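Below is a minimal sketch of where the parameter could go. It assumes the lnet module is configured through a modprobe options file. The file path, the `networks=` value, and the 60-second interval are placeholders, not recommendations. It also assumes the parameter is only read when lnet loads, so it would take effect after the Lustre modules are next reloaded.

```
# Example modprobe configuration, e.g. /etc/modprobe.d/lustre.conf
# (path is an assumption; keep whatever networks= and routes= you already use).
# dead_router_check_interval is in seconds; 60 is only an example value.
options lnet networks="tcp0(eth0)" dead_router_check_interval=60
```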
I've had some success in getting machines to recognize that a router
is alive again by running lctl ping against the router's address on
their own network (e.g., on a tcp0 client, `lctl ping <routerIP>@tcp0`,
then `lctl ping <routerIP>@o2ib0` from an o2ib0 client). If a
server/client version mismatch makes lctl ping return a protocol error,
you may be out of luck.
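The ping workaround can be wrapped in a small shell function, sketched here under some assumptions. `ping_router_nid` is a hypothetical helper name, and the address in the usage note is a placeholder for the router's address on the client's own network. The function only runs `lctl ping` and reports the result. It does not change the router's state directly.

```shell
#!/bin/sh
# Hypothetical helper: ping a router's NID on this node's own LNet network,
# hoping the node notices the router is alive again.
# Run it on a tcp0 client with net=tcp0 and on an o2ib0 client with net=o2ib0.

ping_router_nid() {
    router_ip="$1"   # router's address on this network (placeholder)
    net="$2"         # LNet network name, e.g. tcp0 or o2ib0
    if lctl ping "${router_ip}@${net}"; then
        echo "router ${router_ip}@${net} answered"
    else
        # A server/client version mismatch may surface here as a protocol error.
        echo "lctl ping ${router_ip}@${net} failed" >&2
        return 1
    fi
}
```

On a tcp0 client you might run `ping_router_nid 192.168.1.1 tcp0`, and on an o2ib0 client the same function with the router's IPoIB address and `o2ib0`.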
On Tue, Jan 25, 2011 at 8:38 AM, Michael Kluge
<Michael.Kluge at tu-dresden.de> wrote:
> Hi list,
> if a Lustre router goes down and comes back to life, and the servers do
> not actively test the routers periodically, is it possible to mark a
> Lustre router as "up"? Or to tell the servers to ping the router?
> Or can I enable the "router pinger" in a live system without unloading
> and loading the Lustre kernel modules?
> Regards, Michael
> Michael Kluge, M.Sc.
> Technische Universität Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Willersbau, Room A 208
> Phone: (+49) 351 463-34217
> Fax: (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW: http://www.tu-dresden.de/zih