[Lustre-discuss] "up" a router that is marked "down"
jtemple at cscs.ch
Tue Jan 25 06:12:03 PST 2011
I've found that even with the protocol error, it still works.
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Michael Shuey
Sent: Tuesday, 25 January 2011 14:45
To: Michael Kluge
Cc: Lustre Diskussionsliste
Subject: Re: [Lustre-discuss] "up" a router that is marked "down"
You'll want to add the "dead_router_check_interval" lnet module
parameter as soon as you are able. As near as I can tell, without
that there's no automatic check to make sure the router is alive.
I've had some success in getting machines to recognize that a router
is alive again by doing an lctl ping of their side of a router (e.g.,
on a tcp0 client, `lctl ping <routerIP>@tcp0`, then `lctl ping
<routerIP>@o2ib0` from an o2ib0 client). If you have a server/client
version mismatch, where lctl ping returns a protocol error, you may be
out of luck.
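
A minimal sketch of the manual ping approach described above, for anyone who wants to script it. The function name `ping_routers`, the `LCTL` override variable, and the example NID are illustrative assumptions, not part of Lustre; the sketch also assumes `lctl ping` exits non-zero when it gets no usable reply. The interval value shown in the comment is only an example.

```shell
#!/bin/sh
# Sketch: ping each router NID from this node so LNet re-checks the router.
# Run it on a node that sits on the same network as the NID being pinged
# (e.g. <routerIP>@tcp0 from a tcp0 client, <routerIP>@o2ib0 from an o2ib0 client).
#
# For an automatic check instead, the module parameter mentioned above would
# go in the modprobe configuration, for example (60 is an illustrative value):
#   options lnet dead_router_check_interval=60

# LCTL is a hypothetical override so the command can be swapped out for testing.
LCTL="${LCTL:-lctl}"

# ping_routers NID...  (hypothetical helper)
# Prints one status line per NID; returns non-zero if any ping failed.
ping_routers() {
    failed=0
    for nid in "$@"; do
        if $LCTL ping "$nid" >/dev/null 2>&1; then
            echo "ok: $nid"
        else
            # A failed ping (e.g. a protocol error) does not necessarily
            # mean the router stayed marked down; check the route state.
            echo "no reply: $nid"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Usage would look like `ping_routers 192.168.1.10@tcp0`, with the router's real NID substituted for the placeholder address.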
On Tue, Jan 25, 2011 at 8:38 AM, Michael Kluge
<Michael.Kluge at tu-dresden.de> wrote:
> Hi list,
> if a Lustre router is down, comes back to life and the servers do not
> actively test the routers periodically: is it possible to mark a Lustre
> router as "up"? Or to tell the servers to ping the router?
> Or can I enable the "router pinger" in a live system without unloading
> and loading the Lustre kernel modules?
> Regards, Michael
> Michael Kluge, M.Sc.
> Technische Universität Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Willersbau, Room A 208
> Phone: (+49) 351 463-34217
> Fax: (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW: http://www.tu-dresden.de/zih
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org