[Lustre-discuss] "up" a router that is marked "down"

Michael Kluge Michael.Kluge at tu-dresden.de
Tue Jan 25 06:52:22 PST 2011


Jason, Michael,

thanks a lot for your replies. I pinged everyone from all directions, but
the router is still marked "down" on the client. I even removed and
re-added the router entry via lctl --net tcp1 del_route xyz@o2ib and
lctl --net tcp1 add_route xyz@o2ib. No luck. So I think I'll wait for
the next maintenance window. Oh, and I forgot to mention that the
servers and the router run 1.6.7.2 and the clients run 1.8.5. That
works well so far.
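
A minimal sketch of the sequence described above, written as a dry-run
script so that it only prints the commands unless told otherwise. The
NID xyz@o2ib and the network tcp1 are the placeholders from this message;
the RUN switch, the function name, and the final lctl show_route call
(used here only to inspect the routing table afterwards) are assumptions
added for illustration, not something that was confirmed to help.

```shell
#!/bin/sh
# Dry-run sketch: prints the lctl commands by default instead of running them.
# GATEWAY_NID and REMOTE_NET are placeholders taken from the message above;
# RUN is a hypothetical switch (set RUN= to execute the commands for real).
GATEWAY_NID="${GATEWAY_NID-xyz@o2ib}"
REMOTE_NET="${REMOTE_NET-tcp1}"
RUN="${RUN-echo}"

refresh_route() {
    # Drop and re-create the route to the remote network via the gateway.
    $RUN lctl --net "$REMOTE_NET" del_route "$GATEWAY_NID"
    $RUN lctl --net "$REMOTE_NET" add_route "$GATEWAY_NID"
    # Ping the gateway's NID on the local network.
    $RUN lctl ping "$GATEWAY_NID"
    # List the routing table to inspect the gateway's state.
    $RUN lctl show_route
}

refresh_route
```

Running it with RUN= set to an empty value (as root, on the client)
would execute the commands instead of echoing them.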


Thanks, Michael


On Tuesday, 25.01.2011, at 15:12 +0100, Temple Jason wrote:
> I've found that even with the protocol error, it still works.
> 
> -Jason
> 
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Michael Shuey
> Sent: Tuesday, 25 January 2011 14:45
> To: Michael Kluge
> Cc: Lustre Diskussionsliste
> Subject: Re: [Lustre-discuss] "up" a router that is marked "down"
> 
> You'll want to add the "dead_router_check_interval" lnet module
> parameter as soon as you are able.  As near as I can tell, without
> that there's no automatic check to make sure the router is alive.
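> 
> A minimal sketch of that setting, assuming the module options live in
> /etc/modprobe.d/lustre.conf (older systems may use /etc/modprobe.conf)
> and that 60 seconds is an acceptable interval. Both the path and the
> value are illustrative assumptions, and the option only takes effect
> the next time the lnet module is loaded:
> 
> ```
> # Check routers that are marked down every 60 seconds (example value).
> # If an "options lnet" line already exists, the parameter can be
> # appended to that line instead.
> options lnet dead_router_check_interval=60
> ```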
> 
> I've had some success in getting machines to recognize that a router
> is alive again by doing an lctl ping of their side of a router (e.g.,
> on a tcp0 client, `lctl ping <routerIP>@tcp0`, then `lctl ping
> <routerIP>@o2ib0` from an o2ib0 client).  If you have a server/client
> version mismatch, where lctl ping returns a protocol error, you may be
> out of luck.
> 
> --
> Mike Shuey
> 
> 
> 
> On Tue, Jan 25, 2011 at 8:38 AM, Michael Kluge
> <Michael.Kluge at tu-dresden.de> wrote:
> > Hi list,
> >
> > if a Lustre router is down, comes back to life and the servers do not
> > actively test the routers periodically: is it possible to mark a Lustre
> > router as "up"? Or to tell the servers to ping the router?
> >
> > Or can I enable the "router pinger" in a live system without unloading
> > and loading the Lustre kernel modules?
> >
> >
> > Regards, Michael
> >
> > --
> >
> > Michael Kluge, M.Sc.
> >
> > Technische Universität Dresden
> > Center for Information Services and
> > High Performance Computing (ZIH)
> > D-01062 Dresden
> > Germany
> >
> > Contact:
> > Willersbau, Room A 208
> > Phone:  (+49) 351 463-34217
> > Fax:    (+49) 351 463-37773
> > e-mail: michael.kluge at tu-dresden.de
> > WWW:    http://www.tu-dresden.de/zih
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >
> >
> 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih