[Lustre-discuss] Network failover

Brian J. Murrell Brian.Murrell at Sun.COM
Wed Jul 2 10:20:11 PDT 2008


On Wed, 2008-07-02 at 19:08 +0200, Hans-Juergen Schnitzer wrote:
> In my current configuration, when I do a failover from one OSS node
> to another one, the client reconnects after some time.

Good.

> However, when I simply unplug the IB connector on the
> OSS, the client hangs, waiting for the connection to come back.

How long did you give it?  There should be no functional difference
between the two kinds of failure.  If a client times out trying to reach
an OST on a given OSS, it simply tries the failover partner, and a network
failure certainly qualifies as the kind of failure that triggers that.
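
To make that concrete, here is a toy model of the retry behaviour in
Python.  It is only a sketch, not Lustre code: the names Server, connect
and reach_target are made up for illustration, and a real client keeps
retrying rather than giving up after a fixed number of rounds.

```python
# Toy model of client-side failover; NOT Lustre code.
# Server, connect and reach_target are hypothetical names.
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    reachable: bool = True  # False models a dead node OR a pulled cable
    mounted: set = field(default_factory=set)  # targets currently served


def connect(server, target):
    """Return True if the server answers for the target, False on a timeout."""
    # The client cannot tell why a request timed out; it only sees the timeout.
    return server.reachable and target in server.mounted


def reach_target(target, servers, max_rounds=3):
    """Cycle through the primary and its failover partner until one answers."""
    for _ in range(max_rounds):
        for server in servers:
            if connect(server, target):
                return server.name
    return None  # nobody is serving the target yet, so the client keeps waiting
```

In this model an unplugged cable and a crashed node look the same to the
client, and the client only gets anywhere once the partner is actually
serving the target.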

Did you actually unmount the OSTs on the server whose IB connector you
pulled and mount them on the failover OSS?  Having the failover OSS
mount the failed resources (and ONLY after the failed node has
unmounted them!!) is a prerequisite for the client to actually perform a
failover.
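
The ordering constraint can be sketched as a small guard in Python.
This is an illustration only: SharedTarget and fail_over are
hypothetical names, and nothing here talks to real storage.

```python
# Illustrative model of the "unmount first, then mount on the partner" rule.
# SharedTarget and fail_over are hypothetical names, not Lustre APIs.
class SharedTarget:
    """A target on shared storage that at most one node may have mounted."""

    def __init__(self, name):
        self.name = name
        self.mounted_on = None

    def mount(self, node):
        # Refuse a second mount while another node still holds the target.
        if self.mounted_on is not None and self.mounted_on != node:
            raise RuntimeError(
                f"{self.name} is still mounted on {self.mounted_on}"
            )
        self.mounted_on = node

    def unmount(self, node):
        # Only the node that holds the target can release it.
        if self.mounted_on == node:
            self.mounted_on = None


def fail_over(target, failed_node, partner):
    """Move a target to the partner, releasing it on the failed node first."""
    target.unmount(failed_node)
    target.mount(partner)
```

Calling mount() for the partner while the failed node still holds the
target raises an error in this model, which is the situation the warning
above is about.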

> One use-case is that I would like to switch to ethernet when I
> shutdown IB for maintenance for example. How can I do that?

If the same OSSes have both IB and TCP NIDs for the target, simply
shutting down the IB interface/network should cause a properly
configured client to fail over to the TCP LND.
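
As a rough illustration of one server being reachable over two networks,
the Python sketch below parses a comma-separated list of NIDs in the
usual address@network form and picks the first one whose network is up.
The addresses and the helper names parse_nids and pick_nid are invented
for the example, and the real selection done by LNet is more involved
than this.

```python
# Sketch only: choose a NID from a server's list based on which networks are up.
# parse_nids and pick_nid are hypothetical helpers; the addresses are made up.
def parse_nids(spec):
    """Split 'addr@net,addr@net' into a list of (address, network) pairs."""
    nids = []
    for item in spec.split(","):
        address, _, network = item.strip().partition("@")
        nids.append((address, network))
    return nids


def pick_nid(spec, networks_up):
    """Return the first NID whose network is currently usable, else None."""
    for address, network in parse_nids(spec):
        if network in networks_up:
            return f"{address}@{network}"
    return None


oss_nids = "192.168.1.10@o2ib,10.0.0.10@tcp"
with_ib = pick_nid(oss_nids, {"o2ib", "tcp"})  # IB preferred while it is up
without_ib = pick_nid(oss_nids, {"tcp"})       # IB shut down for maintenance
```

With both networks up the sketch returns the o2ib NID; with IB taken
down it falls through to the tcp NID of the same OSS.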

The Operations Manual should give pretty good coverage of configuring
multiple LNDs.

b.
