[Lustre-discuss] MGS Nids

Peter Grandi pg_lus at lus.for.sabi.co.UK
Sun May 23 07:35:14 PDT 2010


>> In the new setup I have 4 machines: two MDTs and two OSTs.
>> We want to use keepalived as a failover mechanism between the
>> two MDTs.  To keep the MDTs in sync, I'm using a DRBD disk
>> between the two.  Keepalived uses a VIP in an active/passive
>> configuration; in a failover situation the VIP gets transferred
>> to the passive node.

> Lustre uses stateful client/server connections. You don't need
> to - and cannot - use a virtual IP. The Lustre protocol
> already takes care of reconnection & recovery.

Sure, for the purpose of access to the server; but there is a good
way to achieve *network* routing failover using something like
VIPs (in an IP-only setup).

What you do is simple:

 #1 On a server with multiple interfaces, for example 192.168.1.1
    and 192.168.2.1, add a 'dummy0' interface, e.g. 192.168.42.1.

 #2 Run OSPF on the server, advertising a *host route* to the
    'dummy0' interface, 192.168.42.1/32 (as well as the real
    interfaces of course).

 #3 Bind the Lustre daemons to the 'dummy0' interface address. As
    long as there is a route to it, all network reconfigurations
    will be transparent to Lustre.
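A minimal sketch of steps #1 and #2, assuming iproute2 and a
Quagga/FRR-style OSPF daemon (interface names, addresses, and the
config file layout are illustrative, taken from the example above):

```shell
# Step 1: create the 'dummy0' interface and give it the stable
# /32 address that Lustre traffic should use.
modprobe dummy
ip link add dummy0 type dummy   # may already exist after modprobe
ip addr add 192.168.42.1/32 dev dummy0
ip link set dummy0 up

# Step 2: ospfd configuration fragment (Quagga/FRR style),
# advertising the host route alongside the real interfaces:
#
#   router ospf
#     network 192.168.1.0/24  area 0
#     network 192.168.2.0/24  area 0
#     network 192.168.42.1/32 area 0
```

With that in place, peers learn a host route to 192.168.42.1 via
whichever real interface is currently up, which is what makes
network reconfigurations transparent.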

Of course, for MGSes, MDSes, and OSSes one can have two or more
servers with different 'dummy0' addresses to use as different
NIDs, letting Lustre handle *server* (as opposed to network)
failures.  The price is one host route per server, but that is
usually quite insignificant.
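For example, a client could then list both stable addresses as
failover NIDs at mount time (the second address and the fsname
'lustre' are hypothetical, extending the example above):

```shell
# Client mount listing two MGS NIDs; Lustre tries the second
# NID if the first MGS is unreachable.
mount -t lustre 192.168.42.1@tcp:192.168.42.2@tcp:/lustre /mnt/lustre
```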

The only problem with the setup above is that #3, "bind the Lustre
daemons to the 'dummy0' interface address", seems impossible to
achieve (I have not tried it directly, so I am told).  While
client packets sent to the 'dummy0' address reach the server in a
fully resilient way, reply packets often/usually carry as source
address one of the addresses of the real interfaces, instead of
that of the 'dummy0' interface, and this of course breaks the
scheme.
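That reply behaviour comes from the kernel's source-address
selection: for a packet from an unbound (0.0.0.0) socket, the
kernel picks the preferred source of the route back to the client,
which defaults to the outgoing interface's own address.  One
commonly suggested mitigation (untested here, and it is unclear
whether Lustre's in-kernel sockets honour it) is to pin the
preferred source on the routes back to the client networks:

```shell
# Show which source address the kernel would pick for a given
# client (192.168.1.100 is a hypothetical client address):
ip route get 192.168.1.100

# Pin the preferred source to the 'dummy0' address on the routes
# towards the client networks (addresses from the example above):
ip route change 192.168.1.0/24 dev eth0 src 192.168.42.1
ip route change 192.168.2.0/24 dev eth1 src 192.168.42.1
```

Note this `src` hint only affects locally originated packets whose
socket is not explicitly bound, which is exactly the case at issue.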

Note that the scheme above is fairly valuable, as it gives full
*dynamic* network resilience.

Is there a simple way to get the Lustre daemons in the kernel to
bind to a specific address instead of [0.0.0.0], like most server
daemons in UNIX/Linux can?
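For reference, LNET can at least be told which interface to use
via its module options, though whether that also forces the source
address as desired is exactly the open question here:

```shell
# /etc/modprobe.d/lustre.conf: restrict the tcp0 LNET network to
# the 'dummy0' interface (interface name from the example above).
options lnet networks="tcp0(dummy0)"
```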

[ ... ]


