[lustre-discuss] Speeding up recovery

Indivar Nair indivar.nair at techterra.in
Tue Jul 21 01:27:02 PDT 2015


Hi ...,

Currently, failover and recovery take a very long time in our setup,
almost 20 minutes. We would like to make them as fast as possible.

I have two questions about this:

1.
===================================================
The MGS and MDT are on the same host.

We do, however, have a passive standby server for the MGS/MDT server, which
mounts these partitions only in case of a failure.

*Current Setup*
Server A: MGS+MDT
Server B: Failover MGS+MDT

I was wondering whether I can now move either the MGS or the MDT partition
to the standby server, so that imperative recovery works properly:

*New Setup*
Server A: MDT & *Failover MGS*
Server B: *MGS* & Failover MDT

*OR*
Server A: *MGS* & Failover MDT
Server B: MDT & *Failover MGS*

i.e.

*Can I separate the MDT and MGS partitions onto different machines without
reformatting or reinstalling Lustre?*
===================================================

2.
===================================================
This storage is used by around 150 Workstations and 150 Compute (Render)
Nodes.

Of these 150 workstations, around 30 to 40 run MS Windows. The MS
Windows clients access the storage through a 2-node Samba Gateway Cluster.

The Gateway Nodes are connected to the storage through a QDR InfiniBand
network.

We were thinking of adding NFS Service to the Samba Gateway nodes, and
reconfiguring the Linux clients to connect via this gateway.

This would reduce the number of direct Lustre clients to just 2 nodes.
*So, will having only 2 clients improve the failover-recovery time?*
===================================================
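
To put rough numbers on the change in question 2, here is a small sketch of
the direct-client count before and after. It assumes that every Linux
workstation and render node mounts Lustre directly today and that the Windows
machines reach the storage only through the two gateway nodes. The function
name `direct_clients` is just a placeholder for illustration.

```python
# Rough count of nodes that mount Lustre directly, before and after the
# proposed NFS/Samba gateway change.
# Assumption: all Linux workstations and render nodes are direct Lustre
# clients today; Windows machines go through the Samba gateways only.

def direct_clients(workstations, windows, render_nodes, gateways):
    """Return (before, after) counts of direct Lustre clients."""
    linux_workstations = workstations - windows
    before = linux_workstations + render_nodes + gateways
    after = gateways  # every other machine would use NFS or Samba instead
    return before, after

# The Windows count is only known as a range (30 to 40), so show both ends.
for windows in (30, 40):
    before, after = direct_clients(150, windows, 150, 2)
    print(f"{windows} Windows workstations: {before} direct clients now, "
          f"{after} after the change")
```

So the change would take us from roughly 260 to 270 direct clients down to 2.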

Is there anything else we can do to speed up recovery?

Regards,


Indivar Nair