[lustre-discuss] Speeding up recovery

Thu Jul 23 05:14:26 PDT 2015

Hi Andreas,

Thanks for the input.

I checked the document and found this topic -
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438199_62545

I have MGT and MDT on separate devices.

So is there any way to reconfigure the MGT to use the new default and
failover IPs without reformatting it?
In any case, how do I then tell the MDTs and OSTs that the MGT's default
and failover IPs have changed?

I can't find these in the document.

Thanks and Regards,

Indivar Nair

On Wed, Jul 22, 2015 at 5:25 AM, Dilger, Andreas <andreas.dilger at intel.com>
wrote:

> I believe this is described in the Lustre Manual, but the basic process to
> split a combined MDS+MGS into a separate MGS is to format a new MGS device,
> then copy all the files from CONFIGS on the old combined MDT+MGT device
> into the new MGS. See the manual for full details.
>
> Cheers, Andreas
>
> On Jul 21, 2015, at 01:27, Indivar Nair <indivar.nair at techterra.in> wrote:
>
> Hi ...,
>
> Currently, failover and recovery take a very long time in our setup;
> almost 20 minutes. We would like to make it as fast as possible.
>
> I have two queries regarding this -
>
> 1.
> ===================================================
> The MGS and MDT are on the same host.
>
> We do however have a passive stand-by server for the MGS/MDT server, which
> only mounts these partitions in case of a failure.
>
> Current Setup
> Server A: MGS+MDT
> Server B: Failover MGS+MDT
>
> I was wondering whether I can now move the MGS or MDT Partition to the
> standby server (so that imperative recovery works properly) -
>
> New Setup
> Server A: MDT & Failover MGS
> Server B: MGS & Failover MDT
>    OR
> Server A: MGS & Failover MDT
> Server B: MDT & Failover MGS
>
> i.e.
> Can I separate the MDT and MGS partitions on to different machines without
> formatting or reinstalling Lustre?
> ===================================================
>
> 2.
> ===================================================
> This storage is used by around 150 Workstations and 150 Compute (Render)
> Nodes.
>
> Out of these 150 workstations, around 30 - 40 are MS Windows. The MS
> Windows clients access the storage through a 2-node Samba Gateway Cluster.
>
> The Gateway Nodes are connected to the storage through a QDR Infiniband
> Network.
>
> We were thinking of adding NFS Service to the Samba Gateway nodes, and
> reconfiguring the Linux clients to connect via this gateway.
>
> This will bring down the direct Lustre Clients to just 2 nodes.
> So, will having only 2 clients improve the failover-recovery time?
> ===================================================
>
> Is there anything else we can do to speed up recovery?
>
> Regards,
>
>
> Indivar Nair