[Lustre-discuss] [HPDD-discuss] targets start order in Lustre 2.4.3

Read, Robert robert.read at intel.com
Fri May 23 10:15:18 PDT 2014

The strict MGT, MDT, OST ordering is a hard requirement only for the very first start of the file system, because the targets need to complete their initial registration with the MGT before they can be used.  Subsequent cold restarts of the file system don’t necessarily need to start in any particular order, but it’s still not a bad idea to start the MGT first, since all the servers will attempt to connect to it when they start; they should eventually start without it, though.
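
To make the distinction above concrete, here is a minimal Python sketch of the rule as I read it. It is not Lustre or Pacemaker code: the Target class, the plan_start function and the target names are all hypothetical, made up for illustration. It only models "strict order on the very first start, MGT-first as a courtesy afterwards".

```python
from dataclasses import dataclass

# Priority used for the very first start only: MGT, then MDT, then OSTs.
ROLE_PRIORITY = {"MGT": 0, "MDT": 1, "OST": 2}


@dataclass(frozen=True)
class Target:
    name: str  # hypothetical label, e.g. "ost0"
    role: str  # one of "MGT", "MDT", "OST"


def plan_start(targets, first_start):
    """Return (ordered_targets, strict).

    strict is True when the order is a hard requirement (first start),
    and False when it is only a recommendation (later cold restarts).
    """
    if first_start:
        # Targets must register with the MGT first, so sort by role.
        ordered = sorted(targets, key=lambda t: ROLE_PRIORITY[t.role])
        return ordered, True
    # Later restarts: move the MGT to the front, leave the rest as given.
    # sorted() is stable, so non-MGT targets keep their input order.
    ordered = sorted(targets, key=lambda t: t.role != "MGT")
    return ordered, False


targets = [
    Target("ost0", "OST"),
    Target("mdt0", "MDT"),
    Target("mgt", "MGT"),
    Target("ost1", "OST"),
]

first_order, first_strict = plan_start(targets, first_start=True)
later_order, later_strict = plan_start(targets, first_start=False)

print([t.name for t in first_order], first_strict)
print([t.name for t in later_order], later_strict)
```

In HA terms, the strict flag being False for later restarts is the reason a mandatory ordering constraint between the targets looks stronger than what the file system actually needs.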


On May 23, 2014, at 09:16 , White, Cliff <cliff.white at intel.com> wrote:

> In a failover situation, any target can be stopped and restarted without
> impact on other nodes. The startup order in the manual is for a cold
> startup/full shutdown situation, and does not apply to a running
> filesystem and failover.
> You should not have the ordering directive, I think. In particular, the
> MGT is only used for client/server mounts; if the filesystem is up and
> running, the MGT failover should be very transparent to the rest of the
> cluster and should never require another node to be restarted.
> Cliffw
> On 5/23/14, 7:58 AM, "Riccardo Murri" <riccardo.murri at uzh.ch> wrote:
>> Hello,
>> The online Lustre manual recommends that Lustre targets are started in
>> this order[1]: MGT, MDT, OSTs, clients.
>> [1]: http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438194_24122
>> Now we are setting up an HA cluster with Pacemaker, and a strict
>> ordering directive ("order ... Mandatory: ostXX mdt mgt") results in a
>> complete restart of all targets if the MGT is migrated. In my
>> experience (with Lustre 1.8.5) this is most of the time unnecessary,
>> and Lustre can recover from a single target restart.  However, we have
>> recently switched to Lustre 2.4.3 and things might have changed.
>> So the question is: is this order strict (in Lustre 2.4.3), or can a
>> target be stopped and restarted on another node without affecting the
>> targets running on other nodes?
>> Thanks for any help!
>> Riccardo
>> --
>> Riccardo Murri
>> http://www.gc3.uzh.ch/people/rm
>> Grid Computing Competence Centre
>> University of Zurich
>> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
>> Tel: +41 44 635 4222
>> Fax: +41 44 635 6888
