[Lustre-discuss] Question on setting up fail-over

David Noriega tsk133 at my.utsa.edu
Tue Aug 17 10:19:57 PDT 2010


Oops, somehow I changed the target name of all the OSTs to lustre-OST0000,
and now trying to mount any other OST fails. I've since found the 'More
Complicated Configuration' section, which details the usage of
--mgsnode=nid1,nid2, so I think I'll just reformat using that.
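
For reference, a rough dry-run sketch of what that reformat might look
like. It only prints the mkfs.lustre commands and does not run them. The
fsname "lustre" is inferred from the target name above; the function name
ost_cmd, the OST device paths other than /dev/lustre-ost1-dg1/lv1, the
index numbering, and the OST-to-OSS pairing are all assumptions for
illustration. The --mgsnode value follows the nid1,nid2 form mentioned
above; whether a comma-separated list or repeated --mgsnode options is the
right form for an MGS failover pair should be checked against the manual.

```shell
#!/bin/sh
# Dry run: print one mkfs.lustre command per OST; nothing is formatted.
FSNAME=lustre                                      # assumed from "lustre-OST0000"
MGS_NIDS="192.168.5.104@tcp0,192.168.5.105@tcp0"   # MGS and its passive partner
OSS1=192.168.5.100@tcp0
OSS2=192.168.5.101@tcp0

# Print the command for one OST.
# $1 = OST index, $2 = NID of the failover partner OSS, $3 = block device
ost_cmd() {
    echo "mkfs.lustre --reformat --fsname=$FSNAME --ost --index=$1" \
         "--mgsnode=$MGS_NIDS --failnode=$2 $3"
}

# Assumed layout: the first two OSTs fail over to OSS2, the last two to OSS1.
# Device paths for OSTs 2-4 are hypothetical.
ost_cmd 0 "$OSS2" /dev/lustre-ost1-dg1/lv1
ost_cmd 1 "$OSS2" /dev/lustre-ost2-dg1/lv1
ost_cmd 2 "$OSS1" /dev/lustre-ost3-dg1/lv1
ost_cmd 3 "$OSS1" /dev/lustre-ost4-dg1/lv1
```

Giving each OST its own --index is meant to keep the target names distinct
(lustre-OST0000, lustre-OST0001, and so on) rather than having every OST
claim the same name.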

On Tue, Aug 17, 2010 at 11:26 AM, David Noriega <tsk133 at my.utsa.edu> wrote:
> Some info:
> MDS/MGS 192.168.5.104
> Passive failover MDS/MGS 192.168.5.105
> OSS1 192.168.5.100
> OSS2 192.168.5.101
>
> I've got some more questions about setting up failover. Besides having
> heartbeat set up, what about using tunefs.lustre to set options?
>
> On the MDS/MGS I set the following option:
> tunefs.lustre --failnode=192.168.5.105 /dev/lustre-mdt-dg/lv1
> Heartbeat works just fine: I can mount on the primary node, then
> fail over to the other node and back.
>
> Now on the OSSs things get a bit more confusing. Reading these two blog posts:
> http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html
> http://jermen.posterous.com/lustre-mds-failover
>
> Based on these, I tried the following options:
> tunefs.lustre --erase-params --mgsnode=192.168.5.104@tcp0
> --mgsnode=192.168.5.105@tcp0 --failover=192.168.5.101@tcp0
> -write-params /dev/lustre-ost1-dg1/lv1
>
> I ran that for all four OSTs, changing the failover option on the last
> two OSTs to point to OSS1 while the first two point to OSS2.
>
> My understanding is that you mount the OSTs first, then the MDS, but
> the OSTs are failing to mount. Are all these options needed? Or is
> simply specifying the primary MDS enough for the OSTs to find out about
> the second MDS?
>
> David
>
> On Mon, Aug 16, 2010 at 2:14 PM, Kevin Van Maren
> <kevin.van.maren at oracle.com> wrote:
>> David Noriega wrote:
>>>
>>> OK, I've gotten heartbeat set up with the two OSSs, but I do have a
>>> question that isn't answered in the documentation. Shouldn't the Lustre
>>> mounts be removed from fstab once they are handed over to heartbeat,
>>> since heartbeat will mount the resources itself when it comes online?
>>>
>>> David
>>>
>>
>>
>> Yes: on the servers, the entries must either be absent or marked
>> "noauto". Once you start running heartbeat, you have given control of
>> the resource away, and must not mount/umount it yourself (unless you
>> stop heartbeat on both nodes in the HA pair to get control back).
>>
>> Kevin
>>
>>
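
A small sketch of how the fstab point above could be checked on a server,
assuming the usual six-field fstab layout. The function name
lustre_auto_mounts and the mount points in the sample are hypothetical. It
lists Lustre entries that would still be mounted at boot because their
options lack "noauto".

```shell
#!/bin/sh
# List lustre entries in an fstab-style file that lack the "noauto" option.
# Commented-out lines and non-lustre filesystems are ignored.
# $1 = file to scan (defaults to /etc/fstab)
lustre_auto_mounts() {
    awk '$1 !~ /^#/ && $3 == "lustre" && ("," $4 ",") !~ /,noauto,/ { print $1, $2 }' \
        "${1:-/etc/fstab}"
}

# Example run against a sample file (mount points are hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/lustre-ost1-dg1/lv1  /mnt/ost1  lustre  defaults,_netdev  0 0
/dev/lustre-mdt-dg/lv1    /mnt/mdt   lustre  noauto,_netdev    0 0
EOF
lustre_auto_mounts "$sample"   # prints: /dev/lustre-ost1-dg1/lv1 /mnt/ost1
rm -f "$sample"
```

Any device it prints is one that heartbeat and the boot-time mount would
both try to handle, so the entry should be removed or given "noauto".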



-- 
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters


