[Lustre-discuss] Active-Active failover configuration

xgl at xgl.pereslavl.ru xgl at xgl.pereslavl.ru
Wed Apr 21 03:36:57 PDT 2010


Thank you for your answer!

Unfortunately, I have already read that manual and still ran into some problems.

I've configured Heartbeat, defined the resources it controls, and haven't found any errors in the HA logs.
I have 3 shared resources controlled by Heartbeat: 2 OSTs and 1 MDS (described in the previous message).

When I use "hb_takeover all" utilite on OSS1 (s1 in previous message) to takeover control over OSS2 resources - OST1 and MDS (OST1 and MDT mounted on OSS2 (s2) in standard configuration)) it takes control and I saw all resources mounted on one OSS1; But I cannot use lustre filesystem, can't mount it on a client.

On the active OSS1 I can see in dmesg that all OSTs still try to connect to the MDS using its old (normal) address on OSS2, even though the MDS has moved to OSS1.
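I suppose I could verify which MGS NID each target was formatted with using something like the following (assuming tunefs.lustre --print only prints the stored parameters without changing the device, as I understand from the manual):

tunefs.lustre --print /dev/disk/b801

With the mkfs.lustre commands from my previous message it should show only mgsnode=192.168.11.12@o2ib, i.e. only the s2 address.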

What am I missing?
Maybe I have to specify some options when formatting the MDS/OSTs so that they work correctly when resources are switched to another OSS node? How do I do that?
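For example, is something like --failnode (plus a second --mgsnode for the OSTs) what is meant? A rough sketch of what I imagine - the s1 NID below (192.168.11.11@o2ib) is only a guess, since my commands in the previous message only contain the s2 address:

mkfs.lustre --reformat --fsname=lustre --mgs --mdt --failnode=192.168.11.11@o2ib /dev/disk/b800
mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib \
    --mgsnode=192.168.11.11@o2ib --failnode=192.168.11.11@o2ib /dev/disk/8800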

__________
Thanks,
Katya

>Section 8.3.2.2 (Configuring Heartbeat) includes a worked example to configure OST failover (active/active):

>http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199

>On 4/20/2010 6:42 AM, xgl at xgl.pereslavl.ru wrote:
>> Greetings!
>>
>> Sorry for troubling you with such a question, but I cannot find an example in the documentation and have run into some problems.
>>
>> I want to use an active/active failover configuration. (Where can I find some examples?)
>>
>> I have 2 nodes - s1 and s2 - used as OSSes.
>> I also have 3 block devices:
>> (MDS) 1 TB, used as the MDS/MDT
>> (OST0) 8 TB, used as OST1
>> (OST1) 8 TB, used as OST2
>> All devices are available from both OSSes.
>>
>> In the normal state OST0 is mounted on s1,
>> and the MDS and OST1 are mounted on s2.
>>
>> How can I configure the system in such a way that if one of the OSSes (s2, for example) fails, the second OSS (s1) takes control of all its resources?
>>
>> I have heartbeat installed and configured.
>> [root@s2 ~]# cat /etc/ha.d/haresources
>> s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
>> s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre
>>
>> I configured the system as follows.
>> On s2 I format and mount the MDT and OST1:
>> mkfs.lustre --reformat --fsname=lustre --mgs --mdt /dev/disk/by-id/b800   
>> mount -t lustre /dev/disk/b800 /mnt/mdt/
>> mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/8800
>> mount -t lustre /dev/disk/8800 /mnt/ost1
>>
>> On s1 I format and mount OST0:
>> mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/b801
>> mount -t lustre /dev/disk/b801 /mnt/ost0
>>
>> The heartbeat service is up and running on both nodes.
>>
>> Where do I have to add parameters to keep Lustre up and running if s2 goes down? Or where can I find some examples?
>> How can s1 take over the MDS (/mnt/mdt) and OST1 (/mnt/ost1) that are usually mounted on s2?
>>
>> Thanks,
>> Katya
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>   



