[Lustre-discuss] Failover for MGS
Robert LeBlanc
robert at leblancnet.us
Mon Nov 12 14:31:59 PST 2007
Since you are only adding parameters, I don't think you need the
--erase-params option.
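For example (just a sketch, reusing the NIDs from your mds01 commands below),
the add-only form would be something like:

tunefs.lustre --writeconf --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0

leaving the existing parameters in place instead of erasing and re-adding them.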
Robert
On 11/12/07 3:23 PM, "Wojciech Turek" <wjt27 at cam.ac.uk> wrote:
> Hi,
>
> Thanks for that. Actually my situation here is a little more complex. I have
> two sets of clients: the first set is working on the 10.142.10.0/24 network
> and the second set is working on the 10.143.0.0/16 network.
> Each server has two NICs:
> NIC1 = eth0 on 10.143.0.0/16 and NIC2 = eth1 on 10.142.10.0/24
> lnet configures the networks in the following manner:
> eth0 = <ip>@tcp0
> eth1 = <ip>@tcp1
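> (Assuming the usual modprobe.conf setup, that mapping would come from an LNET
> module option along the lines of: options lnet networks="tcp0(eth0),tcp1(eth1)".)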
>
> I am going to change the Lustre configuration in order to introduce failover
> features.
> The MGS is combined with mdt01 = /dev/dm-0
>
> on mds01
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
>
> on oss1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
>
> on oss2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
>
> on oss3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
>
> on oss4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
>
> Will the above be correct?
>
> Cheers,
>
> Wojciech Turek
>
>
> On 12 Nov 2007, at 21:36, Robert LeBlanc wrote:
>
>> You should just unmount all the clients and all OSTs, and then:
>>
>> tunefs.lustre --failnode=10.0.0.2@tcp --writeconf /dev/shared/disk
>>
>> If your volume is already on the shared disk, then mount everything and you
>> should be good to go. You can also do it on a live mounted system by using
>> lctl, but I'm not exactly sure how to do that.
>>
>> Robert
>>
>> On 11/12/07 2:24 PM, "Wojciech Turek" <wjt27 at cam.ac.uk> wrote:
>>
>>
>>> Hi,
>>>
>>> What should my tunefs.lustre command line look like if I want to configure a
>>> failnode for my MDS? I have two MDTs, and the MGS is on the same block device
>>> as one of the MDTs. I also have two servers connected to shared metadata
>>> storage.
>>>
>>> Thanks,
>>>
>>> Wojciech
>>> On 12 Nov 2007, at 20:49, Nathan Rutman wrote:
>>>
>>>
>>>> Robert LeBlanc wrote:
>>>>
>>>>
>>>>> Ok, I feel really stupid. I've done this before without any problem, but I
>>>>> can't seem to get it to work and I can't find my notes from the last time
>>>>> I
>>>>> did it. We have separate MGS and MDTs. I can't seem to get our MGS to
>>>>> failover correctly after reformatting it.
>>>>>
>>>>> mkfs.lustre --mkfsoptions="-O dir_index" --reformat --mgs
>>>>> --failnode=192.168.1.253@o2ib /dev/mapper/ldiskc-part1
>>>>>
>>>>>
>>>>>
>>>>>
>>>> The MGS doesn't actually use the --failnode option (although it won't
>>>> hurt). You actually have to tell the other nodes
>>>> in the system (servers and clients) about the failover options for the
>>>> MGS (use the --mgsnode parameter on servers, and the mount address for
>>>> clients). The reason is that the servers must contact the MGS for
>>>> the configuration information, and they can't ask the MGS where its
>>>> failover partner is if, e.g., the failover partner is the one that's
>>>> running.
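>>>>
>>>> As a sketch only (with 192.168.1.252@o2ib standing in for your primary MGS
>>>> NID and "testfs" for the filesystem name, both of which are assumptions
>>>> here): the servers would carry something like
>>>>   tunefs.lustre --writeconf --mgsnode=192.168.1.252@o2ib --mgsnode=192.168.1.253@o2ib <device>
>>>> and the clients would mount with both NIDs, e.g.
>>>>   mount -t lustre 192.168.1.252@o2ib:192.168.1.253@o2ib:/testfs /mnt/testfs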
>>>>
>>>>
>>>>
>>>>> We are running this on Debian, using the Lustre 1.6.3 debs from svn on
>>>>> Lenny
>>>>> with 2.6.22.12. I've tried several permutations of the mkfs.lustre
>>>>> command,
>>>>> specifying both nodes as failover, and both nodes as MGS, and pretty much
>>>>> every other combination of the above. With the above command
>>>>> tunefs.lustre
>>>>> shows that failnode and mgsnode are the failover node.
>>>>>
>>>>> Thanks,
>>>>> Robert
>>>>>
>>>>> Robert LeBlanc
>>>>> College of Life Sciences Computer Support
>>>>> Brigham Young University
>>>>> leblanc at byu.edu
>>>>> (801)422-1882
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Lustre-discuss mailing list
>>>>> Lustre-discuss at clusterfs.com
>>>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc at byu.edu
(801)422-1882