[Lustre-discuss] Failover for MGS
Wojciech Turek
wjt27 at cam.ac.uk
Mon Nov 12 14:23:03 PST 2007
Hi,
Thanks for that. Actually my situation is a little more complex. I
have two sets of clients: the first set works in the 10.142.10.0/24
network and the second set works in the 10.143.0.0/16 network.
Each server has two NICs:
NIC1 = eth0 (10.143.0.0/16) and NIC2 = eth1 (10.142.10.0/24)
LNET configures the networks in the following manner:
eth0 = <ip>@tcp0
eth1 = <ip>@tcp1
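For reference, this eth0/tcp0, eth1/tcp1 mapping is normally set with the lnet `networks` module option; a minimal sketch of what that looks like (the interface-to-network pairing below is taken from my setup, the file path may differ on your distribution):

```
# /etc/modprobe.conf (or a file under /etc/modprobe.d/)
# Map eth0 to the tcp0 LNET network and eth1 to tcp1
options lnet networks="tcp0(eth0),tcp1(eth1)"
```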
I am going to change the Lustre configuration in order to introduce
failover.
The MGS is combined with mdt01 = /dev/dm-0
on mds01
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
on oss1
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
on oss2
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
on oss3
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
on oss4
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
tunefs.lustre --erase-params --writeconf \
  --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 \
  --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 \
  --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
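Since the per-OSS commands differ only in the failnode NID pair and the device list, a small shell loop can generate them. This is only a sketch: it echoes the commands instead of executing them (drop the `echo` once the output looks right), and it shows the oss1 values; the variables would need adjusting per server.

```shell
#!/bin/sh
# Dry-run generator for the per-device tunefs.lustre commands on one OSS.
# FAILNODE is this server's failover partner (oss1's values shown here).
FAILNODE="10.143.245.8@tcp0,10.142.10.8@tcp1"
MGS1="10.143.245.201@tcp0,10.142.10.201@tcp1"
MGS2="10.143.245.202@tcp0,10.142.10.202@tcp1"

for dev in /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3 /dev/dm-4 /dev/dm-5; do
    # echo prints the command for review; remove it to actually run tunefs.
    echo tunefs.lustre --erase-params --writeconf \
        --failnode="$FAILNODE" \
        --mgsnode="$MGS1" --mgsnode="$MGS2" "$dev"
done
```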
Will the above be correct?
Cheers,
Wojciech Turek
On 12 Nov 2007, at 21:36, Robert LeBlanc wrote:
> You should just unmount all the clients, all OSTs and then:
>
> tunefs.lustre --failnode 10.0.0.2@tcp --writeconf /dev/shared/disk
>
> If your volume is already on the shared disk, then mount everything
> and you should be good to go. You can also do it on a live mounted
> system by using lctl, but I’m not exactly sure how to do that.
>
> Robert
>
> On 11/12/07 2:24 PM, "Wojciech Turek" <wjt27 at cam.ac.uk> wrote:
>
>> Hi,
>>
>> What will my tunefs.lustre command line look like if I want to
>> configure a failnode for my MDS? I have two MDTs, and the MGS is on
>> the same block device as one of the MDTs. I also have two servers
>> connected to shared metadata storage.
>>
>> Thanks,
>>
>> Wojciech
>> On 12 Nov 2007, at 20:49, Nathan Rutman wrote:
>>
>>> Robert LeBlanc wrote:
>>>
>>>> Ok, I feel really stupid. I've done this before without any
>>>> problem, but I
>>>> can't seem to get it to work and I can't find my notes from the
>>>> last time I
>>>> did it. We have separate MGS and MDTs. I can't seem to get our
>>>> MGS to
>>>> failover correctly after reformatting it.
>>>>
>>>> mkfs.lustre --mkfsoptions="-O dir_index" --reformat --mgs --failnode=192.168.1.253@o2ib /dev/mapper/ldiskc-part1
>>>>
>>>>
>>>>
>>> The MGS doesn't actually use the --failnode option (although it
>>> won't
>>> hurt). You actually have to tell the other nodes
>>> in the system (servers and clients) about the failover options
>>> for the
>>> MGS (use the --mgsnode parameter on servers, and mount address for
>>> clients). The reason is that the servers must contact the
>>> MGS for
>>> the configuration information, and they can't ask the MGS where its
>>> failover partner is if e.g. the failover partner is the one
>>> that's running.
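[On the client side, the failover MGS goes into the mount address itself: the two MGS NIDs are listed colon-separated in front of the filesystem name. A sketch using the MGS NIDs from Wojciech's mail; the filesystem name `lustre` and the mount point are placeholders, not known from this thread:

```
# Client mount with a failover MGS: primary NID, then the failover NID.
# Replace "lustre" and /mnt/lustre with your actual fsname and mount point.
mount -t lustre 10.143.245.201@tcp0:10.143.245.202@tcp0:/lustre /mnt/lustre
```
]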
>>>
>>>
>>>> We are running this on Debian, using the Lustre 1.6.3 debs from
>>>> svn on Lenny
>>>> with 2.6.22.12. I've tried several permutations of the
>>>> mkfs.lustre command,
>>>> specifying both nodes as failover, and both nodes as MGS, and
>>>> pretty much
>>>> every other combination of the above. With the above command
>>>> tunefs.lustre
>>>> shows that failnode and mgsnode are the failover node.
>>>>
>>>> Thanks,
>>>> Robert
>>>>
>>>> Robert LeBlanc
>>>> College of Life Sciences Computer Support
>>>> Brigham Young University
>>>> leblanc at byu.edu
>>>> (801)422-1882
>>>>
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at clusterfs.com
>>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>>>
>>>>
>>>>
>
>
> Robert LeBlanc
> College of Life Sciences Computer Support
> Brigham Young University
> leblanc at byu.edu
> (801)422-1882
>
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517