[Lustre-discuss] Failover for MGS

Robert LeBlanc robert at leblancnet.us
Mon Nov 12 14:31:59 PST 2007


Since you are only adding parameters, you don't need the --erase-params
option, I think.
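
For example, something like this (your mds01 line with --erase-params
dropped; untested, so double-check it):

tunefs.lustre --writeconf \
  --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0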

Robert


On 11/12/07 3:23 PM, "Wojciech Turek" <wjt27 at cam.ac.uk> wrote:

> Hi,
> 
> Thanks for that. Actually, my situation here is a little more complex. I have
> two sets of clients: the first set is on the 10.142.10.0/24 network and the
> second set is on the 10.143.0.0/16 network.
> Each server has two NICs:
> NIC1 = eth0 on 10.143.0.0/16 and NIC2 = eth1 on 10.142.10.0/24
> LNET configures the networks in the following manner:
> eth0 = <ip>@tcp0
> eth1 = <ip>@tcp1
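> For context, LNET is set up with a module option roughly like this (quoting
> from memory, so the exact file and line may differ):
> options lnet networks="tcp0(eth0),tcp1(eth1)"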
> 
> I am going to change the Lustre configuration in order to introduce failover
> features.
> The MGS is combined with mdt01 = /dev/dm-0
> 
> on mds01
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
> 
> on oss1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
> 
> on oss2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
> 
> on oss3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5
> 
> on oss4
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
> tunefs.lustre --erase-params --writeconf
> --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1
> --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1
> --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11
>  
> Will the above be correct?
> 
> Cheers,
> 
> Wojciech Turek
> 
> 
> On 12 Nov 2007, at 21:36, Robert LeBlanc wrote:
> 
>>  You should just unmount all the clients and all the OSTs, and then:
>>  
>>  tunefs.lustre --failnode=10.0.0.2@tcp --writeconf /dev/shared/disk
>>  
>>  If your volume is already on the shared disk, then mount everything and you
>> should be good to go. You can also do it on a live mounted system by using
>> lctl, but I'm not exactly sure how to do that.
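>>  
>>  For the clients, I believe you just list both MGS NIDs in the mount
>>  address, something like this (a rough sketch, adjust the fsname and NIDs,
>>  and double-check the syntax):
>>  
>>  mount -t lustre 10.0.0.1@tcp:10.0.0.2@tcp:/fsname /mnt/lustre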
>>  
>>  Robert
>>  
>>  On 11/12/07 2:24 PM, "Wojciech Turek" <wjt27 at cam.ac.uk> wrote:
>>  
>>  
>>> Hi,
>>>  
>>>  What will my tunefs.lustre command line look like if I want to configure a
>>> failover node for my MDS? I have two MDTs, and the MGS is on the same block
>>> device as one of the MDTs. I also have two servers connected to shared
>>> metadata storage.
>>>  
>>>  Thanks,
>>>  
>>>  Wojciech 
>>>  On 12 Nov 2007, at 20:49, Nathan Rutman wrote:
>>>  
>>>  
>>>> Robert LeBlanc wrote:
>>>>   
>>>>  
>>>>> Ok, I feel really stupid. I've done this before without any problem, but I
>>>>>  can't seem to get it to work and I can't find my notes from the last time
>>>>>  I did it. We have separate MGS and MDTs. I can't seem to get our MGS to
>>>>>  failover correctly after reformatting it.
>>>>>  
>>>>>  mkfs.lustre --mkfsoptions="-O dir_index" --reformat --mgs
>>>>>  --failnode=192.168.1.253@o2ib /dev/mapper/ldiskc-part1
>>>>>  
>>>>>  
>>>>>   
>>>>>  
>>>> The MGS doesn't actually use the --failnode option (although it won't
>>>>  hurt). You actually have to tell the other nodes
>>>>  in the system (servers and clients) about the failover options for the
>>>>  MGS (use the --mgsnode parameter on servers, and the mount address for
>>>>  clients). The reason is that the servers must contact the MGS for
>>>>  the configuration information, and they can't ask the MGS where its
>>>>  failover partner is if, e.g., the failover partner is the one that's
>>>>  running.
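>>>>  
>>>>  For example (just a sketch, not necessarily your exact syntax or NIDs):
>>>>  on each server, something like
>>>>    tunefs.lustre --writeconf --mgsnode=<mgs-nid> --mgsnode=<failover-mgs-nid> <device>
>>>>  and on the clients, both NIDs in the mount address:
>>>>    mount -t lustre <mgs-nid>:<failover-mgs-nid>:/<fsname> <mountpoint>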
>>>>  
>>>>   
>>>>  
>>>>> We are running this on Debian, using the Lustre 1.6.3 debs from svn on
>>>>>  Lenny with 2.6.22.12. I've tried several permutations of the mkfs.lustre
>>>>>  command, specifying both nodes as failover, both nodes as MGS, and pretty
>>>>>  much every other combination of the above. With the above command,
>>>>>  tunefs.lustre shows that failnode and mgsnode are the failover node.
>>>>>  
>>>>>  Thanks,
>>>>>  Robert
>>>>>  
>>>>>  Robert LeBlanc
>>>>>  College of Life Sciences Computer Support
>>>>>  Brigham Young University
>>>>>  leblanc at byu.edu
>>>>>  (801)422-1882
>>>>>  
>>>>>  

 
Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc at byu.edu
(801)422-1882

