[Lustre-discuss] MGS Nids

leen smit leen at service2media.com
Fri May 21 04:43:04 PDT 2010



Wouldn't it be easier, then, to use DRBD on the MGS disk, so you don't have
to move the LVM over to a new node?
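
Something along these lines (just a sketch; the drbd device, port and
replication addresses below are my own guesses, not from this thread) would
keep the MGS storage replicated between the two nodes, so whichever node is
primary can simply mount it:

resource mgs {
  protocol C;
  on fs-mgs-001 {
    device    /dev/drbd0;
    disk      /dev/VG1/mgs;       # backing LVM volume
    address   192.168.21.32:7788;
    meta-disk internal;
  }
  on fs-mgs-002 {
    device    /dev/drbd0;
    disk      /dev/VG1/mgs;
    address   192.168.21.33:7788;
    meta-disk internal;
  }
}

The MGS would then be formatted on and mounted from /dev/drbd0 (on whichever
node is the current DRBD primary) instead of directly on /dev/VG1/mgs.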



On 05/21/2010 12:14 PM, Gabriele Paciucci wrote:
> Hi,
> be careful with LVM: you need to export and import the volume group when
> you try to mount it from one machine on another!!!!
>
> please refer to: http://kbase.redhat.com/faq/docs/DOC-4124
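>
> For example, the usual sequence is roughly the following (a sketch only,
> assuming the volume group is called VG1 as in the commands below):
>
> # on the node giving up the volume group
> umount /mnt/mdt ; umount /mnt/mgs
> vgchange -an VG1
> vgexport VG1
>
> # on the node taking over
> vgimport VG1
> vgchange -ay VG1
> mount -t lustre /dev/VG1/mgs /mnt/mgs
> mount -t lustre /dev/VG1/mdt /mnt/mdt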
>
>
> On 05/21/2010 11:57 AM, leen smit wrote:
>    
>> OK. I started from scratch, using your kind replies as a guideline.
>> Yet, still no failover when bringing down the first MGS.
>> Below are the steps I've taken to set things up; hopefully someone here can
>> spot my error.
>> I got rid of keepalived and drbd (was this wise? or should I keep this
>> for the MGS/MDT syncing?) and set up just Lustre.
>>
>> Two nodes for MGS/MDT, and two nodes for OSTs.
>>
>>
>> fs-mgs-001:~# mkfs.lustre --mgs --failnode=fs-mgs-002@tcp --reformat /dev/VG1/mgs
>> fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp --failnode=fs-mgs-002@tcp --fsname=datafs --reformat /dev/VG1/mdt
>> fs-mgs-001:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
>> fs-mgs-001:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
>>
>> fs-mgs-002:~# mkfs.lustre --mgs --failnode=fs-mgs-001@tcp --reformat /dev/VG1/mgs
>> fs-mgs-002:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp --failnode=fs-mgs-001@tcp --fsname=datafs --reformat /dev/VG1/mdt
>> fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
>> fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
>>
> this is an error ^.. don't do it!!! (don't format the MGS/MDT again on the
> second node; the failover node should only ever mount the same storage that
> was already formatted on the first node)
>
>    
>> fs-ost-001:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp --mgsnode=fs-mgs-002@tcp --failnode=fs-ost-002@tcp --reformat --fsname=datafs /dev/VG1/ost1
>> fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>>
>> fs-ost-002:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp --mgsnode=fs-mgs-002@tcp --failnode=fs-ost-001@tcp --reformat --fsname=datafs /dev/VG1/ost1
>> fs-ost-002:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>>
> this is an error ^.. don't do it!!!
>
> the correct way is (WARNING: please use IP addresses for the NIDs):
>
> fs-mgs-001:~# mkfs.lustre --mgs --failnode=fs-mgs-002@tcp --reformat /dev/VG1/mgs
> fs-mgs-001:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
>
> fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp --failnode=fs-mgs-002@tcp --fsname=datafs --reformat /dev/VG1/mdt
> fs-mgs-001:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
>
> fs-ost-001:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp --failnode=fs-ost-002@tcp --reformat --fsname=datafs /dev/VG1/ost1
> fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>
> just this, nothing to do on the second node!!!
>
> mount -t lustre fs-mgs-001@tcp:fs-mgs-002@tcp:/datafs /data
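>
> (With the NIDs written as IP addresses, as the warning above suggests, and
> reusing the 192.168.21.32/.33 addresses that appear further down in this
> thread purely as an illustration, the key steps would look like:)
>
> fs-mgs-001:~# mkfs.lustre --mgs --failnode=192.168.21.33@tcp --reformat /dev/VG1/mgs
> fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=192.168.21.32@tcp --failnode=192.168.21.33@tcp --fsname=datafs --reformat /dev/VG1/mdt
> mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /data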
> Bye
>
>> fs-mgs-001:~# lctl dl
>>      0 UP mgs MGS MGS 7
>>      1 UP mgc MGC192.168.21.33@tcp 5b8fb365-ae8e-9742-f374-539d8876276f 5
>>      2 UP mgc MGC127.0.1.1@tcp 380bc932-eaf3-9955-7ff0-af96067a2487 5
>>      3 UP mdt MDS MDS_uuid 3
>>      4 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
>>      5 UP mds datafs-MDT0000 datafs-MDT0000_UUID 5
>>      6 UP osc datafs-OST0000-osc datafs-mdtlov_UUID 5
>>      7 UP osc datafs-OST0001-osc datafs-mdtlov_UUID 5
>>
>> fs-mgs-001:~# lctl list_nids
>> 192.168.21.32@tcp
>>
>>
>> client:~# mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /data
>> client:~# time cp test.file /data/
>> real    0m47.793s
>> user    0m0.001s
>> sys     0m3.155s
>>
>> So far, so good.
>>
>>
>> Let's try that again, now bringing down mgs-001:
>>
>> client:~# time cp test.file /data/
>>
>> fs-mgs-001:~# umount /mnt/mdt && umount /mnt/mgs
>>
>> fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs
>> fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt
>> fs-mgs-002:~# lctl dl
>>      0 UP mgs MGS MGS 5
>>      1 UP mgc MGC192.168.21.32@tcp 82b34916-ed89-f5b9-026e-7f8e1370765f 5
>>      2 UP mdt MDS MDS_uuid 3
>>      3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
>>      4 UP mds datafs-MDT0000 datafs-MDT0000_UUID 3
>>
>> The OSTs are missing here, so I (try to...) remount those too:
>>
>> fs-ost-001:~# umount /mnt/ost/
>> fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>> mount.lustre: mount /dev/mapper/VG1-ost1 at /mnt/ost failed: No such
>> device or address
>> The target service failed to start (bad config log?)
>> (/dev/mapper/VG1-ost1).  See /var/log/messages.
>>
>>
>> After this I can only get back to a running state by unmounting
>> everything on mgs-002 and remounting it on mgs-001.
>> What am I missing here? Am I messing things up by creating two MGSes, one
>> on each MGS node?
>>
>>
>> Leen
>>
>>
>>
>> On 05/20/2010 03:40 PM, Gabriele Paciucci wrote:
>>
>>> For clarification, in a two-server configuration:
>>>
>>> server1 ->    192.168.2.20 MGS+MDT+OST0
>>> server2 ->    192.168.2.22 OST1
>>> /dev/sdb is a LUN shared between server1 and server2
>>>
>>> from server1: mkfs.lustre --mgs --failnode=192.168.2.22 --reformat /dev/sdb1
>>> from server1: mkfs.lustre --reformat --mdt --mgsnode=192.168.2.20 --fsname=prova --failover=192.168.2.22 /dev/sdb4
>>> from server1: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20 --failover=192.168.2.22 --fsname=prova /dev/sdb2
>>> from server2: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20 --failover=192.168.2.20 --fsname=prova /dev/sdb3
>>>
>>>
>>> from server1: mount -t lustre /dev/sdb1 /lustre/mgs_prova
>>> from server1: mount -t lustre /dev/sdb4 /lustre/mdt_prova
>>> from server1: mount -t lustre /dev/sdb2 /lustre/ost0_prova
>>> from server2: mount -t lustre /dev/sdb3 /lustre/ost1_prova
>>>
>>>
>>> from client:
>>> modprobe lustre
>>> mount -t lustre 192.168.2.20@tcp:192.168.2.22@tcp:/prova /prova
>>>
>>> Now halt server1 and mount the MGS, MDT and OST0 on server2; the client
>>> should continue its activity without problems.
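>>>
>>> (That is, roughly the following on server2, reusing the same shared
>>> devices and mount points; a sketch, not part of the original message:)
>>>
>>> from server2: mount -t lustre /dev/sdb1 /lustre/mgs_prova
>>> from server2: mount -t lustre /dev/sdb4 /lustre/mdt_prova
>>> from server2: mount -t lustre /dev/sdb2 /lustre/ost0_prova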
>>>
>>>
>>>
>>> On 05/20/2010 02:55 PM, Kevin Van Maren wrote:
>>>
>>>> leen smit wrote:
>>>>
>>>>> OK, no VIPs then. But how does failover work in Lustre then?
>>>>> If I set up everything using the real IPs and then mount from a client and
>>>>> bring down the active MGS, the client will just sit there until it comes
>>>>> back up again.
>>>>> As in, there is no failover to the second node. So how does this
>>>>> internal Lustre failover mechanism work?
>>>>>
>>>>> I've been going through the docs, and I must say there is very little on
>>>>> the failover mechanism, apart from mentions that a separate application
>>>>> should take care of that. That's the reason I'm implementing keepalived.
>>>>>
>>>> Right: the external service needs to keep the "mount" active/healthy on
>>>> one of the servers.
>>>> Lustre handles reconnecting clients/servers as long as the volume is
>>>> mounted where it expects it to be
>>>> (i.e., on the mkfs node or on the declared failover node).
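>>>>
>>>> (As an illustration of such an external service: a minimal Pacemaker
>>>> resource for the MGS mount, using the stock ocf:heartbeat:Filesystem
>>>> agent. The resource names, monitor interval and preferred node are made
>>>> up for the example, so treat it as a sketch:)
>>>>
>>>> crm configure primitive mgs_fs ocf:heartbeat:Filesystem \
>>>>     params device=/dev/VG1/mgs directory=/mnt/mgs fstype=lustre \
>>>>     op monitor interval=30s
>>>> crm configure location mgs_fs_pref mgs_fs 100: fs-mgs-001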
>>>>
>>>>> At this stage I really am clueless, and can only think of creating a TUN
>>>>> interface which will have the VIP address (thus, it becomes a real IP,
>>>>> not just a VIP).
>>>>> But I've got a feeling that isn't the right approach either...
>>>>> Are there any docs available where an active/passive MGS setup is described?
>>>>> Is it sufficient to define a --failnode=nid,... at creation time?
>>>>>
>>>> Yep.  See Johann's email on the MGS, but for the MDTs and OSTs that's
>>>> all you have to do
>>>> (besides listing both MGS NIDs at mkfs time).
>>>>
>>>> For the clients, you specify both MGS NIDs at mount time, so it can
>>>> mount regardless of which
>>>> node has the active MGS.
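>>>>
>>>> (For example, reusing the hostnames from earlier in this thread; just a
>>>> sketch of the two places where both MGS NIDs get listed:)
>>>>
>>>> mkfs.lustre --ost --fsname=datafs --mgsnode=fs-mgs-001@tcp \
>>>>     --mgsnode=fs-mgs-002@tcp --failnode=fs-ost-002@tcp /dev/VG1/ost1
>>>> mount -t lustre fs-mgs-001@tcp:fs-mgs-002@tcp:/datafs /data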
>>>>
>>>> Kevin
>>>>
>>>>> Any help would be greatly appreciated!
>>>>>
>>>>> Leen
>>>>>
>>>>>
>>>>> On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
>>>>>
>>>>>> On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:
>>>>>>
>>>>>>> Keepalived uses a VIP in an active/passive state. In a failover situation
>>>>>>> the VIP gets transferred to the passive node.
>>>>>>>
>>>>>> Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
>>>>>> failover nodes that have different IP addresses and using a virtual,
>>>>>> floating IP address will just confuse it.
>>>>>>
>>>>>> b.