[Lustre-discuss] MGS Nids

Gabriele Paciucci paciucci at gmail.com
Fri May 21 03:14:23 PDT 2010


Hi,
be careful with LVM: you must export the volume group from one machine
and import it on the other before you try to mount the volume there!

please refer to: http://kbase.redhat.com/faq/docs/DOC-4124
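Otherwise the second node may be looking at stale LVM metadata when it
tries to activate the volumes. A minimal sketch of the hand-off,
assuming a volume group named VG1 on storage shared between the two
nodes (standard LVM2 commands; adapt the names to your setup):

fs-mgs-001:~# umount /mnt/mgs        # stop using the LVs first
fs-mgs-001:~# vgchange -an VG1       # deactivate every LV in VG1
fs-mgs-001:~# vgexport VG1           # mark the VG as exported

fs-mgs-002:~# vgscan                 # rescan so the exported VG shows up
fs-mgs-002:~# vgimport VG1           # take ownership of the VG
fs-mgs-002:~# vgchange -ay VG1       # activate the LVs
fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs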


On 05/21/2010 11:57 AM, leen smit wrote:
> Ok. I started from scratch, using your kind replies as a guideline.
> Yet, still no failover when bringing down the first MGS.
> Below are the steps I've taken to set up; hopefully someone here can
> spot my error.
> I got rid of keepalived and drbd (was this wise? or should I keep it
> for the MGS/MDT syncing?) and set up just Lustre.
>
> Two nodes for MGS/MDT, and two nodes for OSTs.
>
>
> fs-mgs-001:~# mkfs.lustre --mgs --failnode=fs-mgs-002@tcp --reformat
> /dev/VG1/mgs
> fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp
> --failnode=fs-mgs-002@tcp --fsname=datafs --reformat /dev/VG1/mdt
> fs-mgs-001:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
> fs-mgs-001:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
>


> fs-mgs-002:~# mkfs.lustre --mgs --failnode=fs-mgs-001@tcp --reformat
> /dev/VG1/mgs
> fs-mgs-002:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp
> --failnode=fs-mgs-001@tcp --fsname=datafs --reformat /dev/VG1/mdt
> fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
> fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
>

This ^ is the error: do not format a second MGS and MDT on the other node!

> fs-ost-001:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp
> --mgsnode=fs-mgs-002@tcp --failnode=fs-ost-002@tcp --reformat
> --fsname=datafs /dev/VG1/ost1
> fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>


> fs-ost-002:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp
> --mgsnode=fs-mgs-002@tcp --failnode=fs-ost-001@tcp --reformat
> --fsname=datafs /dev/VG1/ost1
> fs-ost-002:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
>
>
This ^ is the error: do not format the OST again on the other node!

The correct way is (WARNING: use the real, fixed IP address of each
node, never a virtual IP; the hostnames below stand for those addresses):

fs-mgs-001:~# mkfs.lustre --mgs --failnode=fs-mgs-002@tcp --reformat /dev/VG1/mgs
fs-mgs-001:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/

fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp --failnode=fs-mgs-002@tcp --fsname=datafs --reformat /dev/VG1/mdt
fs-mgs-001:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/

fs-ost-001:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp --failnode=fs-ost-002@tcp --reformat --fsname=datafs /dev/VG1/ost1
fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/

Just this; there is nothing to format or mount on the second node!
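At failover time, the takeover is then just mounting the same shared
devices on the second node (the procedure from my earlier mail quoted
below; a sketch, assuming fs-mgs-002 can reach the LVs, see the LVM
note above):

fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/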

From the client:

client:~# mount -t lustre fs-mgs-001@tcp:fs-mgs-002@tcp:/datafs /data
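If the client hangs at mount time, you can first check that it can
reach each MGS NID with an LNET ping (the addresses are the ones from
your lctl output below):

client:~# lctl ping 192.168.21.32@tcp
client:~# lctl ping 192.168.21.33@tcp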
Bye



> fs-mgs-001:~# lctl dl
>     0 UP mgs MGS MGS 7
>     1 UP mgc MGC192.168.21.33@tcp 5b8fb365-ae8e-9742-f374-539d8876276f 5
>     2 UP mgc MGC127.0.1.1@tcp 380bc932-eaf3-9955-7ff0-af96067a2487 5
>     3 UP mdt MDS MDS_uuid 3
>     4 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
>     5 UP mds datafs-MDT0000 datafs-MDT0000_UUID 5
>     6 UP osc datafs-OST0000-osc datafs-mdtlov_UUID 5
>     7 UP osc datafs-OST0001-osc datafs-mdtlov_UUID 5
>
> fs-mgs-001:~# lctl list_nids
> 192.168.21.32@tcp
>
>
> client:~# mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /data
> client:~# time cp test.file /data/
> real    0m47.793s
> user    0m0.001s
> sys     0m3.155s
>
> So far, so good.
>
>
> Let's try that again, now bringing down mgs-001:
>
> client:~# time cp test.file /data/
>
> fs-mgs-001:~# umount /mnt/mdt && umount /mnt/mgs
>
> fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs
> fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt
> fs-mgs-002:~# lctl dl
>     0 UP mgs MGS MGS 5
>     1 UP mgc MGC192.168.21.32@tcp 82b34916-ed89-f5b9-026e-7f8e1370765f 5
>     2 UP mdt MDS MDS_uuid 3
>     3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
>     4 UP mds datafs-MDT0000 datafs-MDT0000_UUID 3
>
> The OSTs are missing here, so I (try to...) remount them too:
>
> fs-ost-001:~# umount /mnt/ost/
> fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
> mount.lustre: mount /dev/mapper/VG1-ost1 at /mnt/ost failed: No such
> device or address
> The target service failed to start (bad config log?)
> (/dev/mapper/VG1-ost1).  See /var/log/messages.
>
>
> After this I can only get back to a running state by unmounting
> everything on mgs-002 and remounting on mgs-001.
> What am I missing here? Am I messing things up by creating two MGSes,
> one on each MGS node?
>
>
> Leen
>
>
>
> On 05/20/2010 03:40 PM, Gabriele Paciucci wrote:
>> For clarification, in a two-server configuration:
>>
>> server1 ->  192.168.2.20  MGS+MDT+OST0
>> server2 ->  192.168.2.22  OST1
>> /dev/sdb is a LUN shared between server1 and server2
>>
>> from server1: mkfs.lustre --mgs --failnode=192.168.2.22 --reformat /dev/sdb1
>> from server1: mkfs.lustre  --reformat --mdt --mgsnode=192.168.2.20
>> --fsname=prova --failover=192.168.2.22 /dev/sdb4
>> from server1: mkfs.lustre  --reformat --ost --mgsnode=192.168.2.20
>> --failover=192.168.2.22 --fsname=prova /dev/sdb2
>> from server2: mkfs.lustre  --reformat --ost --mgsnode=192.168.2.20
>> --failover=192.168.2.20 --fsname=prova /dev/sdb3
>>
>>
>> from server1: mount -t lustre /dev/sdb1 /lustre/mgs_prova
>> from server1: mount -t lustre /dev/sdb4 /lustre/mdt_prova
>> from server1: mount -t lustre /dev/sdb2 /lustre/ost0_prova
>> from server2: mount -t lustre /dev/sdb3 /lustre/ost1_prova
>>
>>
>> from client:
>> modprobe lustre
>> mount -t lustre 192.168.2.20@tcp:192.168.2.22@tcp:/prova /prova
>>
>> Now halt server1 and mount the MGS, MDT and OST0 on server2; the
>> client should continue its activity without problems.
>>
>>
>>
>> On 05/20/2010 02:55 PM, Kevin Van Maren wrote:
>>
>>> leen smit wrote:
>>>
>>>> OK, no VIPs then. But how does failover work in Lustre then?
>>>> If I set up everything using the real IP and then mount from a client
>>>> and bring down the active MGS, the client will just sit there until it
>>>> comes back up again.
>>>> As in, there is no failover to the second node. So how does this
>>>> internal Lustre failover mechanism work?
>>>>
>>>> I've been going through the docs, and I must say there is very little
>>>> on the failover mechanism, apart from mentions that a separate app
>>>> should take care of that. That's the reason I'm implementing keepalived.
>>>>
>>> Right: the external service needs to keep the "mount" active/healthy
>>> on one of the servers. Lustre handles reconnecting clients/servers as
>>> long as the volume is mounted where it expects it
>>> (i.e., on the mkfs node or the --failover node).
>>>
>>>> At this stage I really am clueless, and can only think of creating a
>>>> TUN interface, which will have the VIP address (thus, it becomes a
>>>> real IP, not just a VIP).
>>>> But I have a feeling that isn't the right approach either...
>>>> Are there any docs available where an active/passive MGS setup is
>>>> described? Is it sufficient to define a --failnode=nid,... at creation
>>>> time?
>>>>
>>> Yep.  See Johann's email on the MGS, but for the MDTs and OSTs that's
>>> all you have to do
>>> (besides listing both MGS NIDs at mkfs time).
>>>
>>> For the clients, you specify both MGS NIDs at mount time, so it can
>>> mount regardless of which
>>> node has the active MGS.
>>>
>>> Kevin
>>>
>>>> Any help would be greatly appreciated!
>>>>
>>>> Leen
>>>>
>>>>
>>>> On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
>>>>
>>>>> On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:
>>>>>
>>>>>> Keepalived uses a VIP in an active/passive state. In a failover
>>>>>> situation the VIP gets transferred to the passive node.
>>>>>>
>>>>> Don't use virtual IPs with Lustre.  Lustre clients know how to deal
>>>>> with failover nodes that have different IP addresses, and using a
>>>>> virtual, floating IP address will just confuse them.
>>>>>
>>>>> b.
>>>>>


-- 
_Gabriele Paciucci_ http://www.linkedin.com/in/paciucci
