[Lustre-discuss] MGS Nids
leen smit
leen@service2media.com
Fri May 21 02:57:22 PDT 2010
Ok, I started from scratch, using your kind replies as a guideline.
Yet, still no failover when bringing down the first MGS.
Below are the steps I've taken to set things up; hopefully someone here
can spot my error.
I got rid of keepalived and DRBD (was this wise, or should I keep them
for the MGS/MDT syncing?) and set up just Lustre.
Two nodes for MGS/MDT, and two nodes for OSTs.
fs-mgs-001:~# mkfs.lustre --mgs --failnode=fs-mgs-002@tcp --reformat
/dev/VG1/mgs
fs-mgs-001:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp
--failnode=fs-mgs-002@tcp --fsname=datafs --reformat /dev/VG1/mdt
fs-mgs-001:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
fs-mgs-001:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
fs-mgs-002:~# mkfs.lustre --mgs --failnode=fs-mgs-001@tcp --reformat
/dev/VG1/mgs
fs-mgs-002:~# mkfs.lustre --mdt --mgsnode=fs-mgs-001@tcp
--failnode=fs-mgs-001@tcp --fsname=datafs --reformat /dev/VG1/mdt
fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs/
fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt/
fs-ost-001:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp
--mgsnode=fs-mgs-002@tcp --failnode=fs-ost-002@tcp --reformat
--fsname=datafs /dev/VG1/ost1
fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
fs-ost-002:~# mkfs.lustre --ost --mgsnode=fs-mgs-001@tcp
--mgsnode=fs-mgs-002@tcp --failnode=fs-ost-001@tcp --reformat
--fsname=datafs /dev/VG1/ost1
fs-ost-002:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
fs-mgs-001:~# lctl dl
0 UP mgs MGS MGS 7
1 UP mgc MGC192.168.21.33@tcp 5b8fb365-ae8e-9742-f374-539d8876276f 5
2 UP mgc MGC127.0.1.1@tcp 380bc932-eaf3-9955-7ff0-af96067a2487 5
3 UP mdt MDS MDS_uuid 3
4 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
5 UP mds datafs-MDT0000 datafs-MDT0000_UUID 5
6 UP osc datafs-OST0000-osc datafs-mdtlov_UUID 5
7 UP osc datafs-OST0001-osc datafs-mdtlov_UUID 5
fs-mgs-001:~# lctl list_nids
192.168.21.32@tcp
client:~# mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /data
client:~# time cp test.file /data/
real 0m47.793s
user 0m0.001s
sys 0m3.155s
So far, so good.
Let's try that again, now bringing down mgs-001:
client:~# time cp test.file /data/
fs-mgs-001:~# umount /mnt/mdt && umount /mnt/mgs
fs-mgs-002:~# mount -t lustre /dev/VG1/mgs /mnt/mgs
fs-mgs-002:~# mount -t lustre /dev/VG1/mdt /mnt/mdt
fs-mgs-002:~# lctl dl
0 UP mgs MGS MGS 5
1 UP mgc MGC192.168.21.32@tcp 82b34916-ed89-f5b9-026e-7f8e1370765f 5
2 UP mdt MDS MDS_uuid 3
3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
4 UP mds datafs-MDT0000 datafs-MDT0000_UUID 3
The OSTs are missing here, so I (try to...) remount those too:
fs-ost-001:~# umount /mnt/ost/
fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
mount.lustre: mount /dev/mapper/VG1-ost1 at /mnt/ost failed: No such
device or address
The target service failed to start (bad config log?)
(/dev/mapper/VG1-ost1). See /var/log/messages.
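For the record, one thing I may try next (an untested sketch on my part,
using the same devices and node names as above): re-registering the OST
with both MGS NIDs and forcing a fresh config log via tunefs.lustre
before remounting:

```shell
# Untested sketch: tunefs.lustre must run on the *unmounted* target.
# --erase-params drops the old server parameters; --writeconf makes
# the target regenerate its configuration log on the next mount.
fs-ost-001:~# tunefs.lustre --erase-params \
    --mgsnode=fs-mgs-001@tcp --mgsnode=fs-mgs-002@tcp \
    --failnode=fs-ost-002@tcp --writeconf /dev/VG1/ost1
fs-ost-001:~# mount -t lustre /dev/VG1/ost1 /mnt/ost/
```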
After this I can only get back to a running state by unmounting
everything on mgs-002 and remounting on mgs-001.
What am I missing here? Am I messing things up by creating two MGSes,
one on each MGS node?
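For comparison, my reading of the shared-storage layout Gabriele
describes below (sketch only; the /dev/sdb1 device and IPs are from his
example, not my boxes): format the MGS once, on a device both servers
can see, and only ever mount it on one server at a time:

```shell
# Sketch: a single MGS target on a shared LUN, active/passive.
# Format exactly once, from server1 only:
mkfs.lustre --mgs --failnode=192.168.2.22@tcp --reformat /dev/sdb1
# Normal operation, on server1:
mount -t lustre /dev/sdb1 /lustre/mgs_prova
# Failover: after server1 unmounts (or dies), on server2:
mount -t lustre /dev/sdb1 /lustre/mgs_prova
```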
Leen
On 05/20/2010 03:40 PM, Gabriele Paciucci wrote:
> For clarification, in a two-server configuration:
>
> server1 -> 192.168.2.20 MGS+MDT+OST0
> server2 -> 192.168.2.22 OST1
> /dev/sdb is a LUN shared between server1 and server2
>
> from server1: mkfs.lustre --mgs --failnode=192.168.2.22 --reformat /dev/sdb1
> from server1: mkfs.lustre --reformat --mdt --mgsnode=192.168.2.20
> --fsname=prova --failover=192.168.2.22 /dev/sdb4
> from server1: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20
> --failover=192.168.2.22 --fsname=prova /dev/sdb2
> from server2: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20
> --failover=192.168.2.20 --fsname=prova /dev/sdb3
>
>
> from server1: mount -t lustre /dev/sdb1 /lustre/mgs_prova
> from server1: mount -t lustre /dev/sdb4 /lustre/mdt_prova
> from server1: mount -t lustre /dev/sdb2 /lustre/ost0_prova
> from server2: mount -t lustre /dev/sdb3 /lustre/ost1_prova
>
>
> from client:
> modprobe lustre
> mount -t lustre 192.168.2.20@tcp:192.168.2.22@tcp:/prova /prova
>
> now halt server1 and mount MGS, MDT and OST0 on server2; the client
> should continue the activity without problems
>
>
>
> On 05/20/2010 02:55 PM, Kevin Van Maren wrote:
>
>> leen smit wrote:
>>
>>
>>> Ok, no VIPs then... But how does failover work in Lustre then?
>>> If I set up everything using the real IP and then mount from a client and
>>> bring down the active MGS, the client will just sit there until it comes
>>> back up again.
>>> As in, there is no failover to the second node. So how does this
>>> internal Lustre failover mechanism work?
>>>
>>> I've been going through the docs, and I must say there is very little on
>>> the failover mechanism, apart from mentions that a separate app should
>>> take care of that. That's the reason I'm implementing keepalived...
>>>
>>>
>>>
>> Right: the external service needs to keep the "mount" active/healthy on
>> one of the servers.
>> Lustre handles reconnecting clients/servers as long as the volume is
>> mounted where it expects
>> (i.e., the mkfs node or the --failover node).
>>
>>
>>> At this stage I really am clueless, and can only think of creating a TUN
>>> interface, which will have the VIP address (thus, it becomes a real IP,
>>> not just a VIP).
>>> But I got a feeling that ain't the right approach either...
>>> Are there any docs available where an active/passive MGS setup is described?
>>> Is it sufficient to define a --failnode=nid,... at creation time?
>>>
>>>
>>>
>> Yep. See Johann's email on the MGS, but for the MDTs and OSTs that's
>> all you have to do
>> (besides listing both MGS NIDs at mkfs time).
>>
>> For the clients, you specify both MGS NIDs at mount time, so it can
>> mount regardless of which
>> node has the active MGS.
>>
>> Kevin
>>
>>
>>
>>> Any help would be greatly appreciated!
>>>
>>> Leen
>>>
>>>
>>> On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
>>>
>>>
>>>
>>>> On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> Keepalive uses a VIP in a active/passive state. In a failover situation
>>>>> the VIP gets transferred to the passive one.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Don't use virtual IPs with Lustre. Lustre clients know how to deal with
>>>> failover nodes that have different IP addresses and using a virtual,
>>>> floating IP address will just confuse it.
>>>>
>>>> b.
>>>>
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>>
>>>
>
>