[Lustre-discuss] MDT Failover not functioning properly with Lustre FS

Chadha, Narjit Narjit.Chadha at necam.com
Fri Mar 7 09:44:56 PST 2008


Thanks for the assistance, Klaus. It appears that the primary and
failover MDS servers must be separated by a ':' or ',', at least in the
mkfs command on the OSSs:

mkfs.lustre --ost --fsname=mylustre --mgsnode=lustre01:lustre02 --reformat /dev/sdb1

   Permanent disk data:
Target:     mylustre-OSTffff
Index:      unassigned
Lustre FS:  mylustre
Mount type: ldiskfs
Flags:      0x72
              (OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.100.1@tcp,192.168.100.2@tcp

device size = 35000MB
formatting backing filesystem ldiskfs on /dev/sdb1
        target name  mylustre-OSTffff
        4k blocks     0
        options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L mylustre-OSTffff  -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/sdb1
Writing CONFIGS/mountdata

However, when trying to mount the filesystem on the OSS, the following
error occurs:

[root@lustre03 ~]# mount -t lustre /dev/sdb1 /mnt/lustrefs/
mount.lustre: mount /dev/sdb1 at /mnt/lustrefs failed: Input/output error
Is the MGS running?

The MGS is running on lustre01 but not on lustre02, because lustre02 is
the failover MGS/MDS node; the MGS moves over only as the result of a
failover. From my understanding of Lustre, only one MGS/MDS can be active
per filesystem. Also, if I use the syntax '--mgsnode lustre01 --mgsnode
lustre02', it works on the OSSs, but a 'mount -t lustre /dev/sdb1
/mnt/lustrefs' on the clients freezes, which suggests that the clients
are being confused by the two MDS nodes.
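For comparison, the client-side form Klaus uses later in this thread lists
each MGS NID separated by ':' before ":/<fsname>" (hm0-0@tcp:hm0-1@tcp:/lustre).
A minimal bash sketch that assembles such a mount source and only prints the
command, assuming the hypothetical NIDs lustre01@tcp and lustre02@tcp, the
fsname mylustre and the mount point /mnt/lustrefs; it does not mount anything
and is not a confirmed fix for the freeze described above:

```shell
#!/bin/bash
# Dry-run sketch: build a Lustre client mount source from several MGS NIDs.
# The NIDs, fsname and mount point are illustrative assumptions.

# Join all MGS NIDs with ':' and append ":/<fsname>".
client_mount_source() {
    local fsname=$1
    shift
    local IFS=:            # "$*" joins the remaining arguments with ':'
    echo "$*:/$fsname"
}

src=$(client_mount_source mylustre lustre01@tcp lustre02@tcp)

# Print the command instead of running it.
echo mount -t lustre "$src" /mnt/lustrefs
```

With the values above it prints
"mount -t lustre lustre01@tcp:lustre02@tcp:/mylustre /mnt/lustrefs".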

Regards,

N.

-----Original Message-----
From: Klaus Steden [mailto:klaus.steden at thomson.net] 
Sent: Thursday, March 06, 2008 5:50 PM
To: Chadha, Narjit
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] MDT Failover not functioning properly with Lustre FS


Hi Narjit,

My usual syntax with mkfs.lustre/tunefs.lustre is something like this:

tunefs.lustre --erase-params --writeconf --failnode=tm0-1@tcp \
    --failnode=tm0-0@tcp --failnode=172.16.130.249@tcp1 \
    --failnode=172.16.131.249@tcp2 --failnode=172.16.130.252@tcp1 \
    --failnode=172.16.131.252@tcp2 /dev/sdd

I.e., I use the more explicit notation. Specifying '--mgsnode=mgs1
--mgsnode=mgs2' will likely work as you expect.
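A minimal bash sketch of that explicit form on the OSS side: one --mgsnode
flag per MGS node (primary first, then failover) rather than a single combined
value. The helper name mgsnode_args, the NIDs, the fsname and the device are
hypothetical; the script only prints the mkfs.lustre command and formats
nothing:

```shell
#!/bin/bash
# Dry-run sketch: emit one --mgsnode=<nid> flag per MGS node.
# mgsnode_args is a hypothetical helper; values below are assumptions.

mgsnode_args() {
    local node out=""
    for node in "$@"; do
        # ${out:+ } adds a separating space only after the first flag.
        out+="${out:+ }--mgsnode=$node"
    done
    echo "$out"
}

# Unquoted $(...) is intentional: it splits the flags into separate words.
# Print the command instead of running it.
echo mkfs.lustre --ost --fsname=mylustre $(mgsnode_args lustre01@tcp lustre02@tcp) --reformat /dev/sdb1
```

Building the flags in a loop keeps the command correct however many failover
MGS nodes are listed, without relying on any shell pattern syntax.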

If you're using bash, the '[]' expansion is usually only suited to
filename expansion, thusly:

-- cut --
[root@tiger ~]# echo mgs[1-2]
mgs[1-2]
[root@tiger ~]# touch mgs1 mgs2
[root@tiger ~]# echo mgs[1-2]
mgs1 mgs2
[root@tiger ~]# rm mgs[1-2]
rm: remove regular empty file `mgs1'? y
rm: remove regular empty file `mgs2'? y
[root@tiger ~]# touch mgs[1-2]
[root@tiger ~]# ls mgs\[1-2\]
mgs[1-2]
[root@tiger ~]# rm mgs\[1-2\]
rm: remove regular empty file `mgs[1-2]'? y
[root@tiger ~]# touch mgs{1,2}
[root@tiger ~]# ls mgs[1-2]
mgs1  mgs2
[root@tiger ~]# 
-- cut --
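The 'touch mgs{1,2}' step in the session above points at the alternative:
bash brace expansion generates every word whether or not matching files exist,
so it can produce one flag per node. A small bash-only sketch (brace expansion
is not POSIX sh), reusing the mgs1/mgs2 names from the session:

```shell
#!/bin/bash
# Brace expansion always generates both words; a glob such as mgs[1-2]
# only matches names of files that already exist.
echo --mgsnode=mgs{1,2}
# Prints: --mgsnode=mgs1 --mgsnode=mgs2

# The sequence form gives the same result.
echo --mgsnode=mgs{1..2}
# Prints: --mgsnode=mgs1 --mgsnode=mgs2
```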

Your mileage may vary, but again, using the full notation can save some
confusion over things like this.

cheers,
Klaus

On 3/6/08 1:45 PM, "Chadha, Narjit" <Narjit.Chadha at necam.com> did etch on
stone tablets:

> Hi Klaus,
> 
> Actually mgsnid and mgsnode appear to be interchangeable; the results
> are the same.
> 
> It is likely that the command lines being used are slightly incorrect in
> that the [] syntax is getting mangled. I am using bash on Red Hat 5,
> though.
> 
> I wonder how the proper mount should look on the clients if I am using ':'
> instead of [1-2] to designate the mgsnids (mgsnodes) on the OSSs.
> 
> Regards,
> 
> N. 
> 
> -----Original Message-----
> From: Klaus Steden [mailto:klaus.steden at thomson.net]
> Sent: Thursday, March 06, 2008 2:50 PM
> To: Chadha, Narjit; Andreas Dilger
> Cc: lustre-discuss at lists.lustre.org; Sheila.Barthel at sun.com
> Subject: Re: [Lustre-discuss] MDT Failover not functioning properly with Lustre FS
> 
> 
> Hi Narjit,
> 
> Note that '[]' notation is a shell construct ... depending on the shell,
> it might get expanded in different ways, or not at all, and subsequently
> mangled by the time it gets to mkfs.lustre.
> 
> The help output for mkfs.lustre on my 1.6.x system also uses '--mgs' and
> '--mgsnode', but there is no mention of a '--mgsnid' option.
> 
> For clarity, I use statements like this when working with my Lustre FS
> on the client side:
> 
> -- cut --
> mount -t lustre hm0-0@tcp:hm0-1@tcp:/lustre /mnt/lustre
> -- cut --
> 
> And on the OSS server side:
> 
> -- cut --
> mount -t lustre /dev/sdi /mnt/lustreost
> -- cut --
> 
> Check your command lines, I think they're slightly incorrect.
> 
> hth,
> Klaus
> 
> On 3/6/08 12:39 PM, "Chadha, Narjit" <Narjit.Chadha at necam.com> did etch on
> stone tablets:
> 
>> Would you know the correct MDS mount syntax (for OSTs and Clients) for
>> an MDS failover?
>> 
>> For the OSSs, it does not appear to take the form:
>> 
>> mkfs.lustre --ost --fsname=lustrefs --mgsnid=mds[1-2] /dev/sdb1 **
>> mount -t lustre /dev/sdb1 /mnt/lustre
>> 
>> where mds1 and mds2 are the mgsnids of the primary and failover MDSs.
>> There is a parsing error for mds[1-2] **, but Lustre knows of both mds1
>> and mds2 independently.
>> 
>> For the clients, it does not appear to take the form:
>> 
>> mount -t lustre mds[1-2]:/lustrefs /mnt/lustre **
>> 
>> Lustre cannot parse mds[1-2] **, but knows of both mds1 and mds2
>> independently.
>> 
>> As mentioned, I have tried comma-separated names and colon-separated
>> names as well, with no effect. The problem is with the mgsnid name
>> structure. The ** markers only show where the errors occur. The MDS on
>> its own fails over seamlessly and was not mentioned above (it also has
>> the same fsname).
>> 
>> Thanks
>> 
>> N.
>> 
>> 
>> -----Original Message-----
>> From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
>> Sent: Friday, February 22, 2008 7:08 PM
>> To: Chadha, Narjit
>> Cc: lustre-discuss at lists.lustre.org; Sheila.Barthel at sun.com
>> Subject: Re: [Lustre-discuss] MDT Failover not functioning properly with Lustre FS
>> 
>> On Feb 21, 2008  13:14 -0800, Chadha, Narjit wrote:
>>> The only thing left is to be able to mount
>>> the failover mds configuration on the OSS. The syntax:
>>>
>>> mkfs.lustre --ost -fsname=mylustre -mgsnid=lustre0[1-2] /dev/sdb1
>> 
>> I think this is a defect in the manual.  It should be "--fsname" and
>> "--mgsnid", I believe.  Please confirm that is the issue and the manual
>> can be updated.
>> 
>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>> 
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 



