[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf

Dr Stuart Midgley sdm900 at gmail.com
Sun Mar 18 01:58:57 PDT 2012


Well, our filesystem is back.

I hex-edited CONFIGS/p1-client, replacing prod_mds_001_UUID with p1-MDT0000_UUID, and now our file system mounts.
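
For anyone who would rather script this kind of edit than use an interactive hex editor, here is a minimal Python sketch. It is not what was actually run on the MDT; it only illustrates a same-size replacement, where the shorter UUID is padded with NUL bytes so the file length and all later offsets stay unchanged. The function name replace_padded and the in-memory sample are assumptions for illustration. The length fields that sit near the UUID (0x12 versus 0x10 in the hexdumps further down the thread) are not touched here, so treat the result with the same caution as a manual edit and only ever work on a backup copy.

```python
from typing import Tuple


def replace_padded(data: bytes, old: bytes, new: bytes) -> Tuple[bytes, int]:
    """Replace each occurrence of `old` with `new`, NUL-padded to len(old).

    Returns the patched bytes and the number of occurrences replaced.
    The overall length of `data` never changes.
    """
    if len(new) > len(old):
        raise ValueError("replacement must not be longer than the original")
    padded = new + b"\x00" * (len(old) - len(new))
    count = data.count(old)
    return data.replace(old, padded), count


if __name__ == "__main__":
    # Hypothetical in-memory sample loosely modelled on the hexdump in this
    # thread; a real run would read a backup copy of the config file instead.
    sample = b"mds\x00\x00\x00\x00\x00prod_mds_001_UUID\x00\x00\x00"
    patched, n = replace_padded(sample, b"prod_mds_001_UUID", b"p1-MDT0000_UUID")
    print(n, len(sample) == len(patched))
```

After patching a copy, the same hexdump -C ... | grep -C 2 mds check shown later in the thread can confirm that only the UUID bytes changed.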

Ran a heap of checks and it all looks good.

Thanks everyone for your help.


--
Dr Stuart Midgley
sdm900 at gmail.com



On 18/03/2012, at 3:36 PM, Stu Midgley wrote:

> I'm well down this path... I replaced the mountdata with that from my
> small temporary MDT (same name) and that didn't help.
> 
> Now I will do a few tests on the p1-client.  Perhaps after a writeconf
> that file is basically clean... and I can replace it... but currently
> it contains lots of info about each of the OSTs.
> 
> All the OSTs are happy mounting to the MDT and all think that they
> are part of our p1 file system.
> 
> Thanks.
> 
> 
> On Sun, Mar 18, 2012 at 3:04 PM, Kit Westneat <kit.westneat at nyu.edu> wrote:
>> Oh right, that makes sense. I guess if I were you I would try one of two
>> things. First, back up the MDT, and then try:
>> 1) format a small loopback device with the parameters you want the MDT to
>> have, then replace the CONFIGS directory on your MDT with the CONFIGS
>> directory on the loopback device
>> - OR -
>> 2) use a hex editor to modify the UUID
>> 
>> Then use tunefs.lustre --print to make sure it all looks good before
>> mounting it.
>> 
>> Though one thing I wonder about is, are the OSTs on the same page with the
>> fsname? Like are they expecting to be part of the p1 filesystem?
>> 
>> HTH,
>> Kit
>> 
>> --
>> Kit Westneat
>> System Administrator, eSys
>> kit.westneat at nyu.edu
>> 212-992-7647
>> 
>> 
>> On Sun, Mar 18, 2012 at 2:40 AM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:
>>> 
>>> No, we have tried that.
>>> 
>>> This file system started life about 6 years ago as Lustre 1.4 and has
>>> continually been upgraded… hence the wacky UUID.  Trying to rename the FS
>>> doesn't work.  It doesn't change the UUID that the MGS tells clients to
>>> mount.
>>> 
>>> 
>>> --
>>> Dr Stuart Midgley
>>> sdm900 at gmail.com
>>> 
>>> 
>>> 
>>> On 18/03/2012, at 2:24 PM, Kit Westneat wrote:
>>> 
>>>> You should be able to reset the UUID by doing another writeconf with the
>>>> --fsname flag. After the writeconf, you'll have to writeconf all the OSTs
>>>> too.
>>>> 
>>>> It worked on my very simple test at least:
>>>> [root@mds1 tmp]# tunefs.lustre --writeconf --fsname=test1 /dev/loop0
>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>> Reading CONFIGS/mountdata
>>>> 
>>>>    Read previous values:
>>>> Target:     t1-MDT0000
>>>> Index:      0
>>>> Lustre FS:  t1
>>>> Mount type: ldiskfs
>>>> Flags:      0x5
>>>>               (MDT MGS )
>>>> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
>>>> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>>>> 
>>>> 
>>>>    Permanent disk data:
>>>> Target:     test1-MDT0000
>>>> Index:      0
>>>> Lustre FS:  test1
>>>> Mount type: ldiskfs
>>>> Flags:      0x105
>>>>               (MDT MGS writeconf )
>>>> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
>>>> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>>>> 
>>>> Writing CONFIGS/mountdata
>>>> 
>>>> 
>>>> HTH,
>>>> Kit
>>>> --
>>>> Kit Westneat
>>>> System Administrator, eSys
>>>> kit.westneat at nyu.edu
>>>> 212-992-7647
>>>> 
>>>> 
>>>> On Sun, Mar 18, 2012 at 1:20 AM, Stu Midgley <sdm900 at gmail.com> wrote:
>>>> ok, from what I can tell, the root of the problem is
>>>> 
>>>> 
>>>> [root@mds001 CONFIGS]# hexdump -C p1-MDT0000  | grep -C 2 mds
>>>> 00002450  0b 00 00 00 04 00 00 00  12 00 00 00 00 00 00 00  |................|
>>>> 00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
>>>> 00002470  6d 64 73 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |mds.....prod_mds|
>>>> 00002480  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
>>>> 00002490  78 00 00 00 07 00 00 00  88 00 00 00 08 00 00 00  |x...............|
>>>> --
>>>> 000024c0  00 00 00 00 04 00 00 00  0b 00 00 00 12 00 00 00  |................|
>>>> 000024d0  02 00 00 00 0b 00 00 00  70 31 2d 4d 44 54 30 30  |........p1-MDT00|
>>>> 000024e0  30 30 00 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |00......prod_mds|
>>>> 000024f0  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
>>>> 00002500  30 00 00 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |0.......p1-MDT00|
>>>> 
>>>> [root@mds001 CONFIGS]#
>>>> [root@mds001 CONFIGS]# hexdump -C /mnt/md2/CONFIGS/p1-MDT0000 | grep -C 2 mds
>>>> 00002450  0b 00 00 00 04 00 00 00  10 00 00 00 00 00 00 00  |................|
>>>> 00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
>>>> 00002470  6d 64 73 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |mds.....p1-MDT00|
>>>> 00002480  30 30 5f 55 55 49 44 00  70 00 00 00 07 00 00 00  |00_UUID.p.......|
>>>> 00002490  80 00 00 00 08 00 00 00  00 00 62 10 ff ff ff ff  |..........b.....|
>>>> 
>>>> 
>>>> now if only I can get the UUID to be removed or reset...
>>>> 
>>>> 
>>>> On Sun, Mar 18, 2012 at 1:05 PM, Dr Stuart Midgley <sdm900 at gmail.com>
>>>> wrote:
>>>>> hmmm… that didn't work
>>>>> 
>>>>> # tunefs.lustre --force --fsname=p1 /dev/md2
>>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>>> Reading CONFIGS/mountdata
>>>>> 
>>>>>   Read previous values:
>>>>> Target:     p1-MDT0000
>>>>> Index:      0
>>>>> UUID:       prod_mds_001_UUID
>>>>> Lustre FS:  p1
>>>>> Mount type: ldiskfs
>>>>> Flags:      0x405
>>>>>              (MDT MGS )
>>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>>> Parameters:
>>>>> 
>>>>> tunefs.lustre: unrecognized option `--force'
>>>>> tunefs.lustre: exiting with 22 (Invalid argument)
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Dr Stuart Midgley
>>>>> sdm900 at gmail.com
>>>>> 
>>>>> 
>>>>> 
>>>>> On 18/03/2012, at 12:17 AM, Nathan Rutman wrote:
>>>>> 
>>>>>> Take them all down again, use tunefs.lustre --force --fsname.
>>>>>> 
>>>>>> 
>>>>>> On Mar 17, 2012, at 2:10 AM, "Stu Midgley" <sdm900 at gmail.com> wrote:
>>>>>> 
>>>>>>> Afternoon
>>>>>>> 
>>>>>>> We have a rather severe problem with our lustre file system.  We had a
>>>>>>> full config log and the advice was to rewrite it with a new one.  So,
>>>>>>> we unmounted our lustre file system off all clients, unmounted all the
>>>>>>> OSTs and then unmounted the MDS.  I then did
>>>>>>> 
>>>>>>> mds:
>>>>>>>  tunefs.lustre --writeconf --erase-params /dev/md2
>>>>>>> 
>>>>>>> oss:
>>>>>>>  tunefs.lustre --writeconf --erase-params --mgsnode=mds001 /dev/md2
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> After the tunefs.lustre on the mds I saw
>>>>>>> 
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS MGS started
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGC172.16.0.251@tcp: Reactivating import
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS: Logs for fs p1 were removed by user request.  All servers must be restarted in order to regenerate the logs.
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: Enabling user_xattr
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: new disk, initializing
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: Now serving p1-MDT0000 on /dev/md2 with recovery enabled
>>>>>>> 
>>>>>>> which scared me a little...
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> The MDS and the OSSs mount happily BUT I can't mount the file system
>>>>>>> on my clients... on the MDS I see
>>>>>>> 
>>>>>>> 
>>>>>>> Mar 17 16:42:11 mds001 kernel: LustreError: 137-5: UUID 'prod_mds_001_UUID' is not available  for connect (no target)
>>>>>>> 
>>>>>>> 
>>>>>>> On the client I see
>>>>>>> 
>>>>>>> 
>>>>>>> Mar 17 16:00:06 host kernel: LustreError: 11-0: an error occurred while communicating with 172.16.0.251@tcp. The mds_connect operation failed with -19
>>>>>>> 
>>>>>>> 
>>>>>>> now, it appears the writeconf renamed the UUID of the mds from
>>>>>>> prod_mds_001_UUID to p1-MDT0000_UUID but I can't work out how to get
>>>>>>> it back...
>>>>>>> 
>>>>>>> 
>>>>>>> for example I tried
>>>>>>> 
>>>>>>> 
>>>>>>> # tunefs.lustre --mgs --mdt --fsname=p1 /dev/md2
>>>>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>>>>> Reading CONFIGS/mountdata
>>>>>>> 
>>>>>>> Read previous values:
>>>>>>> Target:     p1-MDT0000
>>>>>>> Index:      0
>>>>>>> UUID:       prod_mds_001_UUID
>>>>>>> Lustre FS:  p1
>>>>>>> Mount type: ldiskfs
>>>>>>> Flags:      0x405
>>>>>>>            (MDT MGS )
>>>>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>>>>> Parameters:
>>>>>>> 
>>>>>>> tunefs.lustre: cannot change the name of a registered target
>>>>>>> tunefs.lustre: exiting with 1 (Operation not permitted)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I'm now stuck not being able to mount a 1PB file system... which
>>>>>>> isn't good :(
>>>>>>> 
>>>>>>> --
>>>>>>> Dr Stuart Midgley
>>>>>>> sdm900 at gmail.com
>>>>>> 



