[lustre-discuss] MDT quota problem / MDS crash 2.5.3

Gary Molenkamp gary at sharcnet.ca
Tue Jul 19 04:47:34 PDT 2016


We have used this process successfully several times in the past on our 
2.5.3-based system.  However, yesterday the problem occurred again, and 
this time e2fsck corrected a significant number of errors, for example:

Inode 1465064204 was part of the orphaned inode list.  FIXED.
Inode 269985884 ref count is 2, should be 1.  Fix? yes
Unattached inode 269985885
Connect to /lost+found? yes
...

Now when I try to mount the MDS/MGS filesystem, I get:

mount.lustre: missing option mgsnode=<nid>

I mounted the filesystem as ldiskfs and, as far as my limited experience 
goes, it looks OK.  I should be able to re-add the NID for the MGS with:

tunefs.lustre --mgs --mgsnode=<nid> <dev>

Is that correct?  Is there any way to check whether there was any other 
corruption?  The OSTs are still up and running; do they cache a copy of 
the MGS data that could be used for restoration?
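
Before rewriting anything on the target, it may help to look at what is currently stored on it. The following is only a minimal sketch: the helper name plan_mdt_check is hypothetical, and /dev/mdt is just the placeholder device name used elsewhere in this thread. The function prints two read-only commands rather than running them, so the output can be reviewed first. Both commands assume the device is unmounted.

```shell
#!/bin/sh
# Print (do not run) two read-only checks for a Lustre MDT device.
# Assumption: the caller supplies the device path; /dev/mdt below is only
# the placeholder name used in this thread, not a real device.
plan_mdt_check() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: plan_mdt_check <dev>" >&2
        return 1
    fi
    # --dryrun reports the stored target data and what would be done,
    # without writing to the disk.
    echo "tunefs.lustre --dryrun $dev"
    # -f forces a full check; -n opens the filesystem read-only and
    # answers "no" to every question.
    echo "e2fsck -fn $dev"
}

plan_mdt_check /dev/mdt
```

The dry run only shows the parameters recorded on the target, and the read-only e2fsck only shows whether ldiskfs-level inconsistencies remain. Neither says whether the OSTs hold a usable copy of the MGS data.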

Any assistance would be appreciated.

Thanks.
Gary.



On 19/07/16 06:24 AM, Dilger, Andreas wrote:
> On Jul 14, 2016, at 04:13, Thomas Roth <t.roth at gsi.de> wrote:
>> Hi Guido,
>>
>> thanks for the tip, that was successful, with the exact same commands,
>>
>>   tune2fs -O ^quota /dev/mdt     (took about ~3min)
>>   tunefs.lustre --quota /dev/mdt (took about ~30min with ~200M used inodes)
> Note that this can also be achieved by running "e2fsck -f" on the filesystem.
> That is probably faster, and has the added benefit that it verifies the
> consistency of the filesystem before recreating the quota files.
>
> Cheers, Andreas
>
>> A subsequent
>>   lctl lfsck_start -M nyx-MDT0000
>> ran for less than 45 min and seems to have cleaned up the mdt mess.
>>
>> Cheers,
>> Thomas
>>
>>
>> On 07/12/2016 11:18 PM, Guido Laubender wrote:
>>> English version (sorry for my previous mail in German; it should have been a personal mail to Thomas only :( ):
>>>
>>> We were recently able to fix wrong Lustre inode quotas by disabling and re-enabling quota support on the MDT by 'tune2fs -O ^quota /dev/mdt' and
>>> 'tunefs.lustre --quota /dev/mdt'.
>>>
>>> Maybe it helps here as well.
>>>
>>>
>>> On Tue, 12 Jul 2016, Guido Laubender wrote:
>>>
>>>> Our inode quotas were recently incorrect; by disabling and then re-enabling quota support on the MDT (using 'tune2fs
>>>> -O ^quota' followed by 'tunefs.lustre --quota') we were able to repair them.
>>>>
>>>> Maybe that helps in your case too...
>>>>
>>>> On Tue, 12 Jul 2016, Thomas Roth wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> we are running Lustre 2.5.3 on our servers. OSTs are on ZFS, MDS is on ldiskfs.
>>>>>
>>>>> After an MDS crash and an e2fsck (1.42.9.wc1) run on the partition, the MDS mounts but produces high-frequency log entries
>>>>>
>>>>> Jul 12 16:06:38 lxmds12 kernel: VFS: find_free_dqentry(): Data block full but it shouldn't.
>>>>> Jul 12 16:06:38 lxmds12 kernel: VFS: Error -5 occurred while creating quota.
>>>>>
>>>>> interspersed with
>>>>>
>>>>> Jul 12 16:06:38 lxmds12 kernel: LustreError: 13159:0:(qsd_handler.c:1155:qsd_op_adjust()) nyx-MDT0000: fail to locate lqe for id:6763, type:0
>>>>> Jul 12 16:06:38 lxmds12 kernel: LustreError: 13159:0:(qsd_handler.c:1155:qsd_op_adjust()) Skipped 4973 previous similar messages
>>>>>
>>>>> or
>>>>>
>>>>> Jul 12 15:59:26 lxmds12 kernel: LustreError: 13414:0:(qsd_entry.c:211:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:nyx-MDT0000
>>>>> qtype:usr id:7408 enforced:0 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
>>>>> Jul 12 15:59:26 lxmds12 kernel: LustreError: 13414:0:(qsd_entry.c:211:qsd_refresh_usage()) Skipped 5166 previous similar messages
>>>>>
>>>>>
>>>>> According to our experience from the last few days, this will eventually bring all Lustre operations to a halt.
>>>>>
>>>>>
>>>>> Both web searches and the e2fsck messages ([QUOTA WARNING] Usage inconsistent for ID 7989:actual (278528000, 738675) != expected (222507008, 531071))
>>>>> point to quota issues.
>>>>>
>>>>> Therefore, we 'switched off' quota with "lctl conf_param fsname.quota.ost|mdt=u|g|ug|none", restarted, unmounted, then 'switched on' quota again,
>>>>> restarted and unmounted.
>>>>>
>>>>> -> The VFS errors still appear.
>>>>>
>>>>> Is there anything else we could do?
>>>>> Mount the MDT as ldiskfs and do nasty things on the disk?
>>>>> Is there any command that recalculates / rewrites the quota files on the MDT?
>>>>>
>>>>>
>>>>>
>>>>> (As long as Lustre is still accessible, 'lfs quota' gives results for both users and groups, but at least the file count is entirely wrong (all of
>>>>> my own Lustre files amount to exactly 0 files).
>>>>>
>>>>> Updating the usage numbers does not work either: I managed to copy a 1 GB file and still had the same kbytes used...)
>>>>>
>>>>>
>>>>> Regards,
>>>>> Thomas
>>>>>
>>>>> --
>>>>> --------------------------------------------------------------------
>>>>> Thomas Roth
>>>>> Department: Informationstechnologie
>>>>> Location: SB3 1.250
>>>>> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>>>>>
>>>>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>>>>> Planckstraße 1
>>>>> 64291 Darmstadt
>>>>> www.gsi.de
>>>>>
>>>>> Gesellschaft mit beschränkter Haftung
>>>>> Sitz der Gesellschaft: Darmstadt
>>>>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>>>>
>>>>> Geschäftsführung: Ursula Weyrich
>>>>> Professor Dr. Karlheinz Langanke
>>>>> Jörg Blaurock
>>>>>
>>>>> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
>>>>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>>>>>
>>>>> _______________________________________________
>>>>> lustre-discuss mailing list
>>>>> lustre-discuss at lists.lustre.org
>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>> -- 
>> --------------------------------------------------------------------
>> Thomas Roth
>> Department: HPC
>> Location: SB3 1.262
>> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>> Planckstraße 1
>> 64291 Darmstadt
>> www.gsi.de
>>
>> Gesellschaft mit beschränkter Haftung
>> Sitz der Gesellschaft: Darmstadt
>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>
>> Geschäftsführung: Professor Dr. Karlheinz Langanke
>> Ursula Weyrich
>> Jörg Blaurock
>>
>> Vorsitzender des Aufsichtsrates: St Dr. Georg Schütte
>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt


