[lustre-discuss] [EXTERNAL] Re: Disk quota exceeded while quota is not filled

David Cohen cdavid at physics.technion.ac.il
Wed Aug 26 08:41:02 PDT 2020


Thank you, Chad, for answering.
We are using the patched kernel on the MDT/OSS.
The problem is with the group space quota.
In any case, I enabled project quota just for future purposes.
There are no defined projects; do you think it could still pose a problem?
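
For reference, this is roughly how the project feature can be checked on the
ldiskfs targets themselves (the device path below is only a placeholder):

  # on each server: "project" should appear in the feature list only if the
  # target was formatted or tuned with project quota support
  tune2fs -l /dev/sdX | grep 'Filesystem features'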

Best,
David




On Wed, Aug 26, 2020 at 3:18 PM Chad DeWitt <ccdewitt at uncc.edu> wrote:

> Hi David,
>
> Hope you're doing well.
>
> This is a total shot in the dark, but depending on the kernel version you
> are running, you may need a patched kernel to use project quotas. I'm not
> sure what the symptoms would be, but it may be worth turning off project
> quotas and seeing if doing so resolves your issue:
>
> lctl conf_param technion.quota.mdt=none
> lctl conf_param technion.quota.mdt=ug
> lctl conf_param technion.quota.ost=none
> lctl conf_param technion.quota.ost=ug
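>
> (If you want to confirm what each target actually enforces before and after
> the change, something along these lines on the servers should show it; the
> parameter name is from memory, so please double-check it for your version:)
>
>   # prints, per target, a "quota enabled: ..." line among other quota-slave state
>   lctl get_param osd-*.*.quota_slave.info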
>
> (Looks like you have been running project quota on your MDT for a while
> without issue, so this may be a dead end.)
>
> Here's more info concerning when a patched kernel is necessary for
> project quotas (25.2.  Enabling Disk Quotas):
>
> http://doc.lustre.org/lustre_manual.xhtml
>
>
> Cheers,
> Chad
>
> ------------------------------------------------------------
>
> Chad DeWitt, CISSP | University Research Computing
>
> UNC Charlotte | Office of OneIT
>
> ccdewitt at uncc.edu
>
> ------------------------------------------------------------
>
>
>
> On Tue, Aug 25, 2020 at 3:04 AM David Cohen <cdavid at physics.technion.ac.il>
> wrote:
>
>>
>> Hi,
>> Still hoping for a reply...
>>
>> Old groups seem to be more affected by the issue than new ones that were
>> created after a major disk migration.
>> Quota enforcement appears to be based on a counter other than the
>> accounting, since the accounting produces the same numbers as du.
>> If quota is calculated separately from accounting, it is possible that the
>> quota records are broken and still hold values from the removed disks,
>> while the accounting is correct.
>> Following that suspicion, I tried to force the FS to recalculate quota.
>> I tried:
>> lctl conf_param technion.quota.ost=none
>> and back to:
>> lctl conf_param technion.quota.ost=ugp
>>
>> I also tried running on the MDS and all OSTs:
>> tune2fs -O ^quota
>> and then again:
>> tune2fs -O quota
>> and after each attempt, also:
>> lctl lfsck_start -A -t all -o -e continue
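>>
>> (For the record, tune2fs refuses, as far as I know, to toggle the quota
>> feature on a mounted device, so on each target the sequence was roughly the
>> following; the device path and mount point are placeholders:)
>>
>>   umount /mnt/ostX                    # take the target offline first
>>   tune2fs -O ^quota /dev/sdX          # drop the quota feature and its files
>>   tune2fs -O quota /dev/sdX           # re-create them; usage should be recomputed
>>   mount -t lustre /dev/sdX /mnt/ostX  # bring the target back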
>>
>> But the problem still persists, and groups whose usage is under quota get
>> blocked with "quota exceeded".
>>
>> Best,
>> David
>>
>>
>> On Sun, Aug 16, 2020 at 8:41 AM David Cohen <
>> cdavid at physics.technion.ac.il> wrote:
>>
>>> Hi,
>>> Adding some more information.
>>> A few months ago the data on the Lustre fs was migrated to new physical
>>> storage.
>>> After the successful migration the old OSTs were marked as active=0
>>> (lctl conf_param technion-OST0001.osc.active=0).
>>>
>>> Since then, all the clients have been unmounted and remounted.
>>> tunefs.lustre --writeconf was executed on the MGS/MDT and all the OSTs.
>>> lctl dl doesn't show the old OSTs anymore, but when querying the quota
>>> they still appear.
>>> Since new users seem less affected by the "quota exceeded" problem
>>> (blocked from writing while quota is not filled),
>>> I suspect that the quota calculation is still summing values from the old
>>> OSTs:
>>>
>>> lfs quota -g -v md_kaplan /storage/
>>> Disk quotas for grp md_kaplan (gid 10028):
>>>      Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
>>>       /storage/ 4823987000       0 5368709120       -  143596       0       0       -
>>> technion-MDT0000_UUID
>>>                   37028       -       0       -  143596       -       0       -
>>> quotactl ost0 failed.
>>> quotactl ost1 failed.
>>> quotactl ost2 failed.
>>> quotactl ost3 failed.
>>> quotactl ost4 failed.
>>> quotactl ost5 failed.
>>> quotactl ost6 failed.
>>> quotactl ost7 failed.
>>> quotactl ost8 failed.
>>> quotactl ost9 failed.
>>> quotactl ost10 failed.
>>> quotactl ost11 failed.
>>> quotactl ost12 failed.
>>> quotactl ost13 failed.
>>> quotactl ost14 failed.
>>> quotactl ost15 failed.
>>> quotactl ost16 failed.
>>> quotactl ost17 failed.
>>> quotactl ost18 failed.
>>> quotactl ost19 failed.
>>> quotactl ost20 failed.
>>> technion-OST0015_UUID
>>>                 114429464*      - 114429464       -       -       -       -       -
>>> technion-OST0016_UUID
>>>                 92938588       - 92938592       -       -       -       -       -
>>> technion-OST0017_UUID
>>>                 128496468*      - 128496468       -       -       -       -       -
>>> technion-OST0018_UUID
>>>                 191478704*      - 191478704       -       -       -       -       -
>>> technion-OST0019_UUID
>>>                 107720552       - 107720560       -       -       -       -       -
>>> technion-OST001a_UUID
>>>                 165631952*      - 165631952       -       -       -       -       -
>>> technion-OST001b_UUID
>>>                 460714156*      - 460714156       -       -       -       -       -
>>> technion-OST001c_UUID
>>>                 157182900*      - 157182900       -       -       -       -       -
>>> technion-OST001d_UUID
>>>                 102945952*      - 102945952       -       -       -       -       -
>>> technion-OST001e_UUID
>>>                 175840980*      - 175840980       -       -       -       -       -
>>> technion-OST001f_UUID
>>>                 142666872*      - 142666872       -       -       -       -       -
>>> technion-OST0020_UUID
>>>                 188147548*      - 188147548       -       -       -       -       -
>>> technion-OST0021_UUID
>>>                 125914240*      - 125914240       -       -       -       -       -
>>> technion-OST0022_UUID
>>>                 186390800*      - 186390800       -       -       -       -       -
>>> technion-OST0023_UUID
>>>                 115386876       - 115386884       -       -       -       -       -
>>> technion-OST0024_UUID
>>>                 127139556*      - 127139556       -       -       -       -       -
>>> technion-OST0025_UUID
>>>                 179666580*      - 179666580       -       -       -       -       -
>>> technion-OST0026_UUID
>>>                 147837348       - 147837356       -       -       -       -       -
>>> technion-OST0027_UUID
>>>                 129823528       - 129823536       -       -       -       -       -
>>> technion-OST0028_UUID
>>>                 158270776       - 158270784       -       -       -       -       -
>>> technion-OST0029_UUID
>>>                 168762120       - 168763104       -       -       -       -       -
>>> technion-OST002a_UUID
>>>                 164235684       - 164235688       -       -       -       -       -
>>> technion-OST002b_UUID
>>>                 147512200       - 147512204       -       -       -       -       -
>>> technion-OST002c_UUID
>>>                 158046652       - 158046668       -       -       -       -       -
>>> technion-OST002d_UUID
>>>                 199314048*      - 199314048       -       -       -       -       -
>>> technion-OST002e_UUID
>>>                 209187196*      - 209187196       -       -       -       -       -
>>> technion-OST002f_UUID
>>>                 162586732       - 162586764       -       -       -       -       -
>>> technion-OST0030_UUID
>>>                 131248812*      - 131248812       -       -       -       -       -
>>> technion-OST0031_UUID
>>>                 134665176*      - 134665176       -       -       -       -       -
>>> technion-OST0032_UUID
>>>                 149767512*      - 149767512       -       -       -       -       -
>>> Total allocated inode limit: 0, total allocated block limit: 4823951056
>>> Some errors happened when getting quota info. Some devices may be not
>>> working or deactivated. The data in "[]" is inaccurate.
>>>
>>>
>>> lfs quota -g -h md_kaplan /storage/
>>> Disk quotas for grp md_kaplan (gid 10028):
>>>      Filesystem    used   quota   limit   grace   files   quota   limit   grace
>>>       /storage/  4.493T      0k      5T       -  143596       0       0       -
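>>>
>>> (If useful, this is roughly how the quota master's per-group records on
>>> the MDS and the per-target accounting on the slaves could be inspected, to
>>> see whether granted space is still attributed to the removed OSTs; the
>>> parameter names below are from memory and may differ between Lustre
>>> versions:)
>>>
>>>   # on the MDS: the quota master's global group index
>>>   lctl get_param qmt.technion-QMT0000.dt-0x0.glb-grp
>>>   # on each OSS: what the quota slave itself has accounted per group
>>>   lctl get_param osd-*.*.quota_slave.acct_group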
>>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 7:35 AM David Cohen <
>>> cdavid at physics.technion.ac.il> wrote:
>>>
>>>> Hi,
>>>> I'm running Lustre 2.10.5 on the OSSs and MDS, and 2.10.7 on the clients.
>>>> Inode quota on the MDT has worked fine for a while now:
>>>> lctl conf_param technion.quota.mdt=ugp
>>>> but a few days ago, when I turned on quota on the OSTs:
>>>> lctl conf_param technion.quota.ost=ugp
>>>> users started getting "Disk quota exceeded" error messages while their
>>>> quota is not filled.
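>>>>
>>>> (For context, the group's hard block limit of 5368709120 KiB, i.e. 5 TiB,
>>>> as seen in the lfs quota output above, would have been set with something
>>>> along the lines of:)
>>>>
>>>>   lfs setquota -g md_kaplan -B 5368709120 /storage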
>>>>
>>>> Actions taken:
>>>> Full e2fsck -f -y on the whole filesystem, MDT and OSTs.
>>>> lctl lfsck_start -A -t all -o -e continue
>>>> Turning quota to none and back.
>>>>
>>>> None of the above solved the problem.
>>>>
>>>> lctl lfsck_query
>>>>
>>>>
>>>> layout_mdts_init: 0
>>>> layout_mdts_scanning-phase1: 0
>>>> layout_mdts_scanning-phase2: 0
>>>> layout_mdts_completed: 0
>>>> layout_mdts_failed: 0
>>>> layout_mdts_stopped: 0
>>>> layout_mdts_paused: 0
>>>> layout_mdts_crashed: 0
>>>> layout_mdts_partial: 1   # is that normal output?
>>>> layout_mdts_co-failed: 0
>>>> layout_mdts_co-stopped: 0
>>>> layout_mdts_co-paused: 0
>>>> layout_mdts_unknown: 0
>>>> layout_osts_init: 0
>>>> layout_osts_scanning-phase1: 0
>>>> layout_osts_scanning-phase2: 0
>>>> layout_osts_completed: 30
>>>> layout_osts_failed: 0
>>>> layout_osts_stopped: 0
>>>> layout_osts_paused: 0
>>>> layout_osts_crashed: 0
>>>> layout_osts_partial: 0
>>>> layout_osts_co-failed: 0
>>>> layout_osts_co-stopped: 0
>>>> layout_osts_co-paused: 0
>>>> layout_osts_unknown: 0
>>>> layout_repaired: 15
>>>> namespace_mdts_init: 0
>>>> namespace_mdts_scanning-phase1: 0
>>>> namespace_mdts_scanning-phase2: 0
>>>> namespace_mdts_completed: 1
>>>> namespace_mdts_failed: 0
>>>> namespace_mdts_stopped: 0
>>>> namespace_mdts_paused: 0
>>>> namespace_mdts_crashed: 0
>>>> namespace_mdts_partial: 0
>>>> namespace_mdts_co-failed: 0
>>>> namespace_mdts_co-stopped: 0
>>>> namespace_mdts_co-paused: 0
>>>> namespace_mdts_unknown: 0
>>>> namespace_osts_init: 0
>>>> namespace_osts_scanning-phase1: 0
>>>> namespace_osts_scanning-phase2: 0
>>>> namespace_osts_completed: 0
>>>> namespace_osts_failed: 0
>>>> namespace_osts_stopped: 0
>>>> namespace_osts_paused: 0
>>>> namespace_osts_crashed: 0
>>>> namespace_osts_partial: 0
>>>> namespace_osts_co-failed: 0
>>>> namespace_osts_co-stopped: 0
>>>> namespace_osts_co-paused: 0
>>>> namespace_osts_unknown: 0
>>>> namespace_repaired: 99
>>>>
>>>>
>>>>
>>>>

