[lustre-discuss] [EXTERNAL] Re: Disk quota exceeded while quota is not filled
David Cohen
cdavid at physics.technion.ac.il
Wed Aug 26 08:41:02 PDT 2020
Thank you, Chad, for answering.
We are using the patched kernel on the MDT/OSS.
The problem is with the group space quota.
In any case, I enabled project quota only for future purposes.
There are no defined projects; do you think it can still pose a problem?
Best,
David
On Wed, Aug 26, 2020 at 3:18 PM Chad DeWitt <ccdewitt at uncc.edu> wrote:
> Hi David,
>
> Hope you're doing well.
>
> This is a total shot in the dark, but depending on the kernel version you
> are running, you may need a patched kernel to use project quotas. I'm not
> sure what the symptoms would be, but it may be worth turning off project
> quotas and seeing if doing so resolves your issue:
>
> lctl conf_param technion.quota.mdt=none
> lctl conf_param technion.quota.mdt=ug
> lctl conf_param technion.quota.ost=none
> lctl conf_param technion.quota.ost=ug
>
> (Looks like you have been running project quota on your MDT for a while
> without issue, so this may be a dead end.)
>
> Here's more info concerning when a patched kernel is necessary for
> project quotas (25.2. Enabling Disk Quotas):
>
> http://doc.lustre.org/lustre_manual.xhtml
>
>
> Cheers,
> Chad
>
> ------------------------------------------------------------
>
> Chad DeWitt, CISSP | University Research Computing
>
> UNC Charlotte | Office of OneIT
>
> ccdewitt at uncc.edu
>
> ------------------------------------------------------------
>
>
>
> On Tue, Aug 25, 2020 at 3:04 AM David Cohen <cdavid at physics.technion.ac.il>
> wrote:
>
>>
>> Hi,
>> Still hoping for a reply...
>>
>> It seems to me that old groups are more affected by the issue than new
>> ones created after a major disk migration.
>> It looks like quota enforcement is based on a counter separate from the
>> accounting, since the accounting produces the same numbers as du.
>> If quota is tracked separately from accounting, it is possible that the
>> quota counter is broken and still holds values from the removed disks,
>> while the accounting is correct.
>> So following that suspicion I tried to force the FS to recalculate quota.
>> I tried:
>> lctl conf_param technion.quota.ost=none
>> and back to:
>> lctl conf_param technion.quota.ost=ugp
>>
>> I tried running, on the mds and all the ost:
>> tune2fs -O ^quota
>> and then again:
>> tune2fs -O quota
>> and after each attempt, also:
>> lctl lfsck_start -A -t all -o -e continue
>>
>> But the problem persists, and groups whose usage is under the quota get
>> blocked with "quota exceeded".
>>
>> Best,
>> David
>>
>>
>> On Sun, Aug 16, 2020 at 8:41 AM David Cohen <
>> cdavid at physics.technion.ac.il> wrote:
>>
>>> Hi,
>>> Adding some more information.
>>> A few months ago the data on the Lustre fs was migrated to new physical
>>> storage.
>>> After successful migration the old ost were marked as active=0
>>> (lctl conf_param technion-OST0001.osc.active=0)
>>>
>>> Since then all the clients were unmounted and mounted.
>>> tunefs.lustre --writeconf was executed on the mgs/mdt and all the ost.
>>> lctl dl doesn't show the old ost anymore, but when querying the quota
>>> they still appear.
>>> As I see that new users are less affected by the "quota exceeded"
>>> problem (blocked from writing while quota is not filled),
>>> I suspect that quota calculation is still summing values from the old
>>> ost:
>>>
>>> lfs quota -g -v md_kaplan /storage/
>>> Disk quotas for grp md_kaplan (gid 10028):
>>> Filesystem                 kbytes  quota       limit  grace   files  quota  limit  grace
>>> /storage/              4823987000      0  5368709120      -  143596      0      0      -
>>> technion-MDT0000_UUID       37028      -           0      -  143596      -      0      -
>>> quotactl ost0 failed.
>>> quotactl ost1 failed.
>>> quotactl ost2 failed.
>>> quotactl ost3 failed.
>>> quotactl ost4 failed.
>>> quotactl ost5 failed.
>>> quotactl ost6 failed.
>>> quotactl ost7 failed.
>>> quotactl ost8 failed.
>>> quotactl ost9 failed.
>>> quotactl ost10 failed.
>>> quotactl ost11 failed.
>>> quotactl ost12 failed.
>>> quotactl ost13 failed.
>>> quotactl ost14 failed.
>>> quotactl ost15 failed.
>>> quotactl ost16 failed.
>>> quotactl ost17 failed.
>>> quotactl ost18 failed.
>>> quotactl ost19 failed.
>>> quotactl ost20 failed.
>>> technion-OST0015_UUID  114429464*  -  114429464  -  -  -  -  -
>>> technion-OST0016_UUID   92938588   -   92938592  -  -  -  -  -
>>> technion-OST0017_UUID  128496468*  -  128496468  -  -  -  -  -
>>> technion-OST0018_UUID  191478704*  -  191478704  -  -  -  -  -
>>> technion-OST0019_UUID  107720552   -  107720560  -  -  -  -  -
>>> technion-OST001a_UUID  165631952*  -  165631952  -  -  -  -  -
>>> technion-OST001b_UUID  460714156*  -  460714156  -  -  -  -  -
>>> technion-OST001c_UUID  157182900*  -  157182900  -  -  -  -  -
>>> technion-OST001d_UUID  102945952*  -  102945952  -  -  -  -  -
>>> technion-OST001e_UUID  175840980*  -  175840980  -  -  -  -  -
>>> technion-OST001f_UUID  142666872*  -  142666872  -  -  -  -  -
>>> technion-OST0020_UUID  188147548*  -  188147548  -  -  -  -  -
>>> technion-OST0021_UUID  125914240*  -  125914240  -  -  -  -  -
>>> technion-OST0022_UUID  186390800*  -  186390800  -  -  -  -  -
>>> technion-OST0023_UUID  115386876   -  115386884  -  -  -  -  -
>>> technion-OST0024_UUID  127139556*  -  127139556  -  -  -  -  -
>>> technion-OST0025_UUID  179666580*  -  179666580  -  -  -  -  -
>>> technion-OST0026_UUID  147837348   -  147837356  -  -  -  -  -
>>> technion-OST0027_UUID  129823528   -  129823536  -  -  -  -  -
>>> technion-OST0028_UUID  158270776   -  158270784  -  -  -  -  -
>>> technion-OST0029_UUID  168762120   -  168763104  -  -  -  -  -
>>> technion-OST002a_UUID  164235684   -  164235688  -  -  -  -  -
>>> technion-OST002b_UUID  147512200   -  147512204  -  -  -  -  -
>>> technion-OST002c_UUID  158046652   -  158046668  -  -  -  -  -
>>> technion-OST002d_UUID  199314048*  -  199314048  -  -  -  -  -
>>> technion-OST002e_UUID  209187196*  -  209187196  -  -  -  -  -
>>> technion-OST002f_UUID  162586732   -  162586764  -  -  -  -  -
>>> technion-OST0030_UUID  131248812*  -  131248812  -  -  -  -  -
>>> technion-OST0031_UUID  134665176*  -  134665176  -  -  -  -  -
>>> technion-OST0032_UUID  149767512*  -  149767512  -  -  -  -  -
>>> Total allocated inode limit: 0, total allocated block limit: 4823951056
>>> Some errors happened when getting quota info. Some devices may be not
>>> working or deactivated. The data in "[]" is inaccurate.
>>>
>>>
>>> lfs quota -g -h md_kaplan /storage/
>>> Disk quotas for grp md_kaplan (gid 10028):
>>> Filesystem     used  quota  limit  grace   files  quota  limit  grace
>>> /storage/    4.493T     0k     5T      -  143596      0      0      -
>>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 7:35 AM David Cohen <
>>> cdavid at physics.technion.ac.il> wrote:
>>>
>>>> Hi,
>>>> I'm running Lustre 2.10.5 on the oss and mds, and 2.10.7 on the clients.
>>>> While inode quota on the mdt has worked fine for a while now:
>>>> lctl conf_param technion.quota.mdt=ugp
>>>> when, a few days ago, I turned on quota on the ost:
>>>> lctl conf_param technion.quota.ost=ugp
>>>> users started getting "Disk quota exceeded" error messages while their
>>>> quota is not filled.
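For context, this is my understanding of how Lustre's distributed quota produces exactly this symptom: the quota master splits the global limit into per-OST grants, and a write fails with EDQUOT as soon as the target OST's own grant is exhausted, even if global usage is below the limit. A toy sketch of that behavior (hypothetical illustration only, not Lustre code):

```python
# Toy model of distributed quota enforcement: a global limit is split
# into per-slave grants, and each write is checked only locally.

class QuotaMaster:
    def __init__(self, global_limit, grants):
        # grants: portion of the global limit already handed to each OST
        assert sum(grants.values()) <= global_limit
        self.global_limit = global_limit
        self.grants = grants
        self.used = {ost: 0 for ost in grants}

    def write(self, ost, kbytes):
        # Enforcement happens against the target OST's local grant,
        # not against the global limit.
        if self.used[ost] + kbytes > self.grants[ost]:
            return "EDQUOT"   # "Disk quota exceeded" despite global headroom
        self.used[ost] += kbytes
        return "OK"

master = QuotaMaster(global_limit=100, grants={"ost0": 50, "ost1": 30})
master.write("ost0", 50)           # fills ost0's grant
print(master.write("ost0", 1))     # EDQUOT: ost0's grant is exhausted...
print(sum(master.used.values()))   # ...although only 50 of 100 is used globally
```

In real Lustre a slave asks the master for more grant as it runs low; if removed OSTs never release the grant they hold, the master has nothing left to hand out, which would match what David describes.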
>>>>
>>>> Actions taken:
>>>> Full e2fsck -f -y to all the file system, mdt and ost.
>>>> lctl lfsck_start -A -t all -o -e continue
>>>> turning quota to none and back.
>>>>
>>>> None of the above solved the problem.
>>>>
>>>> lctl lfsck_query
>>>>
>>>>
>>>> layout_mdts_init: 0
>>>> layout_mdts_scanning-phase1: 0
>>>> layout_mdts_scanning-phase2: 0
>>>> layout_mdts_completed: 0
>>>> layout_mdts_failed: 0
>>>> layout_mdts_stopped: 0
>>>> layout_mdts_paused: 0
>>>> layout_mdts_crashed: 0
>>>> layout_mdts_partial: 1      # is that normal output?
>>>> layout_mdts_co-failed: 0
>>>> layout_mdts_co-stopped: 0
>>>> layout_mdts_co-paused: 0
>>>> layout_mdts_unknown: 0
>>>> layout_osts_init: 0
>>>> layout_osts_scanning-phase1: 0
>>>> layout_osts_scanning-phase2: 0
>>>> layout_osts_completed: 30
>>>> layout_osts_failed: 0
>>>> layout_osts_stopped: 0
>>>> layout_osts_paused: 0
>>>> layout_osts_crashed: 0
>>>> layout_osts_partial: 0
>>>> layout_osts_co-failed: 0
>>>> layout_osts_co-stopped: 0
>>>> layout_osts_co-paused: 0
>>>> layout_osts_unknown: 0
>>>> layout_repaired: 15
>>>> namespace_mdts_init: 0
>>>> namespace_mdts_scanning-phase1: 0
>>>> namespace_mdts_scanning-phase2: 0
>>>> namespace_mdts_completed: 1
>>>> namespace_mdts_failed: 0
>>>> namespace_mdts_stopped: 0
>>>> namespace_mdts_paused: 0
>>>> namespace_mdts_crashed: 0
>>>> namespace_mdts_partial: 0
>>>> namespace_mdts_co-failed: 0
>>>> namespace_mdts_co-stopped: 0
>>>> namespace_mdts_co-paused: 0
>>>> namespace_mdts_unknown: 0
>>>> namespace_osts_init: 0
>>>> namespace_osts_scanning-phase1: 0
>>>> namespace_osts_scanning-phase2: 0
>>>> namespace_osts_completed: 0
>>>> namespace_osts_failed: 0
>>>> namespace_osts_stopped: 0
>>>> namespace_osts_paused: 0
>>>> namespace_osts_crashed: 0
>>>> namespace_osts_partial: 0
>>>> namespace_osts_co-failed: 0
>>>> namespace_osts_co-stopped: 0
>>>> namespace_osts_co-paused: 0
>>>> namespace_osts_unknown: 0
>>>> namespace_repaired: 99
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>