<div dir="ltr"><div>Hi,</div><div>Still hoping for a reply...</div><div><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div>It seems to me that old groups are more affected by the issue than new ones that were created after the major disk migration.</div><div>It seems that quota enforcement is based on some counter other than the accounting, as the accounting produces the same numbers as du.</div><div>So if quota is calculated separately from accounting, it is possible that the quota records are broken and keep values from the removed disks, while the accounting is correct.<br></div></div></div><div>Following that suspicion, I tried to force the FS to recalculate quota.</div><div>I tried:</div><div>lctl conf_param technion.quota.ost=none</div><div>and back to:<br></div><div>lctl conf_param technion.quota.ost=ugp</div><div><br></div><div>I also tried running, on the mds and all the ost:</div><div>tune2fs -O ^quota</div><div>and then back on:</div><div>tune2fs -O quota</div><div>and after each attempt, also:</div><div>lctl lfsck_start -A -t all -o -e continue</div><div><br></div><div>But the problem persists: groups whose usage is below their quota still get blocked with "quota exceeded".<br></div><div><br></div><div>Best,</div><div>David<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Aug 16, 2020 at 8:41 AM David Cohen <<a href="mailto:cdavid@physics.technion.ac.il">cdavid@physics.technion.ac.il</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi,</div><div>Adding some more information.</div><div>A few months ago the data on the Lustre fs was migrated to new physical storage.</div><div>After the successful migration, the old ost were marked as active=0:</div><div>(lctl conf_param technion-OST0001.osc.active=0)</div><div><br></div><div>Since then all the clients were unmounted 
and mounted.</div><div>tunefs.lustre --writeconf was executed on the mgs/mdt and on all the ost.</div><div>lctl dl doesn't show the old ost anymore, but they still appear when querying the quota.</div><div>As I see that new users are less affected by the "quota exceeded" problem (blocked from writing while their quota is not filled),</div><div>I suspect that the quota calculation is still summing values from the old ost:</div><div><br></div><div><b>lfs quota -g -v md_kaplan /storage/</b><br>Disk quotas for grp md_kaplan (gid 10028):<br>     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace<br>      /storage/ 4823987000       0 5368709120       -  143596       0       0       -<br>technion-MDT0000_UUID<br>                  37028       -       0       -  143596       -       0       -<br>quotactl ost0 failed.<br>quotactl ost1 failed.<br>quotactl ost2 failed.<br>quotactl ost3 failed.<br>quotactl ost4 failed.<br>quotactl ost5 failed.<br>quotactl ost6 failed.<br>quotactl ost7 failed.<br>quotactl ost8 failed.<br>quotactl ost9 failed.<br>quotactl ost10 failed.<br>quotactl ost11 failed.<br>quotactl ost12 failed.<br>quotactl ost13 failed.<br>quotactl ost14 failed.<br>quotactl ost15 failed.<br>quotactl ost16 failed.<br>quotactl ost17 failed.<br>quotactl ost18 failed.<br>quotactl ost19 failed.<br>quotactl ost20 failed.<br>technion-OST0015_UUID<br>                114429464*      - 114429464       -       -       -       -       -<br>technion-OST0016_UUID<br>                92938588       - 92938592       -       -       -       -       -<br>technion-OST0017_UUID<br>                128496468*      - 128496468       -       -       -       -       -<br>technion-OST0018_UUID<br>                191478704*      - 191478704       -       -       -       -       -<br>technion-OST0019_UUID<br>                107720552       - 107720560       -       -       -       -       -<br>technion-OST001a_UUID<br>                165631952*      - 165631952       -       -       -       
-       -<br>technion-OST001b_UUID<br>                460714156*      - 460714156       -       -       -       -       -<br>technion-OST001c_UUID<br>                157182900*      - 157182900       -       -       -       -       -<br>technion-OST001d_UUID<br>                102945952*      - 102945952       -       -       -       -       -<br>technion-OST001e_UUID<br>                175840980*      - 175840980       -       -       -       -       -<br>technion-OST001f_UUID<br>                142666872*      - 142666872       -       -       -       -       -<br>technion-OST0020_UUID<br>                188147548*      - 188147548       -       -       -       -       -<br>technion-OST0021_UUID<br>                125914240*      - 125914240       -       -       -       -       -<br>technion-OST0022_UUID<br>                186390800*      - 186390800       -       -       -       -       -<br>technion-OST0023_UUID<br>                115386876       - 115386884       -       -       -       -       -<br>technion-OST0024_UUID<br>                127139556*      - 127139556       -       -       -       -       -<br>technion-OST0025_UUID<br>                179666580*      - 179666580       -       -       -       -       -<br>technion-OST0026_UUID<br>                147837348       - 147837356       -       -       -       -       -<br>technion-OST0027_UUID<br>                129823528       - 129823536       -       -       -       -       -<br>technion-OST0028_UUID<br>                158270776       - 158270784       -       -       -       -       -<br>technion-OST0029_UUID<br>                168762120       - 168763104       -       -       -       -       -<br>technion-OST002a_UUID<br>                164235684       - 164235688       -       -       -       -       -<br>technion-OST002b_UUID<br>                147512200       - 147512204       -       -       -       -       -<br>technion-OST002c_UUID<br>                158046652       - 158046668       -       
-       -       -       -<br>technion-OST002d_UUID<br>                199314048*      - 199314048       -       -       -       -       -<br>technion-OST002e_UUID<br>                209187196*      - 209187196       -       -       -       -       -<br>technion-OST002f_UUID<br>                162586732       - 162586764       -       -       -       -       -<br>technion-OST0030_UUID<br>                131248812*      - 131248812       -       -       -       -       -<br>technion-OST0031_UUID<br>                134665176*      - 134665176       -       -       -       -       -<br>technion-OST0032_UUID<br>                149767512*      - 149767512       -       -       -       -       -<br>Total allocated inode limit: 0, total allocated block limit: 4823951056<br>Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.<br></div><div><br></div><div><br></div><div><b>lfs quota -g -h md_kaplan /storage/</b><br>Disk quotas for grp md_kaplan (gid 10028):<br>     Filesystem    used   quota   limit   grace   files   quota   limit   grace<br>      /storage/  4.493T      0k      5T       -  143596       0       0       -<br></div><div><div dir="ltr"><div dir="ltr"><br></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 11, 2020 at 7:35 AM David Cohen <<a href="mailto:cdavid@physics.technion.ac.il" target="_blank">cdavid@physics.technion.ac.il</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<br clear="all"><div><div dir="ltr"><div>I'm running Lustre 2.10.5 on the oss and mds, and 2.10.7 on the clients.</div><div>While inode quota on the mdt has worked fine for a while now:</div><div>lctl conf_param technion.quota.mdt=ugp</div><div>Then, a few days ago, I turned on quota on the ost:</div><div>lctl conf_param technion.quota.ost=ugp</div><div>Users started 
getting "Disk quota exceeded" error messages even though their quota is not filled.</div><div><br></div><div>Actions taken:</div><div>A full e2fsck -f -y on the whole file system, mdt and ost.</div><div>lctl lfsck_start -A -t all -o -e continue</div><div>Turning quota to none and back on.</div><div><br></div><div>None of the above solved the problem.</div><div><br></div><div>lctl lfsck_query<br></div><div><br></div><div>layout_mdts_init: 0<br>layout_mdts_scanning-phase1: 0<br>layout_mdts_scanning-phase2: 0<br>layout_mdts_completed: 0<br>layout_mdts_failed: 0<br>layout_mdts_stopped: 0<br>layout_mdts_paused: 0<br>layout_mdts_crashed: 0<br><b>layout_mdts_partial: 1 </b># is that normal output?<br>layout_mdts_co-failed: 0<br>layout_mdts_co-stopped: 0<br>layout_mdts_co-paused: 0<br>layout_mdts_unknown: 0<br>layout_osts_init: 0<br>layout_osts_scanning-phase1: 0<br>layout_osts_scanning-phase2: 0<br>layout_osts_completed: 30<br>layout_osts_failed: 0<br>layout_osts_stopped: 0<br>layout_osts_paused: 0<br>layout_osts_crashed: 0<br>layout_osts_partial: 0<br>layout_osts_co-failed: 0<br>layout_osts_co-stopped: 0<br>layout_osts_co-paused: 0<br>layout_osts_unknown: 0<br>layout_repaired: 15<br>namespace_mdts_init: 0<br>namespace_mdts_scanning-phase1: 0<br>namespace_mdts_scanning-phase2: 0<br>namespace_mdts_completed: 1<br>namespace_mdts_failed: 0<br>namespace_mdts_stopped: 0<br>namespace_mdts_paused: 0<br>namespace_mdts_crashed: 0<br>namespace_mdts_partial: 0<br>namespace_mdts_co-failed: 0<br>namespace_mdts_co-stopped: 0<br>namespace_mdts_co-paused: 0<br>namespace_mdts_unknown: 0<br>namespace_osts_init: 0<br>namespace_osts_scanning-phase1: 0<br>namespace_osts_scanning-phase2: 0<br>namespace_osts_completed: 0<br>namespace_osts_failed: 0<br>namespace_osts_stopped: 0<br>namespace_osts_paused: 0<br>namespace_osts_crashed: 0<br>namespace_osts_partial: 
0<br>namespace_osts_co-failed: 0<br>namespace_osts_co-stopped: 0<br>namespace_osts_co-paused: 0<br>namespace_osts_unknown: 0<br>namespace_repaired: 99<br><br><br><br><br></div></div></div></div>
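On the inline question about the lfsck_query output: one quick way to triage it is to filter out the zero counters and the benign completed/repaired totals, leaving only the counters that may need attention. This is just a parsing sketch over the output pasted above (a short excerpt stands in for the full paste), not a Lustre tool:

```shell
# Filter `lctl lfsck_query` output for counters that are non-zero and are not
# plain completed/repaired totals. On a live system you would pipe the real
# command output in; here an excerpt of the paste above stands in for it.
sample='layout_mdts_crashed: 0
layout_mdts_partial: 1
layout_osts_completed: 30
layout_repaired: 15
namespace_repaired: 99'

printf '%s\n' "$sample" | awk '$2 != 0 && $1 !~ /completed|repaired/'
# leaves only: layout_mdts_partial: 1
```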
</blockquote></div>
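The 21 "quotactl ostN failed." lines in the quoted output can be turned into a list of the stale OST indices mechanically. This is only a parsing sketch over the pasted output, assuming the exact message format shown (a short excerpt stands in for the full paste):

```shell
# Extract the OST indices from the "quotactl ostN failed." lines; these slots
# (ost0..ost20) correspond to the deactivated pre-migration OSTs, which
# suggests the quota master still keeps records for devices that `lctl dl`
# no longer lists.
sample='quotactl ost0 failed.
quotactl ost1 failed.
quotactl ost20 failed.'

printf '%s\n' "$sample" | sed -n 's/^quotactl ost\([0-9][0-9]*\) failed\.$/\1/p'
# prints 0, 1 and 20, one per line
```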
</blockquote></div>
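For what it's worth, the claim that the accounting itself is consistent can be cross-checked from the `lfs quota -g -v` paste in the quoted message: summing the per-target kbytes column (plus the MDT line) and comparing against the filesystem-wide figure shows whether any usage is attributed to targets that no longer answer. This is just a parsing sketch over that output, with a two-OST excerpt standing in for the full paste:

```shell
# Sum the per-target usage (first field of the line following each *_UUID
# header, trailing "*" stripped) from `lfs quota -g -v` output. On a live
# system you would pipe the real command output in instead of this excerpt.
sample='technion-OST0015_UUID
                114429464*      - 114429464       -       -       -       -       -
technion-OST0016_UUID
                92938588       - 92938592       -       -       -       -       -'

printf '%s\n' "$sample" | awk '
    /_UUID$/ { getline; gsub(/\*/, "", $1); sum += $1 }  # usage is field 1 of the following line
    END { print sum }                                    # 207368052 for this two-OST excerpt
'
```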