[lustre-discuss] Lustre quota problems

Fredrik Nyström freny at nsc.liu.se
Wed Mar 12 14:22:27 PDT 2025


Hi,

We had some similar problems in Sep-Oct 2024 running Lustre 2.15.5.

Do the limits on individual OSTs stop increasing, leading to writes becoming 
slower and slower?

Check for "DQACQ failed" in /var/log/messages on Lustre servers.

In our case that turned up lots of lines like these, for all OSTs:
2024-09-04T12:57:15.725917+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -3, flags:0x1  qsd:rossby27-OST0003 qtype:grp id:8517 enforced:1 granted: 285682 pending:149320 waiting:13252405 req:1 usage: 85560 qunit:262144 qtune:65536 edquot:0 default:no
2024-09-04T12:57:15.726112+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:787:qsd_op_begin0()) $$$ ID isn't enforced on master, it probably due to a legeal race, if this message is showing up constantly, there could be some inconsistence between master & slave, and quota reintegration needs be re-triggered.  qsd:rossby27-OST0003 qtype:grp id:8517 enforced:1 granted: 285682 pending:149320 waiting:12591294 req:0 usage: 85560 qunit:262144 qtune:65536 edquot:0 default:no
2024-09-04T12:57:15.726138+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:787:qsd_op_begin0()) Skipped 20 previous similar messages
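
The second message suggests re-triggering quota reintegration. If I 
remember right, that can be forced per quota slave via the force_reint 
parameter; the exact parameter path below is from memory and may be 
quota_slave_dt rather than quota_slave on 2.15, so check with list_param 
first:

# lctl list_param osd-*.*.quota_slave*.force_reint
# lctl set_param osd-*.*.quota_slave.force_reint=1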

If I remember correctly, the group quota problems only affected a single 
group. Everything was OK after a restart of the Lustre servers, although 
unmounting the MDT triggered a kernel panic.

Kind Regards / Fredrik Nyström, NSC

On 2025-03-11 16:12, Robert Pennington wrote:
> Hello,
> 
> We're using Lustre 2.15.3 and have a strange problem with our attempt to impose quotas. Any assistance would be helpful.
> 
> This is a user on whom we've attempted to impose a group quota; however, many of our OSTs (ignore the ones where quotactl failed) just don't receive the quota information.
> 
> The OSTs showing limit=0k below have the following configuration, while nodes like OST000f have correctly updated information for limit_group:
> 
> $ sudo lctl get_param osd-*.*.quota_slave_dt.info
> osd-ldiskfs.lustre-OST000e.quota_slave_dt.info=
> target name:    lustre-OST000e
> pool ID:        0
> type:           dt
> quota enabled:  ugp
> conn to master: setup
> space acct:     ugp
> user uptodate:  glb[0],slv[0],reint[0]
> group uptodate: glb[0],slv[0],reint[0]
> project uptodate: glb[0],slv[0],reint[0]
> 
>  
> $ sudo lctl get_param osd-*.*.quota_slave_dt.limit_group
> osd-ldiskfs.lustre-OST000e.quota_slave_dt.limit_group=
> global_index_copy:
> - id:      0
>   limits:  { hard:                    0, soft:                    0, granted:                    0, time:                    0 }
> 
> …
> 
> # lfs quota -vhg 4055 /mnt/lustre/
> Disk quotas for grp 4055 (gid 4055):
>      Filesystem    used   quota   limit   grace   files   quota   limit   grace
>    /mnt/lustre/  2.694T*     0k  716.8G       -  145961       0       0       -
> lustre-MDT0000_UUID
>                  76.81M       -  1.075G       -  145951       -       0       -
> lustre-MDT0001_UUID
>                     40k*      -     40k       -      10       -       0       -
> quotactl ost0 failed.
> lustre-OST0001_UUID
>                  16.09G       -      0k       -       -       -       -       -
> lustre-OST0002_UUID
>                  251.7G       -      0k       -       -       -       -       -
> quotactl ost3 failed.
> quotactl ost4 failed.
> quotactl ost5 failed.
> quotactl ost6 failed.
> lustre-OST0007_UUID
>                      0k       -      0k       -       -       -       -       -
> lustre-OST0008_UUID
>                  525.1M*      -  525.1M       -       -       -       -       -
> lustre-OST0009_UUID
>                  540.7M*      -  540.7M       -       -       -       -       -
> lustre-OST000a_UUID
>                  385.8M*      -  385.8M       -       -       -       -       -
> quotactl ost11 failed.
> quotactl ost12 failed.
> lustre-OST000d_UUID
>                  191.9G       -      0k       -       -       -       -       -
> lustre-OST000e_UUID
>                  258.9G       -      0k       -       -       -       -       -
> lustre-OST000f_UUID
>                  86.99G       -  87.99G       -       -       -       -       -
> lustre-OST0010_UUID
>                  255.3G       -  256.3G       -       -       -       -       -
> lustre-OST0011_UUID
>                  254.1G       -      0k       -       -       -       -       -
> lustre-OST0012_UUID
>                  241.6G       -      0k       -       -       -       -       -
> lustre-OST0013_UUID
>                  241.6G       -      0k       -       -       -       -       -
> lustre-OST0014_UUID
>                  241.9G       -      0k       -       -       -       -       -
> lustre-OST0015_UUID
>                  237.4G       -      0k       -       -       -       -       -
> lustre-OST0016_UUID
>                  241.8G       -      0k       -       -       -       -       -
> lustre-OST0017_UUID
>                  237.8G       -      0k       -       -       -       -       -
> lustre-OST0018_UUID
>                  344.2M       -      0k       -       -       -       -       -
> Total allocated inode limit: 0, total allocated block limit: 345.7G
> Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
> 
> …
> 
> Thank you for your time.
> 
> Sincerely,
> Robert Pennington, PhD
> 
> Tuebingen AI Center, Universitaet Tuebingen
> Maria von Linden Str. 6
> 72076 Tuebingen
> Germany 
> 
> Office number: 10-30/A15


-- 
Fredrik Nyström, National Supercomputer Centre
freny at nsc.liu.se

