[lustre-discuss] Lustre quota problems

Jose Manuel Martínez García jose.martinez at scayle.es
Thu Mar 13 02:44:38 PDT 2025


Hi,

I'm not sure if this could be related to the following issue:

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-January/019372.html


It appears to involve a similar Lustre version, quota-related issues, 
and MDT instability.

In the referenced post, they reported that disabling quotas stabilized 
the MDS for about a month.
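
For anyone who wants to try the same workaround, quota enforcement is
toggled per filesystem from the MGS with lctl conf_param. A minimal sketch,
assuming a filesystem named "lustre":

# on the MGS: disable quota enforcement on MDTs and OSTs
lctl conf_param lustre.quota.mdt=none
lctl conf_param lustre.quota.ost=none

# re-enable user/group/project enforcement later
lctl conf_param lustre.quota.mdt=ugp
lctl conf_param lustre.quota.ost=ugp

Note that this only disables enforcement; space accounting stays active, so
usage keeps being tracked while quotas are off.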



On 12/03/2025 at 22:22, Fredrik Nyström via lustre-discuss wrote:
> Hi,
>
> We had some similar problems in Sep-Oct 2024 running Lustre 2.15.5.
>
> In our case, the quota limits granted to individual OSTs stopped
> increasing, and writes became slower and slower.
>
> Check for "DQACQ failed" in /var/log/messages on Lustre servers.
>
> Example, lots of lines like these for all OSTs:
> 2024-09-04T12:57:15.725917+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -3, flags:0x1  qsd:rossby27-OST0003 qtype:grp id:8517 enforced:1 granted: 285682 pending:149320 waiting:13252405 req:1 usage: 85560 qunit:262144 qtune:65536 edquot:0 default:no
> 2024-09-04T12:57:15.726112+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:787:qsd_op_begin0()) $$$ ID isn't enforced on master, it probably due to a legeal race, if this message is showing up constantly, there could be some inconsistence between master & slave, and quota reintegration needs be re-triggered.  qsd:rossby27-OST0003 qtype:grp id:8517 enforced:1 granted: 285682 pending:149320 waiting:12591294 req:0 usage: 85560 qunit:262144 qtune:65536 edquot:0 default:no
> 2024-09-04T12:57:15.726138+02:00 oss170 kernel: LustreError: 1059853:0:(qsd_handler.c:787:qsd_op_begin0()) Skipped 20 previous similar messages
>
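> A quick way to count these per target is something along these lines (the
> log path may differ by distro):
>
> # count DQACQ failures per target; the qsd: field names the OST
> grep 'DQACQ failed' /var/log/messages | grep -o 'qsd:[^ ]*' | sort | uniq -c
>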
> If I remember correctly, the group quota problems only affected a single
> group. Everything was fine after restarting the Lustre servers, although
> unmounting the MDT triggered a kernel panic.
>
> Kind Regards / Fredrik Nyström, NSC
>
> On 2025-03-11 16:12, Robert Pennington wrote:
>> Hello,
>>
>> We're using Lustre 2.15.3 and have a strange problem with our attempt to impose quotas. Any assistance would be helpful.
>>
>> This is a user on whom we've attempted to impose a group quota; however, many of our OSTs (ignore the ones where quotactl failed) just don't receive the quota information.
>>
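>> For reference, the group limit queried below would normally have been put
>> in place with lfs setquota, along these lines (values illustrative):
>>
>> # set a block hard limit (-B) for group 4055 on this filesystem
>> lfs setquota -g 4055 -B 717G /mnt/lustre
>>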
>> The OSTs showing limit=0k below have the following configuration, while nodes like OST000f report correctly updated limit_group information:
>>
>> $ sudo lctl get_param osd-*.*.quota_slave_dt.info
>> osd-ldiskfs.lustre-OST000e.quota_slave_dt.info=
>> target name:    lustre-OST000e
>> pool ID:        0
>> type:           dt
>> quota enabled:  ugp
>> conn to master: setup
>> space acct:     ugp
>> user uptodate:  glb[0],slv[0],reint[0]
>> group uptodate: glb[0],slv[0],reint[0]
>> project uptodate: glb[0],slv[0],reint[0]
>>
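>> As an aside, the glb[0]/slv[0] flags above appear to indicate that the
>> global and slave index copies on this OST are not up to date, i.e. quota
>> reintegration with the master never completed. Assuming the force_reint
>> tunable documented for quota slaves, re-triggering it manually on the
>> affected OSS would look something like:
>>
>> # force quota reintegration for this target (on some releases the
>> # parameter lives under quota_slave rather than quota_slave_dt)
>> lctl set_param osd-ldiskfs.lustre-OST000e.quota_slave_dt.force_reint=1
>>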
>>   
>> $ sudo lctl get_param osd-*.*.quota_slave_dt.limit_group
>> osd-ldiskfs.lustre-OST000e.quota_slave_dt.limit_group=
>> global_index_copy:
>> - id:      0
>>    limits:  { hard:                    0, soft:                    0, granted:                    0, time:                    0 }
>>
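>> Another thing worth checking is whether the quota master on MDT0000 holds
>> the limit for this ID at all; the global group index can be dumped on the
>> MDS along these lines (assuming the usual qmt naming):
>>
>> # dump the quota master's global group index (block/dt quotas)
>> lctl get_param qmt.lustre-QMT0000.dt-0x0.glb-grp
>>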
>> …
>>
>> # lfs quota -vhg 4055 /mnt/lustre/
>> Disk quotas for grp 4055 (gid 4055):
>>       Filesystem    used   quota   limit   grace   files   quota   limit   grace
>>     /mnt/lustre/  2.694T*     0k  716.8G       -  145961       0       0       -
>> lustre-MDT0000_UUID
>>                   76.81M       -  1.075G       -  145951       -       0       -
>> lustre-MDT0001_UUID
>>                      40k*      -     40k       -      10       -       0       -
>> quotactl ost0 failed.
>> lustre-OST0001_UUID
>>                   16.09G       -      0k       -       -       -       -       -
>> lustre-OST0002_UUID
>>                   251.7G       -      0k       -       -       -       -       -
>> quotactl ost3 failed.
>> quotactl ost4 failed.
>> quotactl ost5 failed.
>> quotactl ost6 failed.
>> lustre-OST0007_UUID
>>                       0k       -      0k       -       -       -       -       -
>> lustre-OST0008_UUID
>>                   525.1M*      -  525.1M       -       -       -       -       -
>> lustre-OST0009_UUID
>>                   540.7M*      -  540.7M       -       -       -       -       -
>> lustre-OST000a_UUID
>>                   385.8M*      -  385.8M       -       -       -       -       -
>> quotactl ost11 failed.
>> quotactl ost12 failed.
>> lustre-OST000d_UUID
>>                   191.9G       -      0k       -       -       -       -       -
>> lustre-OST000e_UUID
>>                   258.9G       -      0k       -       -       -       -       -
>> lustre-OST000f_UUID
>>                   86.99G       -  87.99G       -       -       -       -       -
>> lustre-OST0010_UUID
>>                   255.3G       -  256.3G       -       -       -       -       -
>> lustre-OST0011_UUID
>>                   254.1G       -      0k       -       -       -       -       -
>> lustre-OST0012_UUID
>>                   241.6G       -      0k       -       -       -       -       -
>> lustre-OST0013_UUID
>>                   241.6G       -      0k       -       -       -       -       -
>> lustre-OST0014_UUID
>>                   241.9G       -      0k       -       -       -       -       -
>> lustre-OST0015_UUID
>>                   237.4G       -      0k       -       -       -       -       -
>> lustre-OST0016_UUID
>>                   241.8G       -      0k       -       -       -       -       -
>> lustre-OST0017_UUID
>>                   237.8G       -      0k       -       -       -       -       -
>> lustre-OST0018_UUID
>>                   344.2M       -      0k       -       -       -       -       -
>> Total allocated inode limit: 0, total allocated block limit: 345.7G
>> Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
>>
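>> To compare the slave copies for this group across all local targets in
>> one shot, something like this on each OSS works:
>>
>> # show each local target's copy of the group 4055 limits
>> lctl get_param osd-*.*.quota_slave_dt.limit_group | grep -A1 'id:.*4055'
>>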
>> …
>>
>> Thank you for your time.
>>
>> Sincerely,
>> Robert Pennington, PhD
>>
>> Tuebingen AI Center, Universitaet Tuebingen
>> Maria von Linden Str. 6
>> 72076 Tuebingen
>> Germany
>>
>> Office number: 10-30/A15
>
-- 

Jose Manuel Martínez García

Systems Coordinator

Supercomputación de Castilla y León

Tel: 987 293 174

Edificio CRAI-TIC, Campus de Vegazana, s/n, Universidad de León - 24071
León, Spain

<https://www.scayle.es/>


