[Lustre-discuss] setquota fails, mds adjust qunit failed, quota_interface.c ... quit checking
Thomas Roth
t.roth at gsi.de
Thu Dec 4 09:31:31 PST 2008
Hi,
it seems that with my faulty quota settings I can damage the client - OST
connection quite persistently.
As described earlier, I have tried twice to write to Lustre from one
client, producing one file and one "Disk quota exceeded". Afterwards,
this client marks the two OSTs affected by my attempts as "Inactive".
On the OSS I see corresponding log entries:
Dec 4 18:02:31 OSS96 kernel: Lustre:
19425:0:(ldlm_lib.c:760:target_handle_connect()) lust-OST0020: refuse
reconnection from dc6bd83f-6971-7f4d-1a22-77825f6d21a5@[IP]@tcp to
0xffff8101fb223000; still busy with 2 active RPCs
The connection is obviously severed permanently; at least a reboot of the
client does not change anything.
Of course "still busy with N active RPCs" is another of the many cryptic
Lustre messages that merit an explanation, or better a repair recipe, from
the experts. It turns up ever so often in the logs as well as on the web,
but it seems that waiting for Lustre to heal itself is all one can do?
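If anybody knows a less passive remedy, I would be glad to hear it. On the
client I was going to try something along these lines - only a sketch,
assuming the affected import is the lust-OST0020 OSC, and I am not sure
"activate" even helps once the OSS refuses the reconnection:

~# lctl dl | grep osc                    # find the device number of the OSC
~# cat /proc/fs/lustre/osc/lust-OST0020-osc-*/active   # 0 = marked inactive
~# lctl --device <devno> activate        # <devno> taken from the lctl dl output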
Anyway, in my case these are only the problems left over after the attempt
to write to Lustre. During the attempt itself, I see on the OST:
Dec 4 18:03:05 lxfs104 kernel: LustreError:
20582:0:(quota_interface.c:473:quota_chk_acq_common()) we meet 10 errors
or run too many cycles when acquiring quota, quit checking with rc: 0,
cycle: 1000.
And the MDS complains as reported before:
(quota_master.c:478:mds_quota_adjust()) mds adjust qunit failed!
(opc:4 rc:-16)
(rc -16 is EBUSY, i.e. the same "Device or resource busy" that setquota
keeps reporting.)
Btw, I have checked the output of sysrq-t for mds_set_dqblk or
mds_quota_recovery, as suggested by Andrew, yet found nothing.
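For the record, this is what I ran on the MDS to look for them (assuming
kernel.sysrq is enabled; the traces may of course scroll out of the ring
buffer before dmesg catches them):

~# echo 1 > /proc/sys/kernel/sysrq
~# echo t > /proc/sysrq-trigger
~# dmesg | grep -E 'mds_set_dqblk|mds_quota_recovery'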
So, what should I do? Unfortunately the system is already in use by
other people, so just starting fresh with a "global mkfs.lustre" is not
an option ;-)
Regards,
Thomas
Thomas Roth wrote:
> Hi,
>
> I'm still having these problems with resetting and setting quota. My
> Lustre system seems to be stuck forever at "setquota failed: Device or
> resource busy".
> Right now, I have tried to write as much as my current quota setting
> allows:
>
> # lfs quota -u troth /lustre
> Disk quotas for user troth:
> Filesystem  kbytes   quota   limit  grace  files  quota  limit  grace
> /lustre 4 3072000 309200 1 11000 10000
> lust-MDT0000_UUID
> 4* 1 1 6400
> lust-OST0000_UUID
> 0 16384
> lust-OST0001_UUID
> 0 22528
> ...
>
> I wrote some ~100 MB with 'dd', deleted the files and tried to copy a
> directory - "Disk quota exceeded".
> Now there are several questions: the listing above indicates that on the
> MDT I have exceeded my quota - there's a 4* - without any data in my
> Lustre directory. But this is only 4 kB - who knows what could take up
> 4 kB. (Another question is how I managed to set the quota on the MDT to
> 1 kB in the first place - unfortunately I did not write down my previous
> "lfs setquota" commands while they were still successful.)
> Still - how could I write one file of 2 MB in this situation, and why can
> I not even create the directory (the one I wanted to copy), without any
> files in it, before the quota blocks everything?
> But wait - the story goes on. When I try to write with dd if=/dev/zero
> of=..., the log of the MDT says
>
> Dec 4 14:20:39 lustre kernel: LustreError:
> 3837:0:(quota_master.c:478:mds_quota_adjust()) mds adjust qunit failed!
> (opc:4 rc:-16)
>
> This is reproducible and correlates with my write attempts.
>
> So something might be broken here?
>
> I have read further on in the Lustre Manual about quotas. It keeps
> talking about parameters found under "/proc/fs/lustre/lquota/...". I
> don't have a subdirectory "lquota" there - neither on the MDT nor on the
> OSTs. The parameters can be found, however, in
> "/proc/fs/lustre/mds/lust-MDT0000/" and
> "/proc/fs/lustre/obdfilter/lust-OSTxxxx".
> Disturbingly enough, "/proc/fs/lustre/mds/lust-MDT0000/quota_type" reads
> "off2".
> On one OST, I found it to be "off". There, I tried "tunefs.lustre
> --param ost.quota_type=ug /dev/sdb1", as mentioned in the manual.
> Reading the parameters off the partition with tunefs tells me that the
> quota_type is now "ug", but the entry
> /proc/fs/lustre/mds/lust-MDT0000/quota_type is still "off".
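>
> What I have not dared yet - and this is only a guess from reading the
> manual, not something I know to be correct for 1.6.5 - would be to set
> the corresponding parameter on the MDT as well and redo the quotacheck,
> roughly like this (assuming /dev/sdX is the MDT device, /mnt/mdt its
> mount point, and that the parameter is really spelled "mdt.quota_type"):
>
> ~# umount /mnt/mdt                                  # stop the MDT first
> ~# tunefs.lustre --param mdt.quota_type=ug /dev/sdX
> ~# mount -t lustre /dev/sdX /mnt/mdt                # restart the target
> ~# lfs quotacheck -ug /lustre                       # rerun from a client
>
> If someone can confirm or correct this, please do.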
>
>
> We have had problems with quotas before, but in those cases already
> "lfs quotacheck" would fail. On this system, not only did quotacheck
> work, but while I still had quotas set to sensible values, the quota
> mechanism itself worked as desired. I conclude that this trouble is not
> due to my having forgotten to activate quota at some earlier stage, such
> as kernel compilation or the formatting of the Lustre partitions.
>
> So I'm lost now and would appreciate any hint.
>
> Oh, all of these servers are running Debian Etch 64bit, kernel 2.6.22,
> Lustre 1.6.5.1
>
> Thomas
>
> Andrew Perepechko wrote:
>> Thomas,
>>
>> setquota (from quota-tools) does not work with Lustre filesystems, so
>> you cannot run it like "~# setquota -u troth 0 0 0 0 /lustre".
>>
>> lfs can be used either to set quota limits or to reset them and
>> " ~# lfs setquota -u troth 0 0 0 0 /lustre" is the correct way to
>> reset quotas.
>>
>> AFAIU, the cause of "Device or resource busy" when setting quota
>> in your case could be that the MDS was performing setquota or quota
>> recovery for the user troth. Could you check whether the MDS is stuck
>> inside the mds_set_dqblk or mds_quota_recovery functions? (You can dump
>> stack traces of running threads into the kernel log with alt-sysrq-t,
>> provided the sysctl variable kernel.sysrq equals 1.)
>>
>> Andrew.
>>
>> On Friday 28 November 2008 17:50:51 Thomas Roth wrote:
>>> Hi all,
>>>
>>> on an empty and unused Lustre 1.6.5.1 system I cannot reset or set the
>>> quota:
>>> > ~# lfs quota -u troth /lustre
>>> > Disk quotas for user troth:
>>> > Filesystem  kbytes   quota   limit  grace  files  quota  limit  grace
>>>
>>> > /lustre 4 3072000 309200 1 11000 10000
>>> > MDT0000_UUID
>>> > 4* 1 1 6400
>>> > OST0000_UUID
>>> > 0 16384
>>>
>>> Try to reset this quota:
>>> > ~# lfs setquota -u troth 0 0 0 0 /lustre
>>> > setquota failed: Device or resource busy
>>>
>>> Use "some" values instead:
>>> > ~# lfs setquota -u troth 104000000 105000000 100000 100000 /lustre
>>> > setquota failed: Device or resource busy
>>>
>>> I know the manual says not to use "lfs setquota" to reset quotas but -
>>> that is yet another question - of course there is a command "setquota",
>>> but it doesn't know about Lustre:
>>>
>>> > ~# setquota -u troth 0 0 0 0 /lustre
>>> > setquota: Mountpoint (or device) /lustre not found.
>>> > setquota: Not all specified mountpoints are using quota.
>>>
>>> as is to be expected. Mistake in the manual?
>>>
>>> However, I'm mainly interested in what causes my system to be busy when
>>> it is not - no writes, not even reads.
>>> I did rerun "lfs quotacheck", but that didn't help either.
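>>>
>>> For completeness, the sequence was the following (I am not sure the
>>> quotaon is even needed after a quotacheck, so take it as a sketch):
>>>
>>> > ~# lfs quotacheck -ug /lustre
>>> > ~# lfs quotaon -ug /lustre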
>>>
>>> Anybody got any hints what to do to manipulate quotas?
>>>
>>> Thanks,
>>> Thomas
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de
Limited liability company (GmbH)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528
Managing Director: Professor Dr. Horst Stöcker
Chair of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph,
Deputy: Ministerialdirigent Dr. Rolf Bernhardt