[Lustre-devel] Moving forward on Quotas

Matthew Ahrens Matthew.Ahrens at sun.com
Wed Jun 4 16:50:54 PDT 2008

Nikita Danilov wrote:
> Matthew Ahrens writes:
>  > Nikita Danilov wrote:
>  > > Jeff Bonwick writes:
>  > >  > I'd suggest working with Matt Ahrens on this.
>  > > 
>  > > Hello,
>  > > 
>  > > we were discussing recently what is needed from the DMU to implement quotas
>  > > and other forms of space accounting. Our basic premise is that it is desirable
>  > > to keep the DMU part of the quota support to a minimum, and to implement
>  > > only mechanism here, leaving policy to the upper layers.
>  > 
>  > I agree with this premise.  However, your proposed implementation (especially 
>  > the asynchronous update mechanism and associated pending file) seems 
>  > unnecessarily complicated.
>  > 
>  > I would suggest that we simply update a "database" (eg. ZAP object or sparse 
>  > array) of userid -> space usage from syncing context when the space is 
>  > allocated/freed (ie, dsl_dataset_block_{born,kill}).  I believe that the 
>  > problems this presents[*] will be more tractable than the method you outlined.
> 
> Indeed, this solution is much simpler, and it was considered
> initially. I see the following drawbacks in it:

Agreed, those are possible drawbacks, depending on the implementation.  For 
example, if the DB object is stored in the user's objset (which is preferable 
for other reasons) then I suspect that the two drawbacks you mention below 
will be no worse than in your proposal.


>      - a notion of a user identifier (or some opaque identifier) has to
>        be introduced in DMU interface. DMU doesn't interpret these
>        identifiers in any way, except for using them as keys in a space
>        usage database. A set of these identifiers has to be passed to
>        every DMU entry point that might result in space allocation (a
>        set is needed because there are group quotas, and to keep
>        interface more or less generic).
>      - an implementation of chown, chgrp, and distributed quota requires
>        the DMU user to modify this database. Also, an interface to iterate
>        over this database is most likely needed for things like
>        distributed fsck, and user level quota reporting tools. It seems
>        that it would be quite difficult to encapsulate such a database
>        within DMU.
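
To make the two points above concrete, here is a rough sketch of the kind of
interface they imply. It is only an illustration in Python, not DMU code: the
class name, the method names, and the "u:"/"g:" identifier strings are all
made up for the example. The identifiers are treated as opaque keys, a set of
them is passed with every allocation or free, and the caller can move usage
between identifiers (chown/chgrp) and iterate over the whole database.

```python
from collections import defaultdict


class SpaceUsageDB:
    """Toy stand-in for a ZAP-like object mapping opaque id -> bytes used.

    Hypothetical illustration only; none of these names exist in the DMU.
    """

    def __init__(self):
        self._usage = defaultdict(int)

    def block_born(self, ids, nbytes):
        # Charge a newly allocated block to every id in the set
        # (for example, one user id and one group id).
        for ident in ids:
            self._usage[ident] += nbytes

    def block_kill(self, ids, nbytes):
        # Credit a freed block back to every id in the set.
        for ident in ids:
            self._usage[ident] -= nbytes
            if self._usage[ident] == 0:
                del self._usage[ident]

    def transfer(self, old_ids, new_ids, nbytes):
        # chown/chgrp: move already-allocated space between identifiers.
        self.block_kill(old_ids, nbytes)
        self.block_born(new_ids, nbytes)

    def usage(self, ident):
        return self._usage.get(ident, 0)

    def __iter__(self):
        # Iteration, as a reporting tool or fsck-like checker would need.
        return iter(sorted(self._usage.items()))


db = SpaceUsageDB()
db.block_born({"u:1000", "g:100"}, 4096)
db.block_born({"u:1001", "g:100"}, 8192)
db.transfer({"u:1000"}, {"u:1001"}, 4096)  # chown of the first file
for ident, nbytes in db:
    print(ident, nbytes)
```

Note that the database never interprets the keys; whether an identifier
stands for a user, a group, or something else is left to the caller.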
>  > 
>  > --matt
>  > 
>  > [*] eg, if the DB object is stored in the user's objset, then updating it in 
>  > syncing context may be problematic.  if it is stored in the MOS, carrying it 
>  > along when doing snapshot operations will be painful (snapshot, clone, send, 
>  > recv, rollback, etc).
> 
> The proposal was to update the database in the context of currently open
> transaction group. That is, when transaction group T has just committed,
> commit call-back is invoked and the database is updated in the context
> of some transaction belonging to transaction group T + 2 (T + 1 being in
> sync). It is because of this that pending file has to keep track of
> objects from two last transaction groups.
> 
> Nikita.
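
The timing described above can be modelled in a few lines. This is a toy
Python model under simplifying assumptions, not the proposed implementation:
the class and its fields are invented for the example, one in-memory dict per
transaction group stands in for the pending file, and the database update is
applied immediately rather than as a real transaction. It only shows that
deltas recorded in group T reach the database in the context of group T + 2,
so records for two groups are outstanding at any time.

```python
class DeferredUsageModel:
    """Toy model: usage deltas from txg T are applied in txg T + 2.

    Hypothetical illustration only; the names are not from the DMU.
    """

    def __init__(self):
        self.last_committed = 0
        # Group 1 is syncing and group 2 is open at the start.
        self.pending = {1: {}, 2: {}}  # txg -> {id: byte delta}
        self.db = {}                   # id -> bytes accounted so far
        self.applied = []              # (source txg, txg it was applied in)

    @property
    def open_txg(self):
        # With T committed, T + 1 is syncing and T + 2 is open.
        return self.last_committed + 2

    def allocate(self, ident, nbytes):
        # Record the delta against the currently open group.
        deltas = self.pending[self.open_txg]
        deltas[ident] = deltas.get(ident, 0) + nbytes

    def commit(self):
        # The syncing group commits; a new group opens.
        committed = self.last_committed + 1
        self.last_committed = committed
        self.pending[self.open_txg] = {}
        # Commit callback: fold the committed group's deltas into the
        # database, in the context of the (new) open group.
        for ident, delta in self.pending.pop(committed).items():
            self.db[ident] = self.db.get(ident, 0) + delta
        self.applied.append((committed, self.open_txg))
        return committed, self.open_txg


m = DeferredUsageModel()
m.allocate("u:1000", 4096)   # lands in open group 2
print(m.commit())            # group 1 commits, callback runs in group 3
print(m.commit())            # group 2 commits, callback runs in group 4
print(m.db)
```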
