[Lustre-discuss] 1.8 quotas

Sat Oct 23 01:39:32 PDT 2010

Hello Jason,

please note that it is also possible to enable quotas using lctl, and that 
would not be visible using tunefs.lustre. I think the only real option to 
check if quotas are enabled is to check if the quota files exist. For an online 
filesystem 'debugfs -c /dev/device' is probably the safest way (there is also 
a 'secret' way to bind mount the underlying ldiskfs to another directory, 
but I only use that for test filesystems and never in production, as I have not 
verified the kernel code path yet).

Either way, you should check for lquota files, such as

root@rhel5-nfs@phys-oss0:~# mount -t ldiskfs /dev/mapper/ost_demofs_2 /mnt

root@rhel5-nfs@phys-oss0:~# ll /mnt
[...]
-rw-r--r-- 1 root root  7168 Oct 23 09:48 lquota_v2.group
-rw-r--r-- 1 root root 71680 Oct 23 09:48 lquota_v2.user

(Of course, you should check that on those OSTs which have reported the slow 
quota messages.)
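
For an online OST, the same check can be sketched with debugfs instead of a mount. This is only a minimal sketch: has_lquota_files is a hypothetical helper name, and the pattern assumes the quota files all start with "lquota" and end in .user or .group, as in the listing above. The device path is the one from my example; substitute your own.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): succeeds if a directory listing
# read from stdin contains a file named lquota*.user or lquota*.group.
# Assumption: quota file names follow the pattern seen in the listing above.
has_lquota_files() {
    grep -Eq '(^|[[:space:]])lquota[^[:space:]]*\.(user|group)([[:space:]]|$)'
}

# Intended use on the OSS. debugfs -c opens the device read-only in
# catastrophic mode, and -R runs a single request and exits:
#
#   debugfs -c -R 'ls -l /' /dev/mapper/ost_demofs_2 2>/dev/null \
#       | has_lquota_files && echo "quota files present"
```

The helper only reads text on stdin, so it can equally be fed the output of 'ls -l /mnt' on a test filesystem mounted as ldiskfs.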

I just poked around a bit in the code, and above the fsfilt_check_slow() check 
there is also a loop that calls filter_range_is_mapped(). This function 
calls fs_bmap(), and when that eventually goes down to ext3, it might get a 
bit slow if another thread modifies that file (check out 
linux/fs/ext3/inode.c):

/* 
 * bmap() is special.  It gets used by applications such as lilo and by
 * the swapper to find the on-disk block of a specific piece of data.
 *
 * Naturally, this is dangerous if the block concerned is still in the
 * journal.  If somebody makes a swapfile on an ext3 data-journaling
 * filesystem and enables swap, then they may get a nasty shock when the
 * data getting swapped to that swapfile suddenly gets overwritten by
 * the original zero's written out previously to the journal and
 * awaiting writeback in the kernel's buffer cache. 
 *
 * So, if we see any bmap calls here on a modified, data-journaled file,
 * take extra steps to flush any blocks which might be in the cache. 
 */

I don't know, though, whether it can happen that several threads write to the 
same file. But if it happens, it gets slow. I wonder if a possible swap file is 
worth the effort here... In fact, the reason for calling 
filter_range_is_mapped() certainly does not require a journal flush in that 
loop. I will check myself next week whether journal flushes are ever made due 
to that, and open a Lustre bugzilla then. Avoiding all of that should not be 
difficult.

Cheers,
Bernd

On Saturday, October 23, 2010, Jason Hill wrote:
> Kevin/Dave/(and Dave from DDN):
> 
> Thanks for your replies. From tunefs.lustre --dryrun it is very apparent
> that we are not running quotas.
> 
> Thanks for your assistance.
> 
> > That message, from lustre/obdfilter/filter_io_26.c, is the result of the
> > thread taking 35 seconds from when it entered filter_commitrw_write()
> > until after it called lquota_chkquota() to check the quota.
> > 
> > However, it is certainly plausible that the thread was delayed because
> > of something other than quotas, such as an allocation (e.g., it could
> > have been stuck in filter_iobuf_get).
> > 
> > Kevin
> 

-- 
Bernd Schubert
DataDirect Networks