[Lustre-discuss] very slow directory access (lstat taking >20s to return)

Sun Oct 31 21:33:24 PDT 2010

On 2010-10-29, at 21:20, Daniel Kobras wrote:
> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>> Doing a 'strace -T -e file ls -n' on one directory with about 750 files, 
>> while users were seeing the hanging ls, showed lstat calls taking 
>> seconds, up to 23s.
> 
> The (l)stat() calls determine the exact size of all files in the displayed directory. This means that each OST needs to revoke client write locks for all these files, i.e. client-side write caches for all files in the directory are flushed before the (l)stat() returns. This can easily take several seconds if there is heavy write activity on the files.

Actually, unlike most other cluster filesystems, Lustre does not need to revoke the OST write locks in order to determine the file size.  The OST extent locks are conditionally revoked if the client is no longer using them, but if they are in use, the clients holding those locks only return a "glimpse" of the current file size to the OST, which in turn returns the size to the client doing the (l)stat() call.

Since the (l)stat() call is itself not atomic (i.e. even on a local filesystem, the size may be out of date before the system call returns to userspace), this glimpse behaviour is OK for (l)stat() calls.  For system calls where the client needs to know the actual file size (e.g. writes to a file opened with O_APPEND, or truncate()), the client does need to get the extent lock that covers the end of the file, and of course it does so.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.