[Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

Andreas Dilger andreas.dilger at oracle.com
Fri May 7 11:45:38 PDT 2010


On 2010-05-07, at 05:12, Frederik Ferner wrote:
> Andreas Dilger wrote:
>> On 2010-05-06, at 11:57, Frederik Ferner wrote:
>>> On our Lustre system we are seeing the following error fairly
>>> regularly, so far we have not had complaints from users and have
>>> not noticed any negative effects, but it would still be nice to
>>> understand the errors better. The systems reporting these errors
>>> are NFS exporters for subtrees of the Lustre file system.
>>> On the Lustre client/NFS server:
>>> May  6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error occurred while communicating with 172.23.68.8 at tcp. The
>>> mds_getattr_lock operation failed with -13
>> 
>> -13 is -EACCES (per /usr/include/asm-generic/errno-base.h) or
>> equivalent.
>> That just means that someone tried to access a file they don't have
>> permission to access.  As to why this is being printed on the console
>> is a bit of a mystery, since I haven't seen anything similar.  I
>> wonder if NFS is going down some obscure code path that is returning
>> the error to the RPC handler instead of stashing this "normal" error
>> code inside the reply.
> 
> It does not happen every time someone tries to access a directory or file they don't have access to; e.g. a simple attempt to change into a directory without sufficient permissions does not trigger the log entry. I still suspect that one of our users or applications is doing something strange, but I'm happy to ignore these errors unless a user complains and we can reproduce it.

It would still be good to figure out what is causing it.  If you can accept the performance impact, you could enable more Lustre debugging on the MDS and then, e.g., have a syslog trigger dump the kernel debug log whenever this message is printed:

lctl set_param debug=+rpctrace  # will have minor impact
lctl set_param debug=+entry     # might have significant impact

That said, I'd hate to go chasing a bug in 1.6.x that is fixed in 1.8 already.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.