[Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

Frederik Ferner frederik.ferner at diamond.ac.uk
Wed May 12 03:53:10 PDT 2010


Andreas Dilger wrote:
> On 2010-05-07, at 05:12, Frederik Ferner wrote:
>> Andreas Dilger wrote:
>>> On 2010-05-06, at 11:57, Frederik Ferner wrote:
>>>> On our Lustre system we are seeing the following error fairly
>>>> regularly. So far we have not had complaints from users and
>>>> have not noticed any negative effects, but it would still be
>>>> nice to understand the errors better. The systems reporting
>>>> these errors are NFS exporters for subtrees of the Lustre file
>>>> system.
>>>>
>>>> On the Lustre client/NFS server:
>>>>
>>>> May  6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error
>>>> occurred while communicating with 172.23.68.8@tcp. The
>>>> mds_getattr_lock operation failed with -13
>>> -13 is -EACCES (per /usr/include/asm-generic/errno-base.h or
>>> equivalent). That just means that someone tried to access a file
>>> they don't have permission to access.  Why this is being
>>> printed on the console is a bit of a mystery, since I haven't
>>> seen anything similar.  I wonder if NFS is going down some
>>> obscure code path that is returning the error to the RPC handler
>>> instead of stashing this "normal" error code inside the reply.
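
For anyone wanting to decode such return codes without opening the header, here is a minimal sketch in Python. The helper name describe_errno is made up for illustration, and the numbers are those of the machine the script runs on, which on Linux match errno-base.h for the low codes.

```python
import errno
import os


def describe_errno(rc: int) -> str:
    """Translate a kernel-style negative return code into 'NAME (message)'.

    describe_errno is a hypothetical helper, not part of Lustre or lctl.
    """
    code = abs(rc)
    # errno.errorcode maps the numeric value to its symbolic name.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# The code from the log message in this thread.
print(describe_errno(-13))
```

On a Linux host the symbolic name printed for -13 should be EACCES; the message text comes from the local C library.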
>> It does not happen every time someone tries to access a
>> directory/file they don't have access to, i.e. a simple attempt to
>> change into a directory where you don't have enough permissions
>> does not trigger the log entry. I still suspect one of our
>> users/applications is doing something strange but I'm happy to
>> ignore these errors unless some user complains and we can reproduce
>> it.
> 
> It would still be good to figure out what is causing it.  If you
> could accept the performance impact, you could enable more Lustre
> debugging on the MDS, and then e.g. have a syslog trigger that dumps
> the kernel debug log when this message is printed:
> 
> lctl set_param debug=+rpctrace  # will have minor impact
> lctl set_param debug=+entry     # might have significant impact
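
The syslog trigger suggested above could look roughly like the following Python sketch. It is only an illustration under several assumptions: the syslog lines are fed to the script on the node whose debug log should be dumped (the message itself is logged on the client, so this may require forwarding syslog), the function names and the /tmp/lustre-debug prefix are made up, and the dump is done with "lctl dk <file>".

```python
import re
import subprocess
from typing import Callable, Iterable, Optional

# Matches the console message quoted in this thread and captures the
# operation name and the return code.
PATTERN = re.compile(
    r"LustreError: 11-0: .*The (\w+) operation failed with (-\d+)"
)


def match_failure(line: str, operation: str = "mds_getattr_lock") -> Optional[int]:
    """Return the error code if the line reports a failure of `operation`."""
    m = PATTERN.search(line)
    if m and m.group(1) == operation:
        return int(m.group(2))
    return None


def dump_debug_log(path: str) -> None:
    """Write the Lustre kernel debug buffer to `path` using 'lctl dk'."""
    subprocess.run(["lctl", "dk", path], check=True)


def watch(lines: Iterable[str],
          dump: Callable[[str], None] = dump_debug_log,
          prefix: str = "/tmp/lustre-debug") -> int:
    """Call `dump` once per matching line; return the number of dumps."""
    count = 0
    for line in lines:
        if match_failure(line) is not None:
            count += 1
            # Hypothetical naming scheme: one numbered file per occurrence.
            dump(f"{prefix}.{count}.log")
    return count
```

One way to use it would be to call watch(sys.stdin) and pipe the output of "tail -F" on the relevant syslog file into the script. The debug buffer is a ring, so the dump needs to happen soon after the message appears to be useful.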

We may be able to do that in our next maintenance window at the
beginning of June. We'll report back.

So far we have not managed to reproduce it on our test file system, so
we can't test there.

> That said, I'd hate to go chasing a bug in 1.6.x that is fixed in 1.8
> already.

Understood. Unfortunately, we are not really in a position to upgrade to
1.8 any time soon. And as it has not caused any real problems as far as I
can tell, we are not going to force the upgrade just because of these
log entries.

Thanks,
Frederik

-- 
Frederik Ferner
Computer Systems Administrator		phone: +44 1235 77 8624
Diamond Light Source Ltd.		mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)