[Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

Frederik Ferner frederik.ferner at diamond.ac.uk
Wed Sep 8 09:08:48 PDT 2010


Hi Andreas, List,

reviving an old thread now that I have managed to look into it during the 
current maintenance window. Note, though, that we are evaluating an 
upgrade to 1.8.4 in the near future, so I would not call this a high 
priority investigation. I'm doing it partly to see how far I can get and 
partly to make sure it's not hiding a real problem.

Unfortunately I don't really understand the debug logs myself. I've 
attached one of the debug logs that I created on our MDS after 
running 'lctl set_param debug=+rpctrace' and 'lctl set_param 
debug=+trace'. The log was created as soon as the error message 
appeared in /var/log/messages. Note that I did not manage to set the 
suggested debug flag "+entry"; the attempt returned this:

lnet.debug=+entry
error: set_param: writing to file /proc/sys/lnet/debug: Invalid argument
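
For reference, the capture described above (dump the kernel debug log as soon as the error shows up in /var/log/messages) could be scripted roughly as below. This is only a sketch under several assumptions: the pattern is derived from the console message quoted in this thread, the script runs as root on the node whose debug buffer is wanted and that node's syslog actually receives the message (e.g. via remote syslog), and the function names and the /tmp dump location are made up for illustration. "lctl dk" is the short form of "lctl debug_kernel".

```python
import re
import subprocess
import time

# Pattern taken from the console message quoted in this thread; the node
# address and surrounding wording are deliberately left unconstrained.
ERROR_RE = re.compile(
    r"LustreError: 11-0: .*mds_getattr_lock operation failed with -13\b"
)


def is_trigger(line):
    """Return True if a syslog line is the error discussed in this thread."""
    return ERROR_RE.search(line) is not None


def dump_debug_log(path, run=subprocess.run):
    """Write the kernel debug buffer to `path` using `lctl dk`."""
    run(["lctl", "dk", path], check=True)


def handle_lines(lines, dump=dump_debug_log, dump_dir="/tmp"):
    """Dump the debug log once per matching line; return the dump paths."""
    dumped = []
    for line in lines:
        if is_trigger(line):
            # Hypothetical naming scheme: one file per occurrence.
            path = f"{dump_dir}/lustre-debug-{time.time_ns()}.log"
            dump(path)
            dumped.append(path)
    return dumped


def tail(logfile, poll=0.5):
    """Yield lines appended to `logfile` (simple tail -f, ignores rotation)."""
    with open(logfile) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll)
```

Calling handle_lines(tail("/var/log/messages")) would then run until interrupted and write one dump per occurrence. Passing the dump function as a parameter keeps the matching logic testable on a machine without lctl.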

Any help understanding the debug log etc would be much appreciated.

Kind regards,
Frederik

Frederik Ferner wrote:
> Andreas Dilger wrote:
>> On 2010-05-07, at 05:12, Frederik Ferner wrote:
>>> Andreas Dilger wrote:
>>>> On 2010-05-06, at 11:57, Frederik Ferner wrote:
>>>>> On our Lustre system we are seeing the following error fairly 
>>>>> regularly. So far we have not had complaints from users and
>>>>> have not noticed any negative effects, but it would still be
>>>>> nice to understand the errors better. The systems reporting
>>>>> these errors are NFS exporters for subtrees of the Lustre file
>>>>> system. On the Lustre client/NFS server:
>>>>>
>>>>> May  6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error
>>>>> occurred while communicating with 172.23.68.8@tcp. The
>>>>> mds_getattr_lock operation failed with -13
>>>> -13 is -EACCES (per /usr/include/asm-generic/errno-base.h) or 
>>>> equivalent. That just means that someone tried to access a file
>>>> they don't have permission to access.  Why this is being
>>>> printed on the console is a bit of a mystery, since I haven't
>>>> seen anything similar.  I wonder if NFS is going down some
>>>> obscure code path that is returning the error to the RPC handler
>>>> instead of stashing this "normal" error code inside the reply.
>>> It does not happen every time someone tries to access a
>>> directory/file they don't have access to, i.e. a simple attempt to
>>> change into a directory where you don't have enough permissions
>>> does not trigger the log entry. I still suspect one of our
>>> users/applications is doing something strange but I'm happy to
>>> ignore these errors unless some user complains and we can reproduce
>>> it.
>> It would still be good to figure out what is causing it.  If you
>> could accept the performance impact, you could enable more Lustre
>> debugging on the MDS, and then e.g. have a syslog trigger that dumps
>> the kernel debug log when this message is printed:
>>
>> lctl set_param debug=+rpctrace  # will have minor impact
>> lctl set_param debug=+entry     # might have significant impact
> 
> We may be able to do that in our next maintenance window at the
> beginning of June. We'll report back.
> 
> So far we have not managed to reproduce it on our test file system so we 
> can't test there.
> 
>> That said, I'd hate to go chasing a bug in 1.6.x that is fixed in 1.8
>> already.
> 
> Understood, unfortunately we are not really in a position to upgrade to
> 1.8 any time soon. And as it has not caused any real problem as far as I 
> can tell, we are not going to force the upgrade just because of these 
> log entries.
> 
> Thanks,
> Frederik
> 
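
As a quick cross-check of the error code discussed in the quoted thread, a negated kernel-style return value such as -13 can be mapped to its symbolic name with Python's standard errno module. This is only a small illustrative sketch; the helper name describe_errno is made up here, and the message text returned by os.strerror depends on the platform's C library.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) error code to its symbolic name and message."""
    n = abs(code)  # kernel code returns errors as negative values, e.g. -13
    return errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)


name, message = describe_errno(-13)
print(f"{name}: {message}")
```

For -13 the symbolic name is EACCES, i.e. a permission failure, which matches the reading given in the thread.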


-- 
Frederik Ferner
Computer Systems Administrator		phone: +44 1235 77 8624
Diamond Light Source Ltd.		mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: processing_e_161808534488000.gz
Type: application/x-gzip
Size: 3120614 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20100908/80cbef17/attachment.bin>

