[Lustre-discuss] need help debugging an access permission problem
yong.fan at whamcloud.com
Thu Sep 23 10:07:57 PDT 2010
On 9/23/10 10:03 PM, Tina Friedrich wrote:
> thanks for the answer. I found it in the meantime; one of our ldap
> servers had a wrong size limit entry.
> I had of course already looked at the logs; they didn't yield much in
> terms of why, only what (as in, I could see it was permission errors,
> but logs don't really tell you why you are getting them).
> There weren't any log entries that hinted at 'size limit exceeded' or
> Still, could someone point me to the bit of the documentation that best
> describes how the MDS queries that sort of information (group/passwd
> info, I mean)? Or how best to test that its mechanisms are working? For
> example, in this case I always thought one would only hit the size
> limit when doing a bulk 'transfer' of data, not when looking up a single
> user. Plus, I could do these sorts of lookups fine on all machines
> involved (against all ldap servers).
The topic "User/Group Cache Upcall" may be helpful for you.
For lustre-1.8.x, it is chapter 28.1; for lustre-2.0.x, it is chapter
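To test the mechanism outside Lustre, one option is to reproduce on the MDS the kind of lookup the upcall performs. The Python sketch below is a hedged approximation, not the upcall itself: it assumes the helper resolves a uid with getpwuid() and then finds supplementary groups by walking the entire group database. With LDAP behind NSS, such a walk is a bulk query, so a server-side size limit can truncate it even though a single-user lookup succeeds. The script compares the walk (grp.getgrall()) with the per-user query (os.getgrouplist()); gids that show up in only one of the two point at truncation or an inconsistent server. Note that getgrall() may return only local groups if the NSS client has enumeration disabled.

```python
import grp
import os
import pwd
import sys


def groups_by_enumeration(username, primary_gid, all_groups):
    """Return the gids found by scanning every group entry for username.

    Assumption: this mirrors an upcall that walks the whole group
    database. Behind LDAP, that walk is a bulk query which a server-side
    size limit can silently cut short.
    """
    gids = {primary_gid}
    for entry in all_groups:
        if username in entry.gr_mem:
            gids.add(entry.gr_gid)
    return gids


def groups_by_user_query(username, primary_gid):
    """Return the gids from the per-user NSS query (initgroups-style)."""
    return set(os.getgrouplist(username, primary_gid))


def compare(uid):
    """Resolve uid both ways and report gids seen by only one method."""
    user = pwd.getpwuid(uid)  # KeyError if NSS cannot resolve the uid
    all_groups = grp.getgrall()
    enumerated = groups_by_enumeration(user.pw_name, user.pw_gid, all_groups)
    queried = groups_by_user_query(user.pw_name, user.pw_gid)
    return {
        "user": user.pw_name,
        "groups_in_database": len(all_groups),
        "only_in_user_query": sorted(queried - enumerated),
        "only_in_enumeration": sorted(enumerated - queried),
    }


if __name__ == "__main__":
    # Usage: pass one or more numeric uids on the command line.
    for arg in sys.argv[1:]:
        print(compare(int(arg)))
```

Running it on the MDS with the uids of the affected users, once per ldap server the NSS client can fail over to, would show which server returns a truncated group list.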
> On 23/09/10 11:20, Ashley Pittman wrote:
>> On 23 Sep 2010, at 10:46, Tina Friedrich wrote:
>>> Hello List,
>>> I'm after debugging hints...
>>> I have a couple of users that intermittently get I/O errors when trying
>>> to ls a directory (as in, within half an hour, works -> doesn't work ->
> >>> Users/groups are kept in ldap; as far as I can see/check, the ldap
> >>> information is consistent everywhere (i.e. no replication failure or
>>> I am trying to figure out what is going on here/where this is going
>>> wrong. Can someone give me a hint on how to debug this? Specifically,
>>> how does the MDS look up this sort of information, could there be a
>>> 'list too long' type of error involved, something like that?
>> Could you give an indication as to the number of files in the directory concerned? What is the full ls command issued (allowing for shell aliases) and in the case where it works is there a large variation in the time it takes when it does work?
>> In terms of debugging it I'd say the log files for the client in question and the MDS would be the most likely place to start.
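Since the failure is intermittent, it may also help to watch group resolution on the MDS over time and note when the answer changes, for example when the NSS client fails over to a different ldap server. A minimal sketch, assuming Python 3 on the MDS; `user_groups()` and `watch_groups()` are hypothetical helper names, and the polling interval is arbitrary:

```python
import os
import pwd
import time
from datetime import datetime


def user_groups(username):
    """Resolve username's gids through NSS, as the MDS host would."""
    user = pwd.getpwnam(username)  # KeyError if the user cannot be found
    return os.getgrouplist(user.pw_name, user.pw_gid)


def watch_groups(lookup, rounds, interval):
    """Call lookup() rounds times, interval seconds apart.

    Returns (timestamp, result) pairs for the first round and for every
    round whose result differs from the previous one. A failed lookup is
    recorded as ("error", message) rather than stopping the watch.
    """
    changes = []
    previous = None
    for i in range(rounds):
        try:
            current = frozenset(lookup())
        except KeyError as exc:
            current = ("error", str(exc))
        if current != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            changes.append((stamp, current))
            previous = current
        if i + 1 < rounds:
            time.sleep(interval)
    return changes
```

For example, `watch_groups(lambda: user_groups("someuser"), rounds=360, interval=10)` (with "someuser" as a placeholder) covers an hour; the recorded timestamps can then be matched against the times users saw I/O errors in the client and MDS logs. If nscd or sssd caches results, a change only becomes visible once the cached entry expires.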