[Lustre-discuss] Lustre error: ll_inode_revalidate_fini failure -43

Hendelman, Rob Rob.Hendelman at magnetar.com
Thu Dec 18 14:11:35 PST 2008


>> I have seen this error on several clients:
>> 
>> LustreError: 6200:0:(file.c:2513:ll_inode_revalidate_fini()) failure -43 inode 36134281
>
>-EIDRM
>

I'm not sure exactly what EIDRM means, but I'm guessing it relates to your message here:
http://lists.lustre.org/pipermail/lustre-discuss/2008-February/006593.html
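Lustre logs kernel-style negative errno values, so "failure -43" is errno 43 negated. A minimal Python sketch that decodes such a code with the standard `errno` and `os` modules. It assumes Linux errno numbering, where 43 is EIDRM ("Identifier removed"); other platforms number errno values differently:

```python
import errno
import os
from typing import Tuple


def describe_rc(rc: int) -> Tuple[str, str]:
    """Decode a kernel-style return code such as -43 into (name, message)."""
    code = -rc if rc < 0 else rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux this prints: EIDRM Identifier removed
    name, message = describe_rc(-43)
    print(name, message)
```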

>> How can I find the actual file with this inode on the filesystem to see what uid number is assigned to it?  The mgs/mds/oss/clients all should have the same UID info using NIS.
>
>Have you read the ops manual with regard to l_getgroups?  If not, please
>do that and see if you have any more questions.

I would never have guessed to search for this, since the error message talks about an inode; I was searching the Lustre manual for inode information. I wanted to at least identify the file attached to the problem inode (the one with the invalid/unknown uid/gid).
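One way to map an inode number from the log back to a path is to walk the mount point and compare each entry's `st_ino`, which is what `find /path/to/lustremntpoint -inum 36134281` does. A minimal Python sketch, assuming the inode number printed by `ll_inode_revalidate_fini` matches the `st_ino` the client reports (this may not hold on every Lustre version):

```python
import os
from typing import Iterator


def find_by_inode(root: str, inode: int) -> Iterator[str]:
    """Yield paths below root whose inode number equals `inode`."""
    # os.walk does not follow symlinks to directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == inode:
                    yield path
            except OSError:
                # Entries that vanish or cannot be stat'ed (e.g. EIDRM) are skipped.
                continue
```

For example, `list(find_by_inode("/path/to/lustremntpoint", 36134281))`. A full walk of a large Lustre filesystem is slow, so this is only practical for a scoped subtree.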

From section 32.5.9 of the manual:
My /proc/fs/lustre/mds/{mdtname}/group_upcall points to /usr/sbin/l_getgroups
My /proc/fs/lustre/mds/{mdtname}/group_info is empty

Grepping /var/log/* for l_getgroups turns up a bunch of "no such user 113" messages.
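The grep can also be scripted to collect every distinct UID that fails lookup. A minimal Python sketch, assuming the relevant log lines mention l_getgroups and contain the literal text `no such user` followed by a numeric UID; the surrounding syslog format varies, so only that fragment is matched:

```python
import re
from typing import Iterable, Set

# Matches the "no such user 113" fragment logged for failed lookups.
NO_SUCH_USER = re.compile(r"no such user (\d+)")


def unknown_uids(lines: Iterable[str]) -> Set[int]:
    """Return the set of UIDs that l_getgroups reported it could not resolve."""
    uids = set()
    for line in lines:
        # Mirror the manual grep: only consider lines mentioning l_getgroups.
        if "l_getgroups" not in line:
            continue
        match = NO_SUCH_USER.search(line)
        if match:
            uids.add(int(match.group(1)))
    return uids
```

Usage, with /var/log/messages as an example file: `with open("/var/log/messages") as f: print(unknown_uids(f))`.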

Running find on the client against the Lustre mount point to look for files owned by user 113 doesn't return anything. A test find (the local find, not lfs find) does seem to find files by a UID I specify. On the client, uid 113 is "nagios". Nobody should be logging in as nagios, since it is only used to run the nrpe daemon. On the client, gid 113 is smmta.
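The local-find check (equivalent to `find /path/to/lustremntpoint -uid 113`) can be sketched in Python with `os.walk` and `os.lstat`. This is a minimal illustration only; `lfs find` is the Lustre-aware tool and may behave differently:

```python
import os
from typing import Iterator


def find_by_uid(root: str, uid: int) -> Iterator[str]:
    """Yield paths below root whose owner is the given numeric UID."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: report the link itself, not its target
            except OSError:
                continue  # skip entries that cannot be stat'ed
            if st.st_uid == uid:
                yield path
```

Because it compares numeric UIDs, the result does not depend on whether the name "nagios" resolves on the host running the search.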

The only explanation I can come up with is that our Nagios box talks to the Lustre client to check free space on the mount point, and when it accesses the mount point it gets the error mentioned in the thread above.

Actually, I just temporarily gave the nagios user a shell and su'd to nagios. When I then run ls in /path/to/lustremntpoint, I get the "identifier removed" error shown in the Feb 2008 thread.

I'm guessing the correct solution is to add a local nagios user with uid 113 (and the matching gid) on the MDS. Do I also need to do this on the MGS (in my case the same box, but it would be good to know for the future) and on the OSSs?
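Since l_getgroups runs on the MDS and looks the UID up there, one way to confirm the fix is to check on the MDS whether UID 113 now resolves. A minimal Python sketch using the standard `pwd` module (Unix only), which goes through the host's normal NSS lookup (local files, then NIS if configured). That l_getgroups resolves users through exactly this path is an assumption:

```python
import pwd
from typing import Optional


def resolve_uid(uid: int) -> Optional[str]:
    """Return the user name for uid, or None if this host cannot resolve it."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # pwd raises KeyError when no passwd entry exists for the UID.
        return None


if __name__ == "__main__":
    name = resolve_uid(113)
    print(name if name else "no such user 113")
```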

Thanks for shedding some light on this. The key was the l_getgroups you mentioned; after that, a lot of things clicked into place.

Best regards,
Robert
