[Lustre-discuss] Inode errors at time of job failure

Oleg Drokin Oleg.Drokin at Sun.COM
Wed Aug 5 15:59:32 PDT 2009


Hello!

On Aug 5, 2009, at 3:12 PM, Daniel Kulinski wrote:

> What would cause the following error to appear?

Typically this is some sort of race: the client presumes an inode exists
(because it still has traces of it in memory), but the inode no longer
exists (on the MDS, anyway). So when the client comes to fetch the inode
attributes, there is nothing there.
Normally this should not happen, because Lustre uses locking to ensure
cache consistency, but in some cases this does not hold (e.g. open often
returns a dentry without a lock). Also, if a client was evicted, its
cached open files cannot be revoked right away; that has to wait until
they are closed.
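The race described above can be pictured with a toy model. This is a minimal Python sketch, not Lustre code: ToyMDS, ToyClient and their methods are hypothetical, and the only real convention it borrows is returning a negated errno (here -ENOENT, i.e. -2) on failure.

```python
import errno


class ToyMDS:
    """Hypothetical stand-in for the metadata server: inode number -> attributes."""

    def __init__(self):
        self.inodes = {}

    def getattr(self, ino):
        # Follow the kernel convention of returning a negative errno on failure.
        if ino not in self.inodes:
            return -errno.ENOENT, None
        return 0, self.inodes[ino]


class ToyClient:
    """Hypothetical client that remembers inodes it opened without taking a lock."""

    def __init__(self, mds):
        self.mds = mds
        self.cached = set()

    def open(self, ino):
        # No lock is taken here, so nothing will tell the client if the inode goes away.
        self.cached.add(ino)

    def revalidate(self, ino):
        # Fetch attributes from the server; return only the status code.
        rc, _attrs = self.mds.getattr(ino)
        return rc


mds = ToyMDS()
mds.inodes[14520180] = {"size": 0}
client = ToyClient(mds)
client.open(14520180)
del mds.inodes[14520180]  # another client unlinks the file meanwhile
print(client.revalidate(14520180))
```

Because the toy client took no lock at open time, nothing told it the inode was gone, so the next revalidation is the first place the error surfaces, much like the message in the question.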

> LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) failure -2 inode 14520180
> This happened at the same time a job failed. Error number 2 is
> ENOENT, which means that this inode does not exist?

Right.
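To pull the interesting fields out of messages like the one quoted, a small parser can help when going through client logs. This is a minimal Python sketch, assuming the message layout matches the single line quoted in this thread exactly (other Lustre versions may format it differently); LINE_RE and decode_lustre_error are hypothetical names. Treating the first number as the PID of the reporting thread is also an assumption; the reported code is negated to recover the errno.

```python
import errno
import os
import re

# Assumed layout, taken from the one message quoted above:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) failure <rc> inode <ino>
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:\((?P<where>.+?)\)\s+"
    r"failure (?P<rc>-?\d+) inode (?P<ino>\d+)"
)


def decode_lustre_error(line):
    """Return the fields of a revalidate failure message, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = -int(m.group("rc"))  # kernel code reports failures as negated errno values
    return {
        "pid": int(m.group("pid")),
        "where": m.group("where"),
        "inode": int(m.group("ino")),
        "errno": errno.errorcode.get(code, "unknown"),
        "message": os.strerror(code),
    }


line = ("LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) "
        "failure -2 inode 14520180")
print(decode_lustre_error(line))
```

The inode number recovered this way is only a starting point; as noted below, once the inode is gone there is nothing left on the MDS to map it back to a name.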

> Is there a way to query the MDS to find out which file this inode  
> should have belonged to?

Well, there is lfs find, which can search by inode number, but since that
inode no longer exists, there is no way to find out which name it was
attached to (and that name most likely does not exist either).

Did you have client eviction before this message by any chance?
What was the job doing at the time?

Bye,
     Oleg


