[Lustre-discuss] Inode errors at time of job failure
Oleg Drokin
Oleg.Drokin at Sun.COM
Wed Aug 5 15:59:32 PDT 2009
Hello!
On Aug 5, 2009, at 3:12 PM, Daniel Kulinski wrote:
> What would cause the following error to appear?
Typically this is some sort of race where the client presumes an inode exists
(because it still has some traces of it in memory),
but the inode no longer exists (on the MDS, anyway). So when the client comes
to fetch the inode attributes, there is nothing there anymore.
Normally this should not happen because Lustre uses locking to ensure
cache consistency, but in some cases
this does not hold (e.g. open often returns a dentry without a lock).
Also, if a client was evicted,
its cached open files cannot be revoked until they are
closed.
> LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini())
> failure -2 inode 14520180
> This happened at the same time a job failed. Error number 2 is
> ENOENT which means that this inode does not exist?
Right.
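
To double-check an error number from a message like this, a small script can do the lookup. This is only a minimal sketch: the regular expression is a hypothetical pattern written for this one message format, not something Lustre provides, and the log line is the one quoted above joined onto a single line.

```python
import errno
import os
import re

# Log line copied from the report above, joined onto one line.
LOG_LINE = ("LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) "
            "failure -2 inode 14520180")

# Hypothetical pattern; it only targets the "failure <rc> inode <n>" tail.
PATTERN = re.compile(r"failure (-?\d+) inode (\d+)")


def decode(line):
    """Return (errno symbol, description, inode number), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # The kernel reports errors as negative errno values, so drop the sign.
    code = abs(int(match.group(1)))
    inode = int(match.group(2))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code), inode


if __name__ == "__main__":
    print(decode(LOG_LINE))
```

On Linux the symbol for 2 is ENOENT, which matches the reading above.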
> Is there a way to query the MDS to find out which file this inode
> should have belonged to?
Well, there is lfs find, which can search by inode number, but since
the inode no longer exists, there is no way
to find out which name it was attached to (and the name likely no longer
exists either).
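
If you want to confirm that no name on a client mount still carries that inode number, a brute-force walk is one option. This is a minimal sketch in plain Python, not how lfs find works. It assumes the inode number in the log is the same number stat reports on the client, and the mount point in the usage note is hypothetical. Walking a large filesystem this way is slow and puts load on the MDS.

```python
import os


def find_by_inode(root, inode_number):
    """Return every path under root whose inode number equals inode_number."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are checked themselves, not followed.
                st = os.lstat(path)
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            if st.st_ino == inode_number:
                matches.append(path)
    return matches
```

For example, find_by_inode("/mnt/lustre", 14520180), where /mnt/lustre stands in for your own mount point. An empty list would be consistent with the inode being gone.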
Did you have client eviction before this message by any chance?
What was the job doing at the time?
Bye,
Oleg