[Lustre-discuss] Inode errors at time of job failure

Oleg Drokin Oleg.Drokin at Sun.COM
Thu Aug 6 22:24:49 PDT 2009


Hello!

On Aug 6, 2009, at 12:57 PM, Thomas Roth wrote:

> Hi,
> these ll_inode_revalidate_fini errors are unfortunately quite well known
> to us.
> So what would you guess if that happens again and again, on a number of
> clients - MDT softly dying away?

No, I do not think this is an MDT problem of any sort; at present it looks
more like some strange client interaction.
Are there any negative side effects in your case aside from log clutter?
Jobs failing or anything like that?

> Because we haven't seen any mass evictions (and no reasons for that)
> in connection with these errors.
> Or could the problem with the cached open files also be present if the
> communication interruption does not show up as an eviction in the logs?

It has nothing to do with open files if there are no evictions.
I checked in Bugzilla and found bug 16377, which also looks like this
report, though the logs there are somewhat confusing.
It almost appears as if the failing dentry is reported as a mountpoint
by the VFS, but then it is not one, since the following inode_revalidate
call ends up in Lustre again.
Do you see any "lookup on mtpt" errors coming from namei.c?
If you can reproduce the problem at will with ls or another tool,
can you please run this on a client (see comment #17 in bug 16377):
# script
Script started, file is typescript
# lctl clear
# echo -1 > /proc/sys/lnet/debug
[ reproduce problem ]
# lctl dk > /tmp/ls.debug
# exit
Script done, file is typescript

and then attach the resulting ls.debug to the bug?
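If you need several attempts, the steps above could be wrapped in a small shell function. This is only a minimal sketch, not part of the procedure from bug 16377: the function name capture_debug and the LCTL and DEBUG_FILE overrides are hypothetical, it assumes it is run as root on a Lustre client, and unlike the transcript it does not record the session with script(1).

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not from bug 16377): capture a full
# Lustre client debug log around one reproduction of the problem.
# Run as root on a Lustre client. LCTL and DEBUG_FILE are hypothetical
# overrides (e.g. for a dry run); the defaults match the manual steps.
LCTL=${LCTL:-lctl}
DEBUG_FILE=${DEBUG_FILE:-/proc/sys/lnet/debug}

# Usage: capture_debug OUTPUT_FILE COMMAND [ARGS...]
capture_debug() {
    out=$1
    shift
    $LCTL clear || return 1               # empty the kernel debug buffer
    echo -1 > "$DEBUG_FILE" || return 1   # enable every debug flag
    "$@" || true                          # reproduce; the command may fail, keep going
    $LCTL dk > "$out"                     # dump the debug buffer to OUTPUT_FILE
}
```

For example, "capture_debug /tmp/ls.debug ls -l /mnt/lustre/some/dir" (the directory is only a placeholder). As with the manual steps, the debug mask is left at -1 (all flags) afterwards, so you may want to set it back to its previous value once the log has been captured.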

Also, what Lustre version are you using?

Bye,
     Oleg
