[lustre-devel] [PATCH 06/12] lustre: mdc: don't add to page cache upon failure

NeilBrown neilb at suse.com
Mon Nov 26 19:01:30 PST 2018


On Sun, Nov 25 2018, James Simmons wrote:

> From: Lai Siyao <lai.siyao at whamcloud.com>
>
> Reading directory pages may fail on MDS, in this case client should
> not cache a non-up-to-date directory page, because it will cause
> a later read on the same page fail.
>
> Signed-off-by: Lai Siyao <lai.siyao at whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-5461
> Reviewed-on: http://review.whamcloud.com/11450
> Reviewed-by: Fan Yong <fan.yong at intel.com>
> Reviewed-by: John L. Hammond <jhammond at whamcloud.com>
> Reviewed-by: Oleg Drokin <green at whamcloud.com>
> Signed-off-by: James Simmons <jsimmons at infradead.org>
> ---
>  drivers/staging/lustre/lustre/mdc/mdc_request.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
> index 1072b66..09b30ef 100644
> --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
> +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
> @@ -1212,7 +1212,10 @@ static int mdc_read_page_remote(void *data, struct page *page0)
>  	}
>  
>  	rc = mdc_getpage(rp->rp_exp, fid, rp->rp_off, page_pool, npages, &req);
> -	if (!rc) {
> +	if (rc < 0) {
> +		/* page0 is special, which was added into page cache early */
> +		delete_from_page_cache(page0);

This looks wrong to me.  We shouldn't need to delete the page from the
page-cache.
It won't be marked up-to-date, so why will that cause an error on a
later read???

Well, because mdc_page_locate() finds a page and if it isn't up-to-date,
it returns -EIO.  Why does it do that?  If it found a PageError() page,
then it might be reasonable to return -EIO.

Why not just return the page that was found, and let the caller check if
it is Uptodate?

Well, because mdc_read_page() handles a successful page return from
mdc_page_locate() as a hash collision.

I guess I need to understand how lustre maps a hash to a directory
block, and then how it handles collisions...

The reason this jumped out at me is that it looks like it might be
racy.  Adding a page and then removing it might leave a window where
some other thread can find the page.  That is not a problem is a
non-up-to-date page just means we should wait for it.  But if it can
cause an error, then maybe the race is a real problem.

But maybe there is some higher level locking...

NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181127/5e6433ff/attachment.sig>


More information about the lustre-devel mailing list