[lustre-devel] [PATCH 3/8] lustre: fld: resend seq lookup RPC if it is on LWP

Andreas Dilger adilger at whamcloud.com
Wed Aug 14 09:58:52 PDT 2019


This functionality is used only by the server (the LWP connection and the
MDS-MDS connection flag).  But as I wrote previously, it would be hard to
keep track of this patch so that it is applied only once the server code
lands.  More likely, dropping it would just leave a hard-to-find bug that
has to be tracked down and fixed all over again.

My preference would be to land this and other similar patches in shared
code that is not easily separated into client- and server-only sections. 

Cheers, Andreas

> On Jul 24, 2019, at 19:44, James Simmons <jsimmons at infradead.org> wrote:
> 
> From: wang di <di.wang at intel.com>
> 
> Because a Light Weight connection (LWP) might be evicted after
> restart, in-flight RPCs can fail. To avoid this, resend the seq
> lookup RPC.
> 
> Remove "-f" from "stop mdt" in sanity 17m, so that umount keeps
> the connection; otherwise the OSP might be evicted.
> 
> WC-bug-id: https://jira.whamcloud.com/browse/LU-4571
> Lustre-commit: cf7f66d87e52293535cde6e8cc7386e6c1bdfa46
> Signed-off-by: wang di <di.wang at intel.com>
> Reviewed-on: http://review.whamcloud.com/9106
> Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
> Reviewed-by: Jinshan Xiong <jinshan.xiong at gmail.com>
> Reviewed-by: Niu Yawei <yawei.niu at intel.com>
> ---
> fs/lustre/fld/fld_request.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
> index 248fffa..ec45ea6 100644
> --- a/fs/lustre/fld/fld_request.c
> +++ b/fs/lustre/fld/fld_request.c
> @@ -314,6 +314,7 @@ int fld_client_rpc(struct obd_export *exp,
> 
>    LASSERT(exp);
> 
> +again:
>    imp = class_exp2cliimp(exp);
>    switch (fld_op) {
>    case FLD_QUERY:
> @@ -329,8 +330,15 @@ int fld_client_rpc(struct obd_export *exp,
>        op = req_capsule_client_get(&req->rq_pill, &RMF_FLD_OPC);
>        *op = FLD_LOOKUP;
> 
> -        if (imp->imp_connect_flags_orig & OBD_CONNECT_MDS_MDS)
> +        /* An MDS_MDS seq lookup always uses the LWP connection,
> +         * but the LWP may be evicted after a restart, which causes
> +         * an error. So set no_delay on the seq lookup request; if
> +         * the request fails because of the eviction, always retry.
> +         */
> +        if (imp->imp_connect_flags_orig & OBD_CONNECT_MDS_MDS) {
>            req->rq_allow_replay = 1;
> +            req->rq_no_delay = 1;
> +        }
>        break;
>    case FLD_READ:
>        req = ptlrpc_request_alloc_pack(imp, &RQF_FLD_READ,
> @@ -358,8 +366,19 @@ int fld_client_rpc(struct obd_export *exp,
>    obd_get_request_slot(&exp->exp_obd->u.cli);
>    rc = ptlrpc_queue_wait(req);
>    obd_put_request_slot(&exp->exp_obd->u.cli);
> -    if (rc)
> +    if (rc != 0) {
> +        if (rc == -EWOULDBLOCK) {
> +            /* For a no_delay req (see above), EWOULDBLOCK means the
> +             * connection is being evicted. This seq lookup should
> +             * not return an error, since that would cause an
> +             * unnecessary application failure; instead, it should
> +             * retry here.
> +             */
> +             */
> +            ptlrpc_req_finished(req);
> +            goto again;
> +        }
>        goto out_req;
> +    }
> 
>    if (fld_op == FLD_QUERY) {
>        prange = req_capsule_server_get(&req->rq_pill, &RMF_FLD_MDFLD);
> -- 
> 1.8.3.1
> 
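
For readers without the Lustre tree at hand, the retry pattern in the quoted patch can be sketched in standalone C. This is only an illustration under stated assumptions: struct fake_req, struct fake_conn, fake_send() and lookup_with_retry() are hypothetical stand-ins, not ptlrpc APIs, and the max_tries bound is an addition for the sketch (the patch itself retries unconditionally via "goto again").

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ptlrpc request; not the real struct. */
struct fake_req {
	int no_delay;		/* fail fast instead of waiting on recovery */
};

/* Hypothetical connection that stays "evicted" for a few sends. */
struct fake_conn {
	int evicted_for;	/* sends that will still fail */
	int sends;		/* total send attempts seen */
};

/* Simulated send: a no_delay request on an evicted connection
 * returns -EWOULDBLOCK at once rather than blocking.
 */
static int fake_send(struct fake_conn *conn, const struct fake_req *req)
{
	assert(req->no_delay);
	conn->sends++;
	if (conn->evicted_for > 0) {
		conn->evicted_for--;
		return -EWOULDBLOCK;
	}
	return 0;
}

/* Allocate a fresh request per attempt, release it on -EWOULDBLOCK,
 * and try again. Any other result (success or a different error)
 * is returned to the caller unchanged.
 */
static int lookup_with_retry(struct fake_conn *conn, int max_tries)
{
	int rc = -EWOULDBLOCK;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		struct fake_req *req = malloc(sizeof(*req));

		if (!req)
			return -ENOMEM;
		req->no_delay = 1;
		rc = fake_send(conn, req);
		free(req);	/* mirrors finishing the request before retrying */
		if (rc != -EWOULDBLOCK)
			break;
	}
	return rc;
}
```

The point the sketch tries to capture is that the failed request is released and a new one is built on each pass, so no state from the evicted attempt is reused.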

