[lustre-devel] [PATCH 255/622] lustre: ptlrpc: IR doesn't reconnect after EAGAIN
James Simmons
jsimmons at infradead.org
Thu Feb 27 13:12:03 PST 2020
From: Sergey Cheremencev <c17829 at cray.com>
A client may try to connect to an OST before recovery
starts, while the OST is not yet configured. In that case
the OST returns EAGAIN (target->obd_no_conn == 1).
This is not a problem when pinger_recov is enabled,
because ptlrpc_pinger_main will reconnect later.
But the client never reconnects when pinger_recov is 0.

Let the pinger initiate recovery when the last connect
attempt failed with -EAGAIN, even if pinger recovery is
disabled. Move setting imp_connect_error to
ptlrpc_connect_interpret so that it stores only
connection errors.
Cray-bug-id: LUS-2034
WC-bug-id: https://jira.whamcloud.com/browse/LU-11601
Lustre-commit: 3341c8c31871 ("LU-11601 ptlrpc: IR doesn't reconnect after EAGAIN")
Signed-off-by: Sergey Cheremencev <c17829 at cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/153542
Reviewed-by: Alexey Lyashkov <c17817 at cray.com>
Reviewed-by: Vitaly Fertman <c17818 at cray.com>
Reviewed-on: https://review.whamcloud.com/33557
Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825 at cray.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
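Note for readers: the pinger.c hunk below changes when a disconnected,
non-deactivated import gets a reconnect attempt. As a rough illustration
only, the new condition can be modelled in isolation as follows. The
struct import_model type and the pinger_should_recover() helper are
hypothetical names made up for this sketch; they are not part of Lustre,
and the real check lives inline in ptlrpc_pinger_process_import().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, reduced stand-in for struct obd_import: it carries only
 * the two fields that the patched condition reads.
 */
struct import_model {
	bool no_pinger_recover;	/* models imp->imp_no_pinger_recover */
	int connect_error;	/* models imp->imp_connect_error */
};

/*
 * Mirrors the patched test: recover if pinger recovery is enabled, or
 * if the last connect attempt was rejected with -EAGAIN.
 */
static bool pinger_should_recover(const struct import_model *imp)
{
	return !imp->no_pinger_recover ||
	       imp->connect_error == -EAGAIN;
}
```

With pinger recovery disabled, this model returns true only for
-EAGAIN; any other stored connect error still leaves the import alone,
which matches the behaviour before the patch.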
fs/lustre/include/obd_support.h | 1 +
fs/lustre/ptlrpc/client.c | 1 -
fs/lustre/ptlrpc/import.c | 1 +
fs/lustre/ptlrpc/pinger.c | 3 ++-
4 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 36955e8..9ebdcb6 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -264,6 +264,7 @@
#define OBD_FAIL_OST_STATFS_EINPROGRESS 0x231
#define OBD_FAIL_OST_SET_INFO_NET 0x232
#define OBD_FAIL_OST_DISCONNECT_DELAY 0x245
+#define OBD_FAIL_OST_PREPARE_DELAY 0x247
#define OBD_FAIL_LDLM 0x300
#define OBD_FAIL_LDLM_NAMESPACE_NEW 0x301
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index f57ec1883..0f5aa92 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1457,7 +1457,6 @@ static int after_reply(struct ptlrpc_request *req)
lustre_msg_get_service_time(req->rq_repmsg));
rc = ptlrpc_check_status(req);
- imp->imp_connect_error = rc;
if (rc) {
/*
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 39d9e3e..a75856a 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -944,6 +944,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
return 0;
}
+ imp->imp_connect_error = rc;
if (rc) {
struct ptlrpc_request *free_req;
struct ptlrpc_request *tmp;
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index c565e2d..c3fbddc 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -228,7 +228,8 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
if (level == LUSTRE_IMP_DISCON && !imp_is_deactive(imp)) {
/* wait for a while before trying recovery again */
imp->imp_next_ping = ptlrpc_next_reconnect(imp);
- if (!imp->imp_no_pinger_recover)
+ if (!imp->imp_no_pinger_recover ||
+ imp->imp_connect_error == -EAGAIN)
ptlrpc_initiate_recovery(imp);
} else if (level != LUSTRE_IMP_FULL ||
imp->imp_obd->obd_no_recov ||
--
1.8.3.1