[lustre-devel] [PATCH 05/39] lnet: Correct handling of NETWORK_TIMEOUT status

James Simmons jsimmons at infradead.org
Thu Jan 21 09:16:28 PST 2021


From: Chris Horn <chris.horn at hpe.com>

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND was unsure whether the
failure was due to the local or remote NI. In this case, we'll want
to decrement both the local and remote NI health and allow recovery
to ascertain which interface is actually healthy.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13751
Lustre-commit: ffd4523f2d50ef ("LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status")
Signed-off-by: Chris Horn <chris.horn at hpe.com>
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 net/lnet/lnet/lib-msg.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index e84cf02..d888090 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -925,9 +925,14 @@
 
 	case LNET_MSG_STATUS_REMOTE_ERROR:
 	case LNET_MSG_STATUS_REMOTE_TIMEOUT:
+		if (handle_remote_health)
+			lnet_handle_remote_failure(lpni);
+		return -1;
 	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
 		if (handle_remote_health)
 			lnet_handle_remote_failure(lpni);
+		if (handle_local_health)
+			lnet_handle_local_failure(ni);
 		return -1;
 	default:
 		LBUG();
-- 
1.8.3.1



More information about the lustre-devel mailing list