[lustre-devel] [PATCH 412/622] lnet: Misleading error from lnet_is_health_check
James Simmons
jsimmons at infradead.org
Thu Feb 27 13:14:40 PST 2020
From: Chris Horn <hornc at cray.com>
In the case of sending to 0@lo we never set msg_txpeer or
msg_rxpeer. As a result, the message fails this lnet_is_health_check
condition and triggers a misleading error message. The condition is
only an error if the msg status is non-zero.
Another case where msg_rx_committed is set but msg_rxpeer is not is
the optimized GET. In this case we allocate a reply message but do
not set msg_rxpeer. We cannot perform further health checking on this
message, but it is not an error condition.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12440
Lustre-commit: 6caa6ed07df0 ("LU-12440 lnet: Misleading error from lnet_is_health_check")
Signed-off-by: Chris Horn <hornc at cray.com>
Reviewed-on: https://review.whamcloud.com/35235
Reviewed-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825 at cray.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
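For readers following the reasoning above, here is a minimal standalone
sketch of the decision this patch makes. It is not the kernel code: the
struct msg_sketch type and both helper names are hypothetical stand-ins
that model only the four lnet_msg fields the condition reads, and the
peer pointers are reduced to opaque pointers. It assumes, as the commit
message states, that a missing peer is only worth logging when the
status is non-zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lnet_msg fields the check reads. */
struct msg_sketch {
	bool msg_tx_committed;
	bool msg_rx_committed;
	const void *msg_txpeer;	/* NULL when never set, e.g. sends to 0@lo */
	const void *msg_rxpeer;	/* NULL when never set, e.g. optimized GET */
};

/* True when the message is committed in a direction that has no peer,
 * i.e. when no further health checking is possible.
 */
static bool peer_missing(const struct msg_sketch *msg)
{
	return (msg->msg_tx_committed && !msg->msg_txpeer) ||
	       (msg->msg_rx_committed && !msg->msg_rxpeer);
}

/* True only when the missing peer coincides with a non-zero status,
 * which is the one case where a debug message is warranted.
 */
static bool should_log_missing_peer(const struct msg_sketch *msg, int status)
{
	return peer_missing(msg) && status != 0;
}
```

In this model an optimized GET reply (msg_rx_committed set, msg_rxpeer
NULL, status 0) still ends health checking early, but no message is
logged; the same message with a non-zero status is logged.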
net/lnet/lnet/lib-msg.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 9ffd874..b70a6c9 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -848,8 +848,13 @@
 	if ((msg->msg_tx_committed && !msg->msg_txpeer) ||
 	    (msg->msg_rx_committed && !msg->msg_rxpeer)) {
-		CDEBUG(D_NET, "msg %p failed too early to retry and send\n",
-		       msg);
+		/* The optimized GET case does not set msg_rxpeer, but status
+		 * could be zero. Only print the error message if we have a
+		 * non-zero status.
+		 */
+		if (status)
+			CDEBUG(D_NET, "msg %p status %d cannot retry\n", msg,
+			       status);
 		return false;
 	}
--
1.8.3.1