[lustre-devel] [PATCH 380/622] lnet: use after free in lnet_discover_peer_locked()
James Simmons
jsimmons at infradead.org
Thu Feb 27 13:14:08 PST 2020
From: Olaf Weber <olaf.weber at hpe.com>
When the lnet_net_lock is unlocked, the peer attached to an
lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
can change, and the old peer deallocated. If we are really
unlucky, then all the churn could give us a new, different,
peer at the same address in memory.
Change the reference counting on the lnet_peer lp so that it
is guaranteed to be alive when we relock the lnet_net_lock for
the cpt. When the reference count is dropped lp may go away if
it was unlinked, but the new peer is guaranteed to have a
different address, so we can still correctly determine whether
the peer changed and discovery should be redone.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9971
Lustre-commit: 2b5b551b15d9 ("LU-9971 lnet: use after free in lnet_discover_peer_locked()")
Signed-off-by: Olaf Weber <olaf.weber at hpe.com>
Reviewed-on: https://review.whamcloud.com/28944
Reviewed-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-by: James Simmons <uja.ornl at yahoo.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
net/lnet/lnet/peer.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index e5cce2f..d167a37 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2150,6 +2150,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
* zombie if we race with DLC, so we must check for that.
*/
for (;;) {
+ /* Keep lp alive when the lnet_net_lock is unlocked */
+ lnet_peer_addref_locked(lp);
prepare_to_wait(&lp->lp_dc_waitq, &wait, TASK_INTERRUPTIBLE);
if (signal_pending(current))
break;
@@ -2161,16 +2163,14 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
break;
lnet_peer_queue_for_discovery(lp);
- /*
- * if caller requested a non-blocking operation then
- * return immediately. Once discovery is complete then the
- * peer ref will be decremented and any pending messages
- * that were stopped due to discovery will be transmitted.
+ /* If caller requested a non-blocking operation then
+ * return immediately. Once discovery is complete any
+ * pending messages that were stopped due to discovery
+ * will be transmitted.
*/
if (!block)
break;
- lnet_peer_addref_locked(lp);
lnet_net_unlock(LNET_LOCK_EX);
schedule();
finish_wait(&lp->lp_dc_waitq, &wait);
@@ -2192,10 +2192,12 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
lnet_net_unlock(LNET_LOCK_EX);
lnet_net_lock(cpt);
-
- /* If the peer has changed after we've discovered the older peer,
- * then we need to discovery the new peer to make sure the
- * interface information is up to date
+ lnet_peer_decref_locked(lp);
+ /* The peer may have changed, so re-check and rediscover if that turns
+ * out to have been the case. The reference count on lp ensured that
+ * even if it was unlinked from lpni the memory could not be recycled.
+ * Thus the check below is sufficient to determine whether the peer
+ * changed. If the peer changed, then lp must not be dereferenced.
*/
if (lp != lpni->lpni_peer_net->lpn_peer)
goto again;
--
1.8.3.1
More information about the lustre-devel
mailing list