[lustre-devel] [PATCH 01/19] lnet: fix delay rule crash
James Simmons
jsimmons at infradead.org
Sun Nov 28 15:27:36 PST 2021
The following crash was captured in testing:
LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) ASSERTION( list_empty(&rule->dl_sched_link) ) failed:
LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) LBUG
Pid: 25912, comm: lnet_dd 5.7.0-rc7+ #1 SMP PREEMPT Fri Aug 20 16:17:11 EDT 2021
Call Trace:
libcfs_call_trace+0x62/0x80 [libcfs]
lbug_with_loc+0x41/0xa0 [libcfs]
delay_rule_decref+0x6e/0xe0 [lnet]
lnet_delay_rule_check+0x65/0x110 [lnet]
lnet_delay_rule_daemon+0x76/0x120 [lnet]
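The list conversion in lnet_delay_rule_check() inverted the NULL
check, so list_del_init() was only called when no rule was found and
a rule taken from delay_dd.dd_sched_rules was never unlinked before
delay_rule_decref() ran. The fix is to revert the list changes in
lnet_delay_rule_check().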
Fixes: da4bdd3701 ("lustre: use list_first_entry() in lnet/lnet subdirectory.")
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
net/lnet/lnet/net_fault.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
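
For readers without a kernel tree at hand, the restored pattern can be
sketched in userspace. This is only an illustration: the list helpers
below are simplified stand-ins for the ones in <linux/list.h>, and
struct demo_rule and pop_first_rule() are hypothetical names that do
not exist in net_fault.c. Locking is left out; in the patch the same
steps run with delay_dd.dd_lock held.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink the entry and leave it pointing at itself, i.e. "empty". */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct lnet_delay_rule. */
struct demo_rule {
	int id;
	struct list_head sched_link;
};

/*
 * Detach and return the first scheduled rule, or NULL if none is queued.
 * Mirrors the restored logic: test for an empty list first, then unlink
 * the entry that was actually found.
 */
static struct demo_rule *pop_first_rule(struct list_head *sched_rules)
{
	struct demo_rule *rule;

	if (list_empty(sched_rules))
		return NULL;

	rule = list_entry(sched_rules->next, struct demo_rule, sched_link);
	list_del_init(&rule->sched_link);
	return rule;
}
```

After pop_first_rule() returns a rule, list_empty(&rule->sched_link)
is true, which corresponds to the condition that the assertion in
delay_rule_decref() checks on dl_sched_link. With the inverted check
the unlink step was skipped for every rule that was found, so the
link was still on the list when the assertion ran.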
diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index 06366df..02fc1ae 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -744,15 +744,15 @@ struct delay_daemon_data {
break;
spin_lock_bh(&delay_dd.dd_lock);
- rule = list_first_entry_or_null(&delay_dd.dd_sched_rules,
- struct lnet_delay_rule,
- dl_sched_link);
- if (!rule)
- list_del_init(&rule->dl_sched_link);
- spin_unlock_bh(&delay_dd.dd_lock);
-
- if (!rule)
+ if (list_empty(&delay_dd.dd_sched_rules)) {
+ spin_unlock_bh(&delay_dd.dd_lock);
break;
+ }
+
+ rule = list_entry(delay_dd.dd_sched_rules.next,
+ struct lnet_delay_rule, dl_sched_link);
+ list_del_init(&rule->dl_sched_link);
+ spin_unlock_bh(&delay_dd.dd_lock);
delayed_msg_check(rule, false, &msgs);
delay_rule_decref(rule); /* -1 for delay_dd.dd_sched_rules */
--
1.8.3.1