[lustre-discuss] systemd lnet/rdma conflict

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Thu Jul 16 11:46:51 PDT 2020


Hi,

I'm trying to set up Lustre and RDMA on an EL8 system. I can't get systemd to shut down the two services, lnet and rdma, correctly without hanging the system. I've tried many changes to the rdma.service and lnet.service files, but the issue persists. My service files are below. Does anyone know how to fix this? Even with the service files set as shown, the system hangs because the Mellanox drivers are being removed before lnet has been stopped. I get these messages:

- mlx4_core .... mlx4_shutdown was called
- LNetError: 131-3: Received notification of device removal 
- please shutdown LNET to allow this to proceed

lnet.service:
---------
[Unit]
Description=lnet management

Requires=network-online.target
After=network-online.target rdma.service
Wants=rdma.service

ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
---------

rdma.service:
---------
[Unit]
Description=Initialize the iWARP/InfiniBand/RDMA stack in the kernel
Documentation=file:/etc/rdma/rdma.conf
RefuseManualStop=true
DefaultDependencies=false
Conflicts=emergency.target emergency.service
Before=network.target remote-fs-pre.target lnet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/libexec/rdma-init-kernel

[Install]
WantedBy=sysinit.target
---------
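
As a sanity check on the ordering these two files declare, here is a minimal Python sketch. It is not part of the setup above, and the helper names ordering_edges and start_order are made up for illustration. It relies only on what systemd.unit(5) documents: units with an ordering dependency are stopped in the reverse of their start order. It models After= and Before= only and ignores everything else (including DefaultDependencies=false), so it shows what the files ask for, not what actually happens at shutdown.

```python
# Sketch: derive the stop order implied by After=/Before= in the two unit files.
# Assumption: stop order is the reverse of start order (per systemd.unit(5)).
# Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Ordering-related lines copied from the unit files in this post.
LNET_UNIT = """\
[Unit]
Requires=network-online.target
After=network-online.target rdma.service
Wants=rdma.service
"""

RDMA_UNIT = """\
[Unit]
DefaultDependencies=false
Before=network.target remote-fs-pre.target lnet.service
"""


def ordering_edges(name, text):
    """Yield (earlier, later) start-order pairs declared by one unit file."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # section headers and blank lines
        key = key.strip()
        for other in value.split():
            if key == "After":
                yield (other, name)   # 'other' starts before this unit
            elif key == "Before":
                yield (name, other)   # this unit starts before 'other'


def start_order(units):
    """Return one valid start order for all units named in the files."""
    graph = {}  # node -> set of nodes that must start first
    for name, text in units.items():
        for earlier, later in ordering_edges(name, text):
            graph.setdefault(later, set()).add(earlier)
            graph.setdefault(earlier, set())
    return list(TopologicalSorter(graph).static_order())


units = {"lnet.service": LNET_UNIT, "rdma.service": RDMA_UNIT}
start = start_order(units)
stop = list(reversed(start))  # shutdown applies the inverse of start-up order

print("start order:", start)
print("stop order: ", stop)
print("lnet stops before rdma:",
      stop.index("lnet.service") < stop.index("rdma.service"))
```

If this reading is right, the declared ordering already asks for lnet to be stopped before rdma, which is why the messages above are confusing.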

Thanks.

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 


