[lustre-discuss] systemd lnet/rdma conflict
Christopher Benjamin Coffey
Chris.Coffey at nau.edu
Wed Jul 22 16:51:37 PDT 2020
Hi,
For whatever reason, those systemd service files still do not help. To be clear on the setup, this is:
- CentOS 8.2
- MLNX_OFED_LINUX-4.9-0.1.7.0-rhel8.2-x86_64
- lustre-2.12.5-ib, DKMS
I'll keep digging, hope someone has an idea though!
Best,
Chris
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 7/17/20, 11:20 AM, "Mohr Jr, Richard Frank" <rmohr at utk.edu> wrote:
> On Jul 17, 2020, at 1:41 PM, Andreas Dilger <adilger at dilger.ca> wrote:
>
>
> Rick,
> would you be able to put this in the form of a patch against lustre/scripts/systemd/lnet.service so that this works
> well for everyone? You could use LU-9673 for this.
>
Sure, but I would be interested in getting verification from Chris (or someone else) that it works just to make sure this isn’t something that is only working for me.
Rick
>
>
>> On Jul 16, 2020, at 2:34 PM, Mohr Jr, Richard Frank <rmohr at utk.edu> wrote:
>>> On Jul 16, 2020, at 2:46 PM, Christopher Benjamin Coffey <Chris.Coffey at nau.edu> wrote:
>>>
>>>
>>> I'm trying to get Lustre and RDMA set up on an el8 system. I can't get systemd to shut down the two services, lnet and rdma, correctly without hanging the system. I've tried many things in the rdma.service and lnet.service files to get them to work correctly, but the issue persists. My lnet.service file is below. Does anyone know how to fix this?
>>
>> Yup, ran into the same thing. See suggestion below.
>>
>>>
>>> ---------
>>> [Unit]
>>> Description=lnet management
>>>
>>> Requires=network-online.target
>>> After=network-online.target rdma.service
>>> Wants=rdma.service
>>>
>>> ConditionPathExists=!/proc/sys/lnet/
>>>
>>> [Service]
>>> Type=oneshot
>>> RemainAfterExit=true
>>> ExecStart=/sbin/modprobe lnet
>>> ExecStart=/usr/sbin/lnetctl lnet configure
>>> ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
>>> ExecStop=/usr/sbin/lnetctl lnet unconfigure
>>> ExecStop=/usr/sbin/lustre_rmmod
>>> TimeoutStopSec=30
>>>
>>> [Install]
>>> WantedBy=multi-user.target
>>
>>
>> Try adding "BindsTo=rdma.service" to the lnet service file. This should force the lnet service to be stopped if the rdma service is ever stopped.
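>>
>> A minimal sketch of how the [Unit] section quoted above might look with that one line added. Everything else is copied unchanged from the original file, and this has not been verified beyond the setup discussed in this thread. Since lnet.service is already ordered After=rdma.service, systemd stops the two units in the reverse of their start order, so lnet should be torn down before rdma:
>>
>> ```ini
>> [Unit]
>> Description=lnet management
>>
>> Requires=network-online.target
>> After=network-online.target rdma.service
>> Wants=rdma.service
>> # Added line: stop lnet whenever rdma.service is stopped
>> BindsTo=rdma.service
>>
>> ConditionPathExists=!/proc/sys/lnet/
>> ```
>>
>> After editing the unit file, "systemctl daemon-reload" is needed before systemd picks up the change.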
>>
>> —
>> Rick Mohr
>> Senior HPC System Administrator
>> Joint Institute for Computational Sciences
>> University of Tennessee
>>
>>
>>
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
>
>
>
>
>
>
>