[lustre-discuss] systemd lnet/rdma conflict

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Wed Jul 22 16:51:37 PDT 2020


Hi,

For whatever reason, those systemd service files still do not help. To be clear on the setup, this is:

- CentOS 8.2
- MLNX_OFED_LINUX-4.9-0.1.7.0-rhel8.2-x86_64
- lustre-2.12.5-ib, DKMS

I'll keep digging, hope someone has an idea though!

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 

On 7/17/20, 11:20 AM, "Mohr Jr, Richard Frank" <rmohr at utk.edu> wrote:



    > On Jul 17, 2020, at 1:41 PM, Andreas Dilger <adilger at dilger.ca> wrote:
    > 
    > 
    > Rick,
    > would you be able to put this in the form of a patch against lustre/scripts/systemd/lnet.service so that this works
    > well for everyone?  You could use LU-9673 for this.
    > 

    Sure, but I would like verification from Chris (or someone else) that it works, just to make sure this isn’t something that only works for me.

    Rick


    > 
    > 
    >> On Jul 16, 2020, at 2:34 PM, Mohr Jr, Richard Frank <rmohr at utk.edu> wrote:
    >>> On Jul 16, 2020, at 2:46 PM, Christopher Benjamin Coffey <Chris.Coffey at nau.edu> wrote:
    >>> 
    >>> 
    >>> I'm trying to get lustre and rdma set up on an el8 system. I can't get systemd to shut down the two services, lnet and rdma, correctly without hanging the system. I've tried many things in the rdma.service and lnet.service files to get them to work correctly, but the issue persists. My service files are below. Does anyone know how to fix this?
    >> 
    >> Yup, ran into the same thing.  See suggestion below.
    >> 
    >>> 
    >>> ---------
    >>> [Unit]
    >>> Description=lnet management
    >>> 
    >>> Requires=network-online.target
    >>> After=network-online.target rdma.service
    >>> Wants=rdma.service
    >>> 
    >>> ConditionPathExists=!/proc/sys/lnet/
    >>> 
    >>> [Service]
    >>> Type=oneshot
    >>> RemainAfterExit=true
    >>> ExecStart=/sbin/modprobe lnet
    >>> ExecStart=/usr/sbin/lnetctl lnet configure
    >>> ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
    >>> ExecStop=/usr/sbin/lnetctl lnet unconfigure
    >>> ExecStop=/usr/sbin/lustre_rmmod
    >>> TimeoutStopSec=30
    >>> 
    >>> [Install]
    >>> WantedBy=multi-user.target
    >> 
    >> 
    >> Try adding “BindsTo=rdma.service” to the [Unit] section of the lnet service file.  This should force the lnet service to be stopped if the rdma service is ever stopped.
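    >> 
    >> As a minimal sketch (assuming the rest of the file stays exactly as you posted it, and that the existing Wants= line is kept), the [Unit] section would then look something like this:
    >> 
    >> ---------
    >> [Unit]
    >> Description=lnet management
    >> 
    >> Requires=network-online.target
    >> After=network-online.target rdma.service
    >> Wants=rdma.service
    >> # Added: stop lnet whenever rdma.service is stopped
    >> BindsTo=rdma.service
    >> 
    >> ConditionPathExists=!/proc/sys/lnet/
    >> ---------
    >> 
    >> Because After=rdma.service is still there, systemd should stop lnet before rdma at shutdown. After editing, "systemctl daemon-reload" makes systemd pick up the change, and "systemctl show lnet.service -p BindsTo -p After" should list rdma.service under both properties.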
    >> 
    >> —
    >> Rick Mohr
    >> Senior HPC System Administrator
    >> Joint Institute for Computational Sciences
    >> University of Tennessee
    >> 
    >> 
    >> 
    >> 
    >> _______________________________________________
    >> lustre-discuss mailing list
    >> lustre-discuss at lists.lustre.org
    >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    > 
    > 
    > Cheers, Andreas
    > 
    > 
    > 
    > 
    > 
    > 
    > 
