[lustre-discuss] lustre client not able to lctl ping or mount

Klundt, Ruth rklundt at sandia.gov
Tue Sep 4 09:58:28 PDT 2018


FYI, my testing has been with only the map_on_demand=16 setting, with all other modparams left at their defaults. Also, I haven't run servers on MOFED at all, just kernel IB. Finally, my last build was earlier than 2.11.54, so perhaps something new is going on.
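
A setup like the one described above (only map_on_demand changed) could be checked roughly as follows. This is a sketch, not something taken from the thread: it assumes the option is set on the plain ko2iblnd module (for example a line `options ko2iblnd map_on_demand=16` in /etc/modprobe.d/ko2iblnd.conf) and that the loaded module exposes its parameters under /sys/module/ko2iblnd/parameters, the usual sysfs location for readable module parameters. The helper name show_ko2iblnd_param is hypothetical.

```shell
#!/bin/sh
# Print the live value of a ko2iblnd module parameter.
# Usage: show_ko2iblnd_param NAME [PARAM_DIR]
# PARAM_DIR defaults to the sysfs directory of the loaded module (assumed path).
show_ko2iblnd_param() {
    name=$1
    dir=${2:-/sys/module/ko2iblnd/parameters}
    if [ -r "$dir/$name" ]; then
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    else
        # Module not loaded, or the parameter is not exported on this build.
        printf '%s=<unavailable>\n' "$name"
    fi
}

# Report the value the running module actually picked up.
show_ko2iblnd_param map_on_demand
```

Running this on both the server and the client after reloading the modules would show whether the value from the modprobe configuration was really applied on each side.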

ruth


On 9/4/18, 10:12 AM, "lustre-discuss on behalf of lustre-discuss-request at lists.lustre.org" <lustre-discuss-bounces at lists.lustre.org on behalf of lustre-discuss-request at lists.lustre.org> wrote:

    Send lustre-discuss mailing list submissions to
    	lustre-discuss at lists.lustre.org
    
    To subscribe or unsubscribe via the World Wide Web, visit
    	http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    or, via email, send a message with subject or body 'help' to
    	lustre-discuss-request at lists.lustre.org
    
    You can reach the person managing the list at
    	lustre-discuss-owner at lists.lustre.org
    
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of lustre-discuss digest..."
    
    
    Today's Topics:
    
       1. lustre client not able to lctl ping or mount (Pak Lui)
       2. Re: lustre client not able to lctl ping or mount (Richard Henwood)
       3. Re: lustre client not able to lctl ping or mount (Pak Lui)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Tue, 4 Sep 2018 08:06:09 -0700
    From: Pak Lui <pak.lui at linaro.org>
    To: lustre-discuss at lists.lustre.org
    Subject: [lustre-discuss] lustre client not able to lctl ping or mount
    Message-ID:
    	<CAMScT+X7cxqJETiifWfJ_8LLwenypg=KKb1UnyZXpartvvaR2w at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    Hi all,
    
    I am having an issue with the Lustre client pinging the server using o2ib.
    I want to find out if anyone has a suggestion on what could be the problem.
    Thanks in advance.
    
    lustre client pinging to server:
    
    [root at n0 ~]# lctl ping 192.168.13.8 at o2ib
    failed to ping 192.168.13.8 at o2ib: Input/output error <<<<<<<
    
    lustre client pinging to server over IPoIB works:
    
    [root at n0~]# ping -c 1 192.168.13.8
    PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
    64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms
    
    
    lustre client pinging to self or other client works:
    
    [root at n0 ~]# lctl ping 192.168.13.54 at o2ib
    12345-0 at lo
    12345-192.168.13.54 at o2ib
    
    lustre client pinging to self or other client over IPoIB works:
    
    [root at n0~]# ping -c 1 192.168.13.54
    PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
    64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms
    
    
    Both the lustre server and client set the following modprobe option for lnet:
    
    /etc/modprobe.conf
    options lnet networks=o2ib(ib0)
    
    
    The client reports some error when trying to ping or mount from the client
    to server:
    modprobe lustre lnet
    lctl ping 192.168.13.8 at o2ib
    mount -v -t lustre 192.168.13.8 at o2ib:/zfs /mnt/zfs
    
    [root at n0 ~]# dmesg|tail
    [589805.093447] Lustre: Lustre: Build Version: 2.11.54
    [589805.272652] LNet: Using FastReg for registration
    [589805.275954] LNet: Added LNI 192.168.13.54 at o2ib [8/256/0/180]
    [589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186 at o2ib: 589813 seconds
    [589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8 at o2ib: failed processing log, type 1: rc = -5
    [589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
    [589866.718389] LustreError: 15c-8: MGC192.168.13.8 at o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
    [589866.741623] Lustre: Unmounted zfs-client
    [589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-5)
    
    
    server reports some error during mounting:
    
    [root at license ~]# Sep  4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54 at o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)
    
    
    The lustre server setup:
    
    [root at license ~]# lfs df -h
    UUID                       bytes        Used   Available Use% Mounted on
    zfs-MDT0000_UUID          863.4M        7.5M      853.9M   1% /mnt/zfs[MDT:0]
    zfs-OST0000_UUID            1.7T       10.0G        1.7T   1% /mnt/zfs[OST:0]
    
    filesystem_summary:         1.7T       10.0G        1.7T   1% /mnt/zfs
    
    
    server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    
    Regards,
    - Pak
    
    ------------------------------
    
    Message: 2
    Date: Tue, 4 Sep 2018 16:00:19 +0000
    From: Richard Henwood <Richard.Henwood at arm.com>
    To: "lustre-discuss at lists.lustre.org"
    	<lustre-discuss at lists.lustre.org>, "pak.lui at linaro.org"
    	<pak.lui at linaro.org>
    Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
    	mount
    Message-ID: <5f920989941b1007874e988bf748eb1a84a38068.camel at arm.com>
    Content-Type: text/plain; charset="utf-8"
    
    
    It might be helpful to state the Lustre software versions that you have
    used.
    
    Also, given this is an Arm client (presumably with a 64K page size)
    connecting to an x86 server (presumably with a 4K page size), have you added
    the map_on_demand=16 incantation to the server? I don't have direct
    experience of this, but I have heard it was needed in some Arm configurations
    (depending on server/client version):
    
    https://jira.whamcloud.com/browse/LU-10775
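
The page-size mismatch can be made concrete with a little arithmetic. The sketch below assumes a 1 MiB maximum bulk transfer (an assumption, not stated in this thread) and counts how many page-sized fragments such a transfer needs. Under that assumption a 4K-page node needs 256 fragments and a 64K-page node needs 16, which lines up with the 256 and 16 in the server's "max_frags" message. The function name frags_per_transfer is hypothetical.

```shell
#!/bin/sh
# Number of page-sized fragments needed to cover one bulk transfer.
# Usage: frags_per_transfer PAGE_SIZE_BYTES [TRANSFER_BYTES]
# TRANSFER_BYTES defaults to 1 MiB (assumed maximum transfer size).
frags_per_transfer() {
    page=$1
    xfer=${2:-1048576}
    # Round up so a partial trailing page still counts as one fragment.
    echo $(( (xfer + page - 1) / page ))
}

frags_per_transfer 4096     # x86_64 with 4K pages
frags_per_transfer 65536    # aarch64 with 64K pages
# Same calculation for the page size of the local node.
frags_per_transfer "$(getconf PAGESIZE)"
```

Seen this way, map_on_demand=16 on the 4K-page side is a way of asking it to work with the same fragment count the 64K-page side uses, though whether that is sufficient depends on the Lustre version, as noted above.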
    
    Maybe James can advise?
    
    best regards,
    Richard
    
    --
    Richard.Henwood at arm.com
    Server Software Eco-System
    Tel: +1 512 410 9612
    
    ------------------------------
    
    Message: 3
    Date: Tue, 4 Sep 2018 09:12:03 -0700
    From: Pak Lui <pak.lui at linaro.org>
    To: Richard Henwood <Richard.Henwood at arm.com>
    Cc: "lustre-discuss at lists.lustre.org"
    	<lustre-discuss at lists.lustre.org>
    Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
    	mount
    Message-ID:
    	<CAMScT+WpAMcuthcziPOcXkQOukSoWPrL8N928LwRR9f45xMc0w at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    Richard, James,
    
    I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf", as
    was suggested. I also tried "map_on_demand=0", as suggested here:
    http://wiki.lustre.org/Optimizing_o2iblnd_Performance
    
    /etc/modprobe.d/ko2iblnd.conf
    
    alias ko2iblnd-opa ko2iblnd
    # tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
    #options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    install ko2iblnd /usr/sbin/ko2iblnd-probe
    
    
    As for the Lustre software versions that I am using:
    
    > server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    > client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    
    As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only one
    IPoIB interface (ib0, on mlx5_0) is configured.
    
    Thanks,
    - Pak
    
    -- 
    Regards,
    - Pak
    
    ------------------------------
    
    End of lustre-discuss Digest, Vol 150, Issue 3
    **********************************************
    


