[lustre-discuss] lustre client not able to lctl ping or mount
Klundt, Ruth
rklundt at sandia.gov
Tue Sep 4 09:58:28 PDT 2018
FYI, my testing has used only the map_on_demand=16 setting, with all other modparams at their defaults. Also, I haven't run servers on MOFED at all, just kernel IB. Finally, my last build was earlier than 2.11.54, so perhaps something new is going on.
ruth
On 9/4/18, 10:12 AM, "lustre-discuss on behalf of lustre-discuss-request at lists.lustre.org" <lustre-discuss-bounces at lists.lustre.org on behalf of lustre-discuss-request at lists.lustre.org> wrote:
Send lustre-discuss mailing list submissions to
lustre-discuss at lists.lustre.org
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
or, via email, send a message with subject or body 'help' to
lustre-discuss-request at lists.lustre.org
You can reach the person managing the list at
lustre-discuss-owner at lists.lustre.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of lustre-discuss digest..."
Today's Topics:
1. lustre client not able to lctl ping or mount (Pak Lui)
2. Re: lustre client not able to lctl ping or mount (Richard Henwood)
3. Re: lustre client not able to lctl ping or mount (Pak Lui)
----------------------------------------------------------------------
Message: 1
Date: Tue, 4 Sep 2018 08:06:09 -0700
From: Pak Lui <pak.lui at linaro.org>
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] lustre client not able to lctl ping or mount
Message-ID:
<CAMScT+X7cxqJETiifWfJ_8LLwenypg=KKb1UnyZXpartvvaR2w at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi all,
I am having an issue with the Lustre client pinging the server using o2ib. I
want to find out if anyone has a suggestion on what could be the problem.
Thanks in advance.
lustre client pinging to server:
[root at n0 ~]# lctl ping 192.168.13.8 at o2ib
failed to ping 192.168.13.8 at o2ib: Input/output error <<<<<<<
lustre client pinging to server over IPoIB works:
[root at n0~]# ping -c 1 192.168.13.8
PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms
lustre client pinging to self or other client works:
[root at n0 ~]# lctl ping 192.168.13.54 at o2ib
12345-0 at lo
12345-192.168.13.54 at o2ib
lustre client pinging to self or other client over IPoIB works:
[root at n0~]# ping -c 1 192.168.13.54
PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms
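The four checks above separate IP reachability from LNet reachability. A minimal sketch of that comparison follows; the function name `classify` and its result labels are hypothetical, and it assumes that both `ping` and `lctl ping` exit non-zero on failure.

```shell
#!/bin/sh
# Hypothetical helper: combine the exit status of an IP-level ping
# and an LNet-level ping into one diagnosis label.
classify() {
    # $1 = exit status of "ping", $2 = exit status of "lctl ping"
    if [ "$1" -ne 0 ]; then
        echo "ip-unreachable"
    elif [ "$2" -ne 0 ]; then
        echo "ip-ok-lnet-failed"
    else
        echo "both-ok"
    fi
}

# The situation reported for the server: IPoIB ping succeeds,
# lctl ping fails.
classify 0 1
```

With the two statuses captured from real runs of `ping -c 1 <address>` and `lctl ping <nid>`, a result of "ip-ok-lnet-failed" would suggest looking at the o2ib layer rather than at basic IP connectivity.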
The lustre server and client have specified the modprobe for lnet:
/etc/modprobe.conf
options lnet networks=o2ib(ib0)
The client reports some error when trying to ping or mount from the client
to server:
modprobe lustre lnet
lctl ping 192.168.13.8 at o2ib
mount -v -t lustre 192.168.13.8 at o2ib:/zfs /mnt/zfs
[root at n0 ~]# dmesg|tail
[589805.093447] Lustre: Lustre: Build Version: 2.11.54
[589805.272652] LNet: Using FastReg for registration
[589805.275954] LNet: Added LNI 192.168.13.54 at o2ib [8/256/0/180]
[589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186 at o2ib: 589813 seconds
[589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8 at o2ib: failed processing log, type 1: rc = -5
[589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
[589866.718389] LustreError: 15c-8: MGC192.168.13.8 at o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[589866.741623] Lustre: Unmounted zfs-client
[589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount (-5)
server reports some error during mounting:
[root at license ~]# Sep 4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54 at o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)
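The two numbers in that server message are the key detail. As a rough sketch, they can be pulled out of a log with `sed`; the function name `parse_max_frags` is hypothetical, and the pattern assumes the message wording is exactly as quoted above and that the whole message is on one line.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the
# two fragment counts from a kiblnd_passive_connect() rejection.
parse_max_frags() {
    sed -n 's/.*max_frags \([0-9][0-9]*\) incompatible without FMR pool (\([0-9][0-9]*\) wanted).*/max_frags=\1 wanted=\2/p'
}

# Sample line copied from the server log above.
line="LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54 at o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)"

printf '%s\n' "$line" | parse_max_frags
```

For the sample line this prints "max_frags=16 wanted=256". On a live system the input could come from `dmesg` or the syslog instead of the sample string.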
The lustre server setup:
[root at license ~]# lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
zfs-MDT0000_UUID      863.4M    7.5M     853.9M    1%  /mnt/zfs[MDT:0]
zfs-OST0000_UUID        1.7T   10.0G       1.7T    1%  /mnt/zfs[OST:0]
filesystem_summary:     1.7T   10.0G       1.7T    1%  /mnt/zfs
server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
Regards,
- Pak
------------------------------
Message: 2
Date: Tue, 4 Sep 2018 16:00:19 +0000
From: Richard Henwood <Richard.Henwood at arm.com>
To: "lustre-discuss at lists.lustre.org"
<lustre-discuss at lists.lustre.org>, "pak.lui at linaro.org"
<pak.lui at linaro.org>
Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
mount
Message-ID: <5f920989941b1007874e988bf748eb1a84a38068.camel at arm.com>
Content-Type: text/plain; charset="utf-8"
It might be helpful to state the Lustre software versions that you have
used.
Also, given that this is an Arm client (with presumably a 64K page size)
connecting to an x86 server (with presumably a 4K page size), have you added
the map_on_demand=16 incantation to the server? I don't have direct
experience of this, but I heard it was needed in some Arm configurations
(depending on server/client version):
https://jira.whamcloud.com/browse/LU-10775
Maybe James can advise?
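The page-size reasoning above can be illustrated with a little arithmetic. This is only a sketch: it assumes that a bulk transfer is at most 1 MiB and is described with one fragment per page, and the function name `frags_per_transfer` is hypothetical, not part of Lustre.

```shell
#!/bin/sh
# Assumption: the largest bulk transfer is 1 MiB.
MAX_PAYLOAD=$((1024 * 1024))

# Hypothetical helper: number of page-sized fragments needed to
# describe one maximum-size transfer.
frags_per_transfer() {
    # $1 = page size in bytes
    echo $((MAX_PAYLOAD / $1))
}

echo "4K pages:  $(frags_per_transfer 4096) fragments"
echo "64K pages: $(frags_per_transfer 65536) fragments"
```

Under these assumptions the results are 256 and 16, the same two numbers that appear in the server's "max_frags 16 ... (256 wanted)" message. The actual page size of each node can be checked with `getconf PAGESIZE`.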
best regards,
Richard
--
Richard.Henwood at arm.com
Server Software Eco-System
Tel: +1 512 410 9612
------------------------------
Message: 3
Date: Tue, 4 Sep 2018 09:12:03 -0700
From: Pak Lui <pak.lui at linaro.org>
To: Richard Henwood <Richard.Henwood at arm.com>
Cc: "lustre-discuss at lists.lustre.org"
<lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
mount
Message-ID:
<CAMScT+WpAMcuthcziPOcXkQOukSoWPrL8N928LwRR9f45xMc0w at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Richard, James,
I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf", as
was suggested. I also tried "map_on_demand=0", as suggested here:
http://wiki.lustre.org/Optimizing_o2iblnd_Performance
/etc/modprobe.d/ko2iblnd.conf
alias ko2iblnd-opa ko2iblnd
# tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
#options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe
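When comparing client and server settings, it can help to list which module name each active `options` line is attached to. The sketch below does that for one parameter; the function name `show_param` is hypothetical, the sample text is an abbreviated copy of the file above, and each `options` entry is assumed to sit on a single line.

```shell
#!/bin/sh
# Hypothetical helper: for every active (uncommented) "options" line
# on stdin that sets the given parameter, print "<module> <param>=<value>".
show_param() {
    # $1 = parameter name, e.g. map_on_demand
    awk -v p="$1" '
        $1 == "options" {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == p) print $2, $i
            }
        }'
}

# Abbreviated sample of the configuration shown above.
conf='alias ko2iblnd-opa ko2iblnd
#options ko2iblnd-opa peer_credits=128 map_on_demand=0
options ko2iblnd-opa peer_credits=128 map_on_demand=16
install ko2iblnd /usr/sbin/ko2iblnd-probe'

printf '%s\n' "$conf" | show_param map_on_demand
```

For the sample this prints "ko2iblnd-opa map_on_demand=16", which shows that the active setting is attached to the `ko2iblnd-opa` alias. Whether options under that name are actually applied on this hardware depends on the `ko2iblnd-probe` script, which is not shown in this thread.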
As for the Lustre software versions that I am using:
> server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
> client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only 1
IPoIB for mlx5_0 (for the ib0 interface) is configured.
Thanks,
- Pak
--
Regards,
- Pak
------------------------------
End of lustre-discuss Digest, Vol 150, Issue 3
**********************************************