[lustre-discuss] lustre client not able to lctl ping or mount

Pak Lui pak.lui at linaro.org
Tue Sep 4 09:12:03 PDT 2018


Richard, James,

I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf", as
was suggested. I also tried "map_on_demand=0", as suggested here:
http://wiki.lustre.org/Optimizing_o2iblnd_Performance

/etc/modprobe.d/ko2iblnd.conf

alias ko2iblnd-opa ko2iblnd
# tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
#options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe
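
Because the options line is attached to the ko2iblnd-opa alias rather than to
ko2iblnd itself, it may be worth confirming which values the running module
actually picked up on both the client and the server. A minimal sketch,
assuming the parameters are exposed under /sys/module/ko2iblnd/parameters
(not every option is necessarily readable there); the function name
show_ko2iblnd_params is made up for illustration:

```shell
#!/bin/sh
# Print selected ko2iblnd parameters as seen by the running kernel.
# The first argument overrides the parameter directory; the default is the
# usual sysfs location (an assumption, not confirmed in this thread).
show_ko2iblnd_params() {
    param_dir="${1:-/sys/module/ko2iblnd/parameters}"
    for p in map_on_demand peer_credits peer_credits_hiw credits ntx conns_per_peer; do
        if [ -r "$param_dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$param_dir/$p")"
        else
            # Either the module is not loaded or this option is not readable.
            printf '%s=<not exposed>\n' "$p"
        fi
    done
}

show_ko2iblnd_params "$@"
```

If map_on_demand does not show the value from the conf file, the options
line is probably not being applied when the module is loaded.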


As for the Lustre software versions that I am using:

> server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0,
> lustre 2.11.54
> client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0,
> lustre 2.11.54
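
Since the two kernels are for different architectures, their page sizes may
differ as well, which would line up with the fragment counts in the server
message quoted below (max_frags 16 offered, 256 wanted). A rough sketch of
that arithmetic, assuming a 1 MiB maximum transfer (an assumption for
illustration, not something confirmed in this thread); the helper name
frags_for_page_size is hypothetical:

```shell
#!/bin/sh
# Number of page-sized fragments needed to cover a 1 MiB transfer.
# The 1 MiB figure is an assumption used for illustration only.
frags_for_page_size() {
    page_size="$1"
    echo $(( 1048576 / page_size ))
}

# Page size of the node this runs on; compare client and server.
echo "local page size: $(getconf PAGESIZE)"
echo "fragments at 4 KiB pages:  $(frags_for_page_size 4096)"
echo "fragments at 64 KiB pages: $(frags_for_page_size 65536)"
```

Under that assumption, 4 KiB pages give 256 fragments and 64 KiB pages give
16, the same two numbers that appear in the server's message.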

As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only 1
IPoIB for mlx5_0 (for the ib0 interface) is configured.

Thanks,
- Pak

On Tue, Sep 4, 2018 at 9:00 AM, Richard Henwood <Richard.Henwood at arm.com>
wrote:

> On Tue, 2018-09-04 at 08:06 -0700, Pak Lui wrote:
> > Hi all,
> >
> > I am having an issue with the Lustre client pinging the server using
> > o2ib. I want to find out if anyone has a suggestion on what could be
> > the problem. Thanks in advance.
> >
> > lustre client pinging to server:
> > > [root@n0 ~]# lctl ping 192.168.13.8@o2ib
> > > failed to ping 192.168.13.8@o2ib: Input/output error <<<<<<<
> >
> > lustre client pinging to server over IPoIB works:
> > > [root@n0 ~]# ping -c 1 192.168.13.8
> > > PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
> > > 64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms
> >
> > lustre client pinging to self or other client works:
> > > [root@n0 ~]# lctl ping 192.168.13.54@o2ib
> > > 12345-0@lo
> > > 12345-192.168.13.54@o2ib
> >
> > lustre client pinging to self or other client over IPoIB works:
> > > [root@n0 ~]# ping -c 1 192.168.13.54
> > > PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
> > > 64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms
> >
> > The lustre server and client have specified the modprobe for lnet:
> > > /etc/modprobe.conf
> > > options lnet networks=o2ib(ib0)
> >
> > The client reports some error when trying to ping or mount from the
> > client to server:
> > modprobe lustre lnet
> > lctl ping 192.168.13.8@o2ib
> > mount -v -t lustre 192.168.13.8@o2ib:/zfs /mnt/zfs
> >
> > > [root@n0 ~]# dmesg|tail
> > > [589805.093447] Lustre: Lustre: Build Version: 2.11.54
> > > [589805.272652] LNet: Using FastReg for registration
> > > [589805.275954] LNet: Added LNI 192.168.13.54@o2ib [8/256/0/180]
> > > [589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186@o2ib: 589813 seconds
> > > [589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8@o2ib: failed processing log, type 1: rc = -5
> > > [589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
> > > [589866.718389] LustreError: 15c-8: MGC192.168.13.8@o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> > > [589866.741623] Lustre: Unmounted zfs-client
> > > [589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-5)
> >
> > server reports some error during mounting:
> > > [root@license ~]# Sep  4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54@o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)
> >
> > The lustre server setup:
> > > [root@license ~]# lfs df -h
> > > UUID                       bytes        Used   Available Use% Mounted on
> > > zfs-MDT0000_UUID          863.4M        7.5M      853.9M   1% /mnt/zfs[MDT:0]
> > > zfs-OST0000_UUID            1.7T       10.0G        1.7T   1% /mnt/zfs[OST:0]
> > >
> > > filesystem_summary:         1.7T       10.0G        1.7T   1% /mnt/zfs
> >
> > server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0,
> > lustre 2.11.54
> > client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0,
> > lustre 2.11.54
> >
>
>
> It might be helpful to state the Lustre software versions that you have
> used.
>
> Also, given this is an Arm client (with presumably 64K pg size),
> connecting to a x86 server (with presumably 4K pg size), have you added
> the map_on_demand=16 incantation to the server? I don't have direct
> experience of this, but heard it was needed in some Arm configurations
> (depending on server/client version):
>
> https://jira.whamcloud.com/browse/LU-10775
>
> Maybe James can advise?
>
> best regards,
> Richard
>
> --
> Richard.Henwood at arm.com
> Server Software Eco-System
> Tel: +1 512 410 9612
>



-- 
Regards,
- Pak