[lustre-discuss] Can't join an IBM Power9 System AC922 to an existing Lustre Service (Intel Servers) through Mellanox Infiniband

Americo Ojeda americo.ojeda at sinergiasys.com
Tue Oct 8 08:04:58 PDT 2019


Hi, I would like to know whether the Lustre client software is compatible
with the ppc64le architecture and Mellanox InfiniBand. I think the
problem is between Lustre and InfiniBand.

I want to join an IBM Power System Power9 AC922 node to an existing
Lustre file system served by Intel servers. I built the Lustre client
software from source and installed it successfully, but I can't join this
node to the existing Lustre service.

Client node (IBM Power server)

    IBM Power System 9 - AC922
    Red Hat Enterprise Linux Server release 7.5 (Alternate)
    Linux SinergiAC922 4.14.0-49.13.1.el7a.ppc64le #1 SMP Mon Aug 27 07:37:11 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux
    Mellanox Driver Version: 4.5-1.0.1
    Lustre Client 2.12.58
    Compilation: ./configure --disable-server --disable-tests --with-o2ib=/usr/src/ofa_kernel/default

dmesg log:

[163444.797346] Lustre: Lustre: Build Version: 2.12.58_145_gfcf219d
[163445.007000] LNet: Using FastReg for registration
[163445.008017] LNet: Added LNI my_ip_address@o2ib [8/256/0/180]

[163460.523709] LNetError: 17267:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
[163460.523775] LNetError: 17267:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900
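For reference, the two LNetError lines above come from LNet's health tracking: the `lpni` line refers to the remote peer NI and the `ni` line to the local NI. A minimal bash sketch for summarizing such lines is below. The function name `lnet_health_summary` is hypothetical, it expects one message per line (the archive wraps the lines above), and its constants are assumptions rather than values read from the system: a maximum health of 1000 lowered by a sensitivity of 100 per failure, so "Health = 900" would mean a single recorded failure under those defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize LNet health "recovery queue" messages read on stdin.
# Assumptions (not read from the running system): max health 1000,
# and each failure lowers health by a sensitivity of 100.
lnet_health_summary() {
  local max_health=1000 sensitivity=100
  # Capture groups: 1 = "lpni" (remote peer NI) or "ni" (local NI),
  # 2 = the NID as printed, 3 = the reported health value.
  local re='[[:space:]](lpni|ni)[[:space:]](.+)[[:space:]]added to recovery queue\. Health = ([0-9]+)'
  local line kind nid health
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      kind=${BASH_REMATCH[1]}
      nid=${BASH_REMATCH[2]}
      health=${BASH_REMATCH[3]}
      if [[ $kind == lpni ]]; then kind=peer; else kind=local; fi
      # Number of sensitivity-sized drops implied by the health value.
      printf '%s %s health=%s drops=%s\n' \
        "$kind" "$nid" "$health" "$(( (max_health - health) / sensitivity ))"
    fi
  done
}
```

After sourcing the function into a bash shell, it can be used as `dmesg | lnet_health_summary`. It only restates what the log already says; it does not explain why the o2ib traffic failed.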

messages log:

Sep 26 11:37:02 AC922 kernel: LNetError: 1404:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
Sep 26 11:37:02 SinergiAC922 kernel: LNetError: 1404:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900
Sep 26 11:37:08 AC922 kernel: LustreError: 73939:0:(mgc_request.c:250:do_config_log_add()) MGClustre_server_address@o2ib: failed processing log, type 1: rc = -5
Sep 26 11:37:16 AC922 kernel: LustreError: 73949:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
Sep 26 11:37:39 AC922 kernel: LustreError: 15c-8: MGClustre_server_address@o2ib: Confguration from log testfs-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
Sep 26 11:37:39 AC922 kernel: Lustre: Unmounted testfs-client
Sep 26 11:37:39 AC922 kernel: LustreError: 73939:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5)

-- 
Americo Ojeda