[lustre-discuss] Can't join an IBM Power9 System AC922 to an existing Lustre Service (Intel Servers) through Mellanox Infiniband

Americo Ojeda americo.ojeda at sinergiasys.com
Tue Oct 8 15:40:45 PDT 2019


I ran some communication tests with tcpdump. If I use "telnet server_ip
988" I get a response from the server, but when I use "lctl ping
server_ip@o2ib" tcpdump shows nothing. I think the problem is with
my InfiniBand communication.
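
For anyone who wants to repeat the telnet check on port 988 without telnet installed, here is a minimal sketch, assuming Python 3 is available on the client. The function name tcp_port_open and the 3-second timeout are my own choices for illustration, not part of Lustre. It only exercises the TCP path, so a success here says nothing about the o2ib (InfiniBand) path that the ping test uses.

```python
import socket

# Port used in the telnet test above (default LNet TCP acceptor port).
LNET_TCP_PORT = 988


def tcp_port_open(host, port=LNET_TCP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it checks plain TCP reachability
    only and does not test InfiniBand/RDMA connectivity.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling tcp_port_open("server_ip") with the real server address should return True in the same cases where the telnet test gets a response.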

Any recommendations?

On 10/8/19 10:04 AM, Americo Ojeda wrote:
> Hi, I would like to know whether the Lustre client software is compatible
> with the ppc64le architecture and Mellanox InfiniBand. I think there is a
> problem between Lustre and InfiniBand.
>
> I want to join an IBM Power System Power9 - AC922 node to an existing
> Lustre server (Intel servers). I built the Lustre client software from
> source and installed it successfully, but I can't join this node to the
> existing Lustre service.
>
> Server Node (client)
>
>     IBM Power System 9 - AC922
>     Red Hat Enterprise Linux Server release 7.5 (Alternate)
>     Linux SinergiAC922 4.14.0-49.13.1.el7a.ppc64le #1 SMP Mon Aug 27
> 07:37:11 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux
>     Mellanox Driver Version: 4.5-1.0.1
>     Lustre Client 2.12.58
>     Compilation: ./configure --disable-server --disable-tests
> --with-o2ib=/usr/src/ofa_kernel/default
>
> dmesg log:
>
> [163444.797346] Lustre: Lustre: Build Version: 2.12.58_145_gfcf219d
> [163445.007000] LNet: Using FastReg for registration
> [163445.008017] LNet: Added LNI my_ip_address@o2ib [8/256/0/180]
>
> [163460.523709] LNetError:
> 17267:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni
> lustre_server_address@o2ib added to recovery queue. Health = 900
> [163460.523775] LNetError:
> 17267:0:(lib-msg.c:481:lnet_handle_local_failure()) ni
> my_ip_address@o2ib added to recovery queue. Health = 900
>
> messages log:
>
> Sep 26 11:37:02 AC922 kernel: LNetError:
> 1404:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni
> lustre_server_address@o2ib added to recovery queue. Health = 900
> Sep 26 11:37:02 SinergiAC922 kernel: LNetError:
> 1404:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib
> added to recovery queue. Health = 900
> Sep 26 11:37:08 AC922 kernel: LustreError:
> 73939:0:(mgc_request.c:250:do_config_log_add())
> MGClustre_server_address@o2ib: failed processing log, type 1: rc = -5
> Sep 26 11:37:16 AC922 kernel: LustreError:
> 73949:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
> Sep 26 11:37:39 AC922 kernel: LustreError: 15c-8:
> MGClustre_server_address@o2ib: Configuration from log testfs-client
> failed from MGS -5. Communication error between node & MGS, a bad
> configuration, or other errors. See syslog for more info
> Sep 26 11:37:39 AC922 kernel: Lustre: Unmounted testfs-client
> Sep 26 11:37:39 AC922 kernel: LustreError:
> 73939:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5)
>