[lustre-discuss] lctl ping node28@o2ib report Input/output error

yu sun sunyu1949 at gmail.com
Tue Jun 26 21:52:22 PDT 2018


Thanks Robin, and sorry for my late reply.

[root@bigdata-dlp-server00 ~]# salt "ml-storage-ser2[0-9].nmg01" cmd.run "lctl list_nids"
ml-storage-ser28.nmg01: (node28)
    10.82.143.202@o2ib1
    10.83.162.19@tcp1
ml-storage-ser25.nmg01: (node25)
    10.83.162.16@tcp1
    10.82.143.199@o2ib1
ml-storage-ser20.nmg01: (node20)
    10.82.143.194@o2ib1
    10.83.162.11@tcp1
ml-storage-ser24.nmg01: (node24)
    10.82.143.198@o2ib1
    10.83.162.15@tcp1
ml-storage-ser29.nmg01: (node29)
    10.83.162.20@tcp1
    10.82.143.203@o2ib1
ml-storage-ser22.nmg01: (node22)
    10.82.143.196@o2ib1
    10.83.162.13@tcp1
ml-storage-ser27.nmg01: (node27)
    10.83.162.18@tcp1
    10.82.143.201@o2ib1
ml-storage-ser23.nmg01: (node23)
    10.83.162.14@tcp1
    10.82.143.197@o2ib1
ml-storage-ser26.nmg01: (node26)
    10.82.143.200@o2ib1
    10.83.162.17@tcp1
ml-storage-ser21.nmg01: (node21)
    10.83.162.12@tcp1
    10.82.143.195@o2ib1

root@ml-gpu-ser200.nmg01:~$ lctl list_nids
10.82.141.208@o2ib1
10.83.152.55@tcp1
root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
failed to ping 10.82.143.202@o2ib1: Input/output error
root@ml-gpu-ser200.nmg01:~$

I have created the file /etc/modprobe.d/lustre.conf with the following
content on all MDT, OST, and client nodes:
root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib1(eth3.2)"
I then ran the command "lnetctl lnet configure --all" to make my static
LNet configuration take effect, but I still can't ping node28 from my
client ml-gpu-ser200.nmg01. I can, however, mount and access Lustre on the
client ml-gpu-ser200.nmg01.
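
One thing that can be checked from the output above is whether every network reported by lctl list_nids is also declared in the lnet "networks" option. The following is a minimal bash sketch using only the values quoted in this message. The helper names declared_nets and undeclared_nets are hypothetical, and the parsing assumes a single options line with no spaces between comma-separated entries. It only flags a difference between the two listings and says nothing about why the ping fails.

```shell
#!/usr/bin/env bash
# Sketch: list LNet networks that appear in `lctl list_nids` output
# but are not declared in the lnet "networks" module option.

# Sample data copied from this message; on a live node these would come
# from /etc/modprobe.d/lustre.conf and `lctl list_nids`.
conf_line='options lnet networks="o2ib1(eth3.2)"'
nids='10.82.141.208@o2ib1
10.83.152.55@tcp1'

# Print the declared network names, one per line, e.g. "o2ib1".
# (Hypothetical helper; assumes entries are separated by commas only.)
declared_nets() {
    local value
    value=${1#*networks=\"}   # drop everything up to the opening quote
    value=${value%%\"*}       # drop the closing quote and anything after it
    printf '%s\n' "$value" | tr ',' '\n' | sed 's/(.*//'
}

# Print networks seen in the NID list that the option line does not declare.
undeclared_nets() {
    local conf=$1 nid net
    while IFS= read -r nid; do
        net=${nid#*@}         # text after "@" is the LNet network name
        declared_nets "$conf" | grep -qx -- "$net" || printf '%s\n' "$net"
    done <<< "$2"
}

undeclared_nets "$conf_line" "$nids"
```

With the sample values the script prints tcp1, because the client reports a tcp1 NID while the lustre.conf line shown here declares only o2ib1.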

I can also lctl ping node28@o2ib1 successfully from the other MDT and OST
nodes, for example:
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node28@o2ib1
12345-0@lo
12345-10.82.143.202@o2ib1
12345-10.83.162.19@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node20@o2ib1
12345-0@lo
12345-10.82.143.194@o2ib1
12345-10.83.162.11@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node21@o2ib1
12345-0@lo
12345-10.83.162.12@tcp1
12345-10.82.143.195@o2ib1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node22@o2ib1
12345-0@lo
12345-10.82.143.196@o2ib1
12345-10.83.162.13@tcp1
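
To make the subnet comparison concrete, here is a minimal bash sketch that checks whether two NIDs are on the same LNet network and share the same first three octets. The /24 netmask is an assumption (the real netmask of these interfaces is not shown in this thread), and the helper names are hypothetical. It only compares the addresses quoted above and does not by itself show the cause of the failed ping.

```shell
#!/usr/bin/env bash
# Sketch: compare the LNet network and /24 prefix of two NIDs.
# ASSUMPTION: a /24 netmask; the real netmask is not given in this thread.

# Print the address part of a NID such as 10.82.143.202@o2ib1.
nid_addr() {
    printf '%s\n' "${1%@*}"
}

# Print the network part of a NID (the text after "@").
nid_net() {
    printf '%s\n' "${1#*@}"
}

# Print the first three octets, i.e. the /24 prefix of the address.
nid_prefix24() {
    local addr
    addr=$(nid_addr "$1")
    printf '%s\n' "${addr%.*}"
}

# Succeed when both NIDs use the same LNet network and the same /24 prefix.
same_net_and_prefix24() {
    [ "$(nid_net "$1")" = "$(nid_net "$2")" ] &&
        [ "$(nid_prefix24 "$1")" = "$(nid_prefix24 "$2")" ]
}

client="10.82.141.208@o2ib1"   # ml-gpu-ser200
server="10.82.143.202@o2ib1"   # node28
peer="10.82.143.200@o2ib1"     # node26

for src in "$client" "$peer"; do
    if same_net_and_prefix24 "$src" "$server"; then
        echo "$src and $server: same /24"
    else
        echo "$src and $server: different /24"
    fi
done
```

Under the /24 assumption, node26 and node28 fall in the same prefix (10.82.143), while the client ml-gpu-ser200 (10.82.141) does not.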

So what LNet configuration should I set to solve this problem?

Thanks very much.
Yours
Yu

Robin Humble <rjh+lustre at cita.utoronto.ca> wrote on Tue, Jun 26, 2018 at 10:48 PM:

> On Tue, Jun 26, 2018 at 04:05:14PM +0800, yu sun wrote:
> >Hi all:
> >     I want to build a Lustre storage system, and I found that not all of
> >the machines are in the same sub-network, and they can't lctl ping each
> >other. The details are listed below:
> >
> >root@ml-storage-ser30.nmg01:~$ lctl list_nids
> >10.82.145.2@o2ib
> >root@ml-storage-ser30.nmg01:~$ lctl ping node28@o2ib
> >failed to ping 10.82.143.202@o2ib: Input/output error
> >root@ml-storage-ser30.nmg01:~$
>
> what does 'lctl list_nids' say on node28?
> also disable iptables everywhere.
>
> cheers,
> robin
>