[lustre-discuss] lctl ping node28@o2ib report Input/output error
Mohr Jr, Richard Frank (Rick Mohr)
rmohr at utk.edu
Thu Jun 28 12:29:58 PDT 2018
> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:
>> On Jun 27, 2018, at 3:12 AM, yu sun <sunyu1949 at gmail.com> wrote:
>> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
>> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error
>> Is the MGS running?
>> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
>> failed to ping 10.82.143.202@o2ib1: Input/output error
>> root@ml-gpu-ser200.nmg01:~$
> In your previous email, you said that you could mount lustre on the client ml-gpu-ser200.nmg01. Was that not accurate, or did something change in the meantime?
(Note: Received an out-of-band reply from Yu stating that there was a typo in the previous email, and that client ml-gpu-ser200.nmg01 could not mount Lustre. Continuing the discussion here so others on the list can follow/benefit.)
For the IPoIB addresses used on your nodes, what are the subnets (and netmasks) that you are using? It looks like servers use 10.82.143.X and clients use 10.82.141.X. If you are using a 255.255.0.0 netmask, you should be fine, since both ranges fall within 10.82.0.0/16. But if you are using 255.255.255.0, then you will run into problems, because 10.82.143.0/24 and 10.82.141.0/24 are separate subnets. Lustre expects that all nodes on the same LNet network (o2ib1 in your case) will also be on the same IP subnet.
Have you tried running a regular “ping <IPoIB_address>” command between clients and servers to make sure that part is working?
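If it helps, here is a rough sketch for checking the subnet question with Python's standard ipaddress module. The server address is the one from your lctl ping output; the client address 10.82.141.10 is only a placeholder I made up for something in 10.82.141.X, so substitute a real one. The helper name same_subnet is likewise just for illustration.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both IPv4 addresses land in the same network under netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


server = "10.82.143.202"  # address reported by the failing lctl ping
client = "10.82.141.10"   # hypothetical client address (assumption, not from the thread)

for mask in ("255.255.0.0", "255.255.255.0"):
    print(mask, same_subnet(server, client, mask))
```

With those two addresses this reports True for 255.255.0.0 and False for 255.255.255.0. It only compares the addressing; it says nothing about whether the IPoIB link itself is working, so the plain ping test is still worth doing.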
Senior HPC System Administrator
National Institute for Computational Sciences