[lustre-discuss] Lustre Filesystem mounted but having "Input/output error" on df command

Saravanaraj Ayyampalayam ansraj at gmail.com
Sat Sep 8 16:58:36 PDT 2018


Hi,

It looks like the client can't connect to the server node at 10.52.23.5@o2ib.
Start by checking that InfiniBand is working on that server node: do a regular ping from the client node to the server node.
You can then run an lctl ping to see whether the LNet network is working:
lctl ping 10.52.23.5@o2ib
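
The two checks above could be scripted roughly as follows. This is only a sketch: it assumes the address part of the o2ib NID is also the server's IPoIB address (so a plain ping to it exercises the IB link), and the function names nid_to_ip and check_nid are made up for illustration, not part of Lustre.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not part of Lustre.

# Strip the LNet network suffix from a NID, e.g. 10.52.23.5@o2ib -> 10.52.23.5
nid_to_ip() {
    printf '%s\n' "${1%%@*}"
}

# Run the IP-level check first, then the LNet-level check.
check_nid() {
    local nid=$1 ip
    ip=$(nid_to_ip "$nid")
    if ! ping -c 3 -W 2 "$ip" >/dev/null; then
        echo "IP ping to $ip failed: check the IB link / IPoIB interface"
        return 1
    fi
    if ! lctl ping "$nid"; then
        echo "lctl ping $nid failed: IP works, so look at LNet on both ends"
        return 2
    fi
    echo "$nid reachable over IP and LNet"
}

# Only run the checks when a NID is passed on the command line.
if [ "$#" -gt 0 ]; then
    check_nid "$1"
fi
```

Saved as, say, check_nid.sh, it would be run on the client as "bash check_nid.sh 10.52.23.5@o2ib" (lctl usually needs root). Doing the plain ping first helps separate an IB/IPoIB problem from an LNet configuration problem.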

Check /var/log/messages on all the Lustre server nodes and see if any errors are reported there.
A couple of days ago I had a similar issue and was seeing page allocation failures in /var/log/messages on my OSS nodes.
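
To narrow the log down on each server, something like the following could be used. It is a sketch, not a complete check: the pattern list (LustreError, LNetError, page allocation failure) is my assumption about what is worth looking for, and the function name filter_lustre_errors is made up.

```shell
#!/bin/bash
# Sketch only: the pattern list is an assumption, extend it as needed.

# Read log lines on stdin and print only those that look like Lustre/LNet
# errors or kernel page allocation failures.
filter_lustre_errors() {
    grep -E 'LustreError|LNetError|page allocation failure'
}

# Only read a real log when a file is passed on the command line;
# show the 50 most recent matching lines.
if [ "$#" -gt 0 ]; then
    filter_lustre_errors < "$1" | tail -n 50
fi
```

For example, "bash lustre_errs.sh /var/log/messages" on each MDS and OSS node (the script name is arbitrary).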

Hope this helps.

-Raj

> On Sep 8, 2018, at 8:33 AM, fırat yılmaz <firatyilmazz at gmail.com> wrote:
> 
> Hi There,
> 
> OS: CentOS 7.4
> Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> Interconnect: Mellanox OFED, ConnectX-5
> 
> On one of my Lustre clients, df returns "Input/output error". I cannot see the Lustre mount points in df, but the mtab file shows that Lustre is mounted.
> 
> df -h output:
> 
> df: ‘/home’: Input/output error
> df: ‘/vol1’: Input/output error
> df: ‘/cm/shared’: Input/output error
> Filesystem        Size  Used Avail Use% Mounted on
> 
>  cat /etc/mtab |grep lustre
> 
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/home /home lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre /vol1 lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/cmshared /cm/shared lustre rw,flock,lazystatfs 0 0
> 
> 
> 
> When I cd to the mount point I can reach the Lustre filesystem, and I can create and delete files and folders. But when I cd to a large file and run ls -lah, the Lustre client stops responding.
> 
> dmesg output:
> [84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489]  req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84276.460565] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 910 previous similar messages
> [84386.986467] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [84386.986471] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 29 previous similar messages
> [84704.429967] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.52.23.5@o2ib: 4379575 seconds
> [84704.429970] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 863 previous similar messages
> [84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095]  req@ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84881.004957] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 863 previous similar messages
> [85065.953686] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [85065.953689] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 26 previous similar messages
> 
> fstab mount options:
> lustre       flock,_netdev,x-systemd.requires=lnet.service 0 0
> 
> ib_* benchmark test results are as usual.
> 
> Where should I check?
> 
> Best Regards.
> 


