[lustre-discuss] Lustre Filesystem mounted but having "Input/output error" on df command

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Mon Sep 10 06:42:39 PDT 2018


Those are the kinds of symptoms you would see if the client is able to connect to the MDS server but not to an OSS server.  Certain operations (mount, cd, ls) will work if the MDS server is reachable, even if one or more OSS servers are not.  But other operations (“ls -la”, df) require info from the OSS servers, so those operations would hang or return errors.
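
To find out which OSS the client cannot reach, the expired-request lines in the client's dmesg output can be grouped by server NID. The following is a minimal sketch, not an official Lustre tool: the function name unreachable_targets and the regular expression are my own, and the regex assumes the log line layout seen in the dmesg output quoted in the original message.

```python
import re
from collections import defaultdict

# Matches the "opcode->target@NID:" part of a ptlrpc_expire_one_request line,
# for example:  o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4
# List archives often rewrite "@" as " at ", so both spellings are accepted.
TARGET_RE = re.compile(
    r"o\d+->(?P<target>[\w.-]+?-(?:OST|MDT)[0-9a-fA-F]{4})-\S+?"
    r"(?:@| at )(?P<nid>[0-9.]+(?:@| at )\w+):"
)


def unreachable_targets(dmesg_text):
    """Return {server NID: sorted list of targets} for expired requests."""
    by_nid = defaultdict(set)
    for line in dmesg_text.splitlines():
        if "ptlrpc_expire_one_request" not in line:
            continue
        match = TARGET_RE.search(line)
        if match is None:
            # e.g. the "Skipped N previous similar messages" lines
            continue
        nid = match.group("nid").replace(" at ", "@")
        by_nid[nid].add(match.group("target"))
    return {nid: sorted(targets) for nid, targets in by_nid.items()}


# Sample lines taken from the quoted dmesg output. The first uses "@" as the
# kernel logs it; the second keeps the " at " spelling to exercise both forms.
SAMPLE = r"""
[84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489]  req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
[84276.460565] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 910 previous similar messages
[84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095]  req at ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000 at 10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
"""

if __name__ == "__main__":
    for nid, targets in unreachable_targets(SAMPLE).items():
        print(f"{nid}: {', '.join(targets)}")
```

In the quoted log, both failing targets (lustre-OST0016 and lustre-OST002e) sit behind the same NID, 10.52.23.5@o2ib, which points at a single OSS rather than the whole filesystem. A reasonable next step would be to test LNet connectivity to that NID from the client, for instance with "lctl ping".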

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


> On Sep 8, 2018, at 8:33 AM, fırat yılmaz <firatyilmazz at gmail.com> wrote:
> 
> Hi There,
> 
> OS: CentOS 7.4
> Lustre version: Intel® Manager for Lustre* software 4.0.3.0
> Interconnect: Mellanox OFED, ConnectX-5
> 
> On one of my Lustre clients, the df command returns "Input/output error". I cannot see the Lustre mount points in the df output, but the mtab file shows that Lustre is mounted.
> 
> df -h output:
> 
> df: ‘/home’: Input/output error
> df: ‘/vol1’: Input/output error
> df: ‘/cm/shared’: Input/output error
> Filesystem        Size  Used Avail Use% Mounted on
> 
>  cat /etc/mtab |grep lustre
> 
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/home /home lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre /vol1 lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/cmshared /cm/shared lustre rw,flock,lazystatfs 0 0
> 
> 
> When I cd to the mount point I can reach the Lustre filesystem, and I can create and delete files and folders. But when I cd to a large file and run the ls -lah command, the Lustre client stops responding.
> 
> dmesg output:
> [84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489]  req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84276.460565] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 910 previous similar messages
> [84386.986467] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [84386.986471] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 29 previous similar messages
> [84704.429967] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.52.23.5@o2ib: 4379575 seconds
> [84704.429970] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 863 previous similar messages
> [84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095]  req@ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84881.004957] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 863 previous similar messages
> [85065.953686] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [85065.953689] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 26 previous similar messages
> 
> fstab mount options:
> lustre       flock,_netdev,x-systemd.requires=lnet.service 0 0
> 
> The ib_* benchmark tests give their usual results.
> 
> Where should I check?
> 
> Best Regards.
> 



