[Lustre-discuss] What's the human translation for: ost_write operation failed with -28
Rappleye, Jason (ARC-TN)[Computer Sciences Corporation]
jason.rappleye at nasa.gov
Mon Dec 5 17:34:01 PST 2011
Hi Thomas,
The OSTs for which you're receiving ENOSPC might have an excessive amount
of grant space outstanding. You can get the current grant space by running
the following command on each OSS:
$ lctl get_param obdfilter.*.tot_granted
Units are in bytes.
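
If it helps, here is a minimal Python sketch for turning that output into something easier to read. It assumes lctl prints one "obdfilter.<target>.tot_granted=<bytes>" line per OST, so check that against what your version actually prints. The function names and the sample value are made up for illustration.

```python
def parse_tot_granted(text):
    """Map each OST target name to its outstanding grant in bytes.

    Assumes lines of the form 'obdfilter.<target>.tot_granted=<bytes>';
    anything that does not match is skipped.
    """
    prefix, suffix = "obdfilter.", ".tot_granted"
    grants = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        target = name[len(prefix):-len(suffix)]
        grants[target] = int(value)
    return grants


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Made-up sample line, not real output from any system.
sample = "obdfilter.foobar-OST0000.tot_granted=21474836480\n"
for target, granted in sorted(parse_tot_granted(sample).items()):
    print(f"{target}: {to_gib(granted):.1f} GiB granted")
```

You could feed it the captured output of the lctl command above, e.g. via subprocess on the OSS.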
One grant-related BZ that bit us hard is 22755; in particular, the
part that caused grant to grow when a user's code kept trying to write
even after write(2) started returning EDQUOT :-(
We monitor and alert on high grant space usage on each OST, so we can
avoid ENOSPC due to this issue.
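
As a rough illustration of that kind of check (not our actual monitoring code), the sketch below flags OSTs whose outstanding grant is large relative to their available space. The 0.5 ratio, the function name, and the sample numbers are all assumptions picked for the example; where you get the per-OST available bytes from is up to you.

```python
def high_grant_osts(granted, available, max_ratio=0.5):
    """Return OST names whose grant exceeds max_ratio of available space.

    granted and available map OST name -> bytes. The 0.5 default is an
    arbitrary illustrative threshold, not a recommended value.
    """
    flagged = []
    for ost in sorted(granted):
        avail = available.get(ost)
        if avail is None:
            continue  # no free-space figure for this OST, cannot judge
        if avail == 0 or granted[ost] / avail > max_ratio:
            flagged.append(ost)
    return flagged


GIB = 2**30
# Made-up numbers for illustration only.
granted = {"foobar-OST0000": 15 * GIB, "foobar-OST0001": 2 * GIB}
available = {"foobar-OST0000": 20 * GIB, "foobar-OST0001": 23 * GIB}
for ost in high_grant_osts(granted, available):
    print(f"WARNING: high grant space on {ost}")
```

An OST with no free space at all is always flagged, since any outstanding grant there cannot be honoured.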
Jason
--
Jason Rappleye
Systems Administrator
NASA Advanced Supercomputing Division
On 12/5/11 5:05 PM, "Thomas Guthmann" <tguthmann at iseek.com.au> wrote:
>Hi,
>
>> # grep 28 /usr/include/asm-generic/errno-base.h
>> #define ENOSPC 28 /* No space left on device */
>Great. So it's really what's happening. But I have free space/inodes...
>I cannot remember anything in the documentation talking about 'reserved
>free space'.
>
>So, based on the following output, is it normal to get "no space left"
>on the storage?
>
># lfs df -h
>[..]
>UUID                  bytes    Used  Available  Use%  Mounted on
>foobar-MDT0000_UUID    4.1G  197.8M       3.7G    4%  /lustre/foobar[MDT:0]
>foobar-OST0000_UUID    2.0T    1.8T      21.1G   93%  /lustre/foobar[OST:0]
>foobar-OST0001_UUID    2.0T    1.8T      23.2G   93%  /lustre/foobar[OST:1]
>foobar-OST0002_UUID    2.0T    1.8T      21.4G   93%  /lustre/foobar[OST:2]
>foobar-OST0003_UUID    2.0T    1.8T      19.3G   93%  /lustre/foobar[OST:3]
>foobar-OST0004_UUID    2.0T    1.8T      19.3G   93%  /lustre/foobar[OST:4]
>foobar-OST0005_UUID    2.0T    1.9T      16.9G   94%  /lustre/foobar[OST:5]
>
># lfs df -i
>[..]
>UUID                    Inodes  IUsed     IFree  IUse%  Mounted on
>foobar-MDT0000_UUID    1019403     64   1019339     0%  /lustre/foobar[MDT:0]
>foobar-OST0000_UUID   32363906    102  32363804     0%  /lustre/foobar[OST:0]
>foobar-OST0001_UUID   32920407     99  32920308     0%  /lustre/foobar[OST:1]
>foobar-OST0002_UUID   32453038    100  32452938     0%  /lustre/foobar[OST:2]
>foobar-OST0003_UUID   31904762    104  31904658     0%  /lustre/foobar[OST:3]
>foobar-OST0004_UUID   31904338    103  31904235     0%  /lustre/foobar[OST:4]
>foobar-OST0005_UUID   31280099    104  31279995     0%  /lustre/foobar[OST:5]
>
>As for my dmesg on the OSS, Heiko pointed out (in a private email) that I
>may have hit one of the following bottlenecks:
>- Too little space left on the file system
>- Performance of ext3/4 on large disks (Note: I am using
>ext4/lustre1.8.5/centos5)
>==> http://jira.whamcloud.com/browse/LU-15.
>
>But it still does not explain why I couldn't write anymore.
>
>Cheers
>Thomas
>
>Any ide