[Lustre-discuss] What's the human translation for: ost_write operation failed with -28

Rappleye, Jason (ARC-TN)[Computer Sciences Corporation] jason.rappleye at nasa.gov
Mon Dec 5 17:34:01 PST 2011


Hi Thomas,

The OSTs for which you're receiving ENOSPC might have an excessive amount
of grant space outstanding. You can get the current grant space by running
the following command on each OSS:

$ lctl get_param obdfilter.*.tot_granted

Units are in bytes.
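
If you want to post-process that output, here is a minimal sketch in Python. It assumes each line looks like `obdfilter.<target>.tot_granted=<bytes>`, which is the usual `lctl get_param` layout. The function names and the sample values are made up for illustration and are not taken from a real system.

```python
import re

# Assumed line layout: "obdfilter.<target>.tot_granted=<bytes>"
LINE_RE = re.compile(
    r"^obdfilter\.(?P<target>[^.=\s]+)\.tot_granted=(?P<bytes>\d+)$"
)


def parse_tot_granted(text):
    """Return {target_name: granted_bytes} for every line that matches."""
    granted = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # silently skip anything that is not a tot_granted line
            granted[match.group("target")] = int(match.group("bytes"))
    return granted


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Illustrative input only; these numbers are invented.
SAMPLE = """\
obdfilter.foobar-OST0000.tot_granted=1073741824
obdfilter.foobar-OST0001.tot_granted=536870912
"""

for target, num_bytes in sorted(parse_tot_granted(SAMPLE).items()):
    print(f"{target}: {to_gib(num_bytes):.2f} GiB granted")
```

In practice you would feed the function the captured output of the lctl command above instead of SAMPLE.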

One grant-related BZ that bit us hard is 22755; in particular the
part that caused grant to grow when a user code continued trying to write
even after write(2) started returning EDQUOT :-(

We monitor and alert on high grant space usage on each OST, so we can
avoid ENOSPC due to this issue.
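
As a rough illustration of that kind of check (not our actual monitoring), the sketch below flags OSTs whose outstanding grant is a large fraction of their available space. The function name, the input dictionaries, and the 0.5 threshold are all assumptions chosen for the example; pick a threshold that suits your own file system.

```python
def find_high_grant(granted, available, max_fraction=0.5):
    """Return the sorted names of OSTs whose grant looks excessive.

    granted      -- {ost_name: bytes granted}, e.g. from tot_granted
    available    -- {ost_name: bytes available}, e.g. from lfs df
    max_fraction -- hypothetical alert threshold (grant / available)
    """
    flagged = []
    for ost, grant_bytes in sorted(granted.items()):
        avail_bytes = available.get(ost)
        if avail_bytes is None:
            continue  # no free-space figure for this OST, nothing to compare
        # An OST with no available space at all is always flagged.
        if avail_bytes == 0 or grant_bytes / avail_bytes > max_fraction:
            flagged.append(ost)
    return flagged


# Invented numbers, for illustration only.
granted = {"OST0000": 15 * 2**30, "OST0001": 2 * 2**30}
available = {"OST0000": 20 * 2**30, "OST0001": 20 * 2**30}

for ost in find_high_grant(granted, available):
    print(f"WARNING: high grant usage on {ost}")
```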

Jason

--
Jason Rappleye
Systems Administrator
NASA Advanced Supercomputing Division





On 12/5/11 5:05 PM, "Thomas Guthmann" <tguthmann at iseek.com.au> wrote:

>Hi,
>
>> # grep 28 /usr/include/asm-generic/errno-base.h
>> #define ENOSPC 28 /* No space left on device */
>Great. So it's really what's happening. But I have free space/inodes...
>I cannot remember anything in the documentation talking about 'reserved
>free space'.
>
>So based on the following output, is it normal to get "no space left"
>on the storage?
>
># lfs df -h
>[..]
>UUID                       bytes        Used   Available Use% Mounted on
>foobar-MDT0000_UUID          4.1G      197.8M        3.7G   4% /lustre/foobar[MDT:0]
>foobar-OST0000_UUID          2.0T        1.8T       21.1G  93% /lustre/foobar[OST:0]
>foobar-OST0001_UUID          2.0T        1.8T       23.2G  93% /lustre/foobar[OST:1]
>foobar-OST0002_UUID          2.0T        1.8T       21.4G  93% /lustre/foobar[OST:2]
>foobar-OST0003_UUID          2.0T        1.8T       19.3G  93% /lustre/foobar[OST:3]
>foobar-OST0004_UUID          2.0T        1.8T       19.3G  93% /lustre/foobar[OST:4]
>foobar-OST0005_UUID          2.0T        1.9T       16.9G  94% /lustre/foobar[OST:5]
>
># lfs df -i
>[..]
>UUID                      Inodes       IUsed       IFree IUse% Mounted on
>foobar-MDT0000_UUID       1019403          64     1019339   0% /lustre/foobar[MDT:0]
>foobar-OST0000_UUID      32363906         102    32363804   0% /lustre/foobar[OST:0]
>foobar-OST0001_UUID      32920407          99    32920308   0% /lustre/foobar[OST:1]
>foobar-OST0002_UUID      32453038         100    32452938   0% /lustre/foobar[OST:2]
>foobar-OST0003_UUID      31904762         104    31904658   0% /lustre/foobar[OST:3]
>foobar-OST0004_UUID      31904338         103    31904235   0% /lustre/foobar[OST:4]
>foobar-OST0005_UUID      31280099         104    31279995   0% /lustre/foobar[OST:5]
>
>As for my dmesg on the OSS, Heiko pointed out (in a private email) that I
>may have hit one of the following bottlenecks:
>- Too little space left on the file system
>- Performance of ext3/4 on large disks (Note: I am using
>ext4/lustre1.8.5/centos5)
>==> http://jira.whamcloud.com/browse/LU-15.
>
>But it still does not explain why I couldn't write anymore.
>
>Cheers
>Thomas
>
>Any ide



