[Lustre-discuss] What's the human translation for: ost_write operation failed with -28
Thomas Guthmann
tguthmann at iseek.com.au
Mon Dec 5 22:31:46 PST 2011
Hi Jason,
> $ lctl get_param obdfilter.*.tot_granted
> Units are in bytes.
Thanks. I wasn't aware of this "grant". I googled for it and found some
information about it, but it's still unclear. Should I understand that the
values in obdfilter.*.tot_granted are actually space 'reserved' by clients
but not yet used?
So REAL_FREESPACE = DF_FREESPACE - TOT_GRANTED? Correct?
FYI, I have the following values on the OSS that the client couldn't connect/write to:
obdfilter.foobar-OST0003.tot_granted=17429659648
obdfilter.foobar-OST0004.tot_granted=13648875520
obdfilter.foobar-OST0005.tot_granted=18136141824
and from lfs df (seen from the client, sizes in 1K blocks):
foobar-OST0003_UUID 2113787824 1986169192 20244388 93% /lustre/foobar[OST:3]
foobar-OST0004_UUID 2113787824 1986170884 20242696 93% /lustre/foobar[OST:4]
foobar-OST0005_UUID 2113787824 1988667844 17745736 94% /lustre/foobar[OST:5]
So, for instance, for OST5 I have:
17745736 - (18136141824/1024) = 17745736 - 17711076 = 34660 KB left
Am I right?
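
For reference, the same arithmetic for all three OSTs can be sketched in
Python. This is only a sketch: it assumes, as asked above, that the space
still usable is the lfs df "Available" figure (in KB) minus tot_granted (in
bytes). The dictionary and function names are made up for illustration, and
the numbers are copied from the output above.

```python
# Values copied from the lctl and lfs df output quoted in this message.
# The subtraction is the hypothesis being asked about, not confirmed
# Lustre behaviour.
TOT_GRANTED_BYTES = {
    "foobar-OST0003": 17429659648,
    "foobar-OST0004": 13648875520,
    "foobar-OST0005": 18136141824,
}
LFS_DF_AVAILABLE_KB = {
    "foobar-OST0003": 20244388,
    "foobar-OST0004": 20242696,
    "foobar-OST0005": 17745736,
}


def ungranted_kb(available_kb, granted_bytes):
    """Return available space minus granted space, both expressed in KB."""
    return available_kb - granted_bytes // 1024


remaining = {
    ost: ungranted_kb(LFS_DF_AVAILABLE_KB[ost], granted)
    for ost, granted in TOT_GRANTED_BYTES.items()
}

for ost, kb in sorted(remaining.items()):
    print(f"{ost}: {kb} KB not covered by grants")
```

For OST5 this gives the same 34660 KB as the calculation by hand.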
> One grant-related BZ that bit us hard is 22755; in particular the
> part that caused grant to grow when a user code continued trying to write
> even after write(2) started returning EDQUOT :-(
That's interesting information. I also found the same via [1], and apparently
it may not be completely fixed, which could explain why I hit it with Lustre
1.8.5.
But, again, my application was writing into sparse files so the space was
already allocated... and the sparse files haven't grown.
[1]: http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg07565.html
Thomas
>
> On 12/5/11 5:05 PM, "Thomas Guthmann"<tguthmann at iseek.com.au> wrote:
>
>> Hi,
>>
>>> # grep 28 /usr/include/asm-generic/errno-base.h
>>> #define ENOSPC 28 /* No space left on device */
>> Great. So it's really what's happening. But I have free space/inodes...
>> I cannot remember anything in the documentation talking about 'reserved
>> free space'.
>>
>> So based on the following output, is it normal to have no space left on
>> storage ?
>>
>> # lfs df -h
>> [..]
>> UUID                  bytes    Used  Available  Use%  Mounted on
>> foobar-MDT0000_UUID    4.1G  197.8M       3.7G    4%  /lustre/foobar[MDT:0]
>> foobar-OST0000_UUID    2.0T    1.8T      21.1G   93%  /lustre/foobar[OST:0]
>> foobar-OST0001_UUID    2.0T    1.8T      23.2G   93%  /lustre/foobar[OST:1]
>> foobar-OST0002_UUID    2.0T    1.8T      21.4G   93%  /lustre/foobar[OST:2]
>> foobar-OST0003_UUID    2.0T    1.8T      19.3G   93%  /lustre/foobar[OST:3]
>> foobar-OST0004_UUID    2.0T    1.8T      19.3G   93%  /lustre/foobar[OST:4]
>> foobar-OST0005_UUID    2.0T    1.9T      16.9G   94%  /lustre/foobar[OST:5]
>>
>> # lfs df -i
>> [..]
>> UUID                   Inodes  IUsed     IFree  IUse%  Mounted on
>> foobar-MDT0000_UUID   1019403     64   1019339     0%  /lustre/foobar[MDT:0]
>> foobar-OST0000_UUID  32363906    102  32363804     0%  /lustre/foobar[OST:0]
>> foobar-OST0001_UUID  32920407     99  32920308     0%  /lustre/foobar[OST:1]
>> foobar-OST0002_UUID  32453038    100  32452938     0%  /lustre/foobar[OST:2]
>> foobar-OST0003_UUID  31904762    104  31904658     0%  /lustre/foobar[OST:3]
>> foobar-OST0004_UUID  31904338    103  31904235     0%  /lustre/foobar[OST:4]
>> foobar-OST0005_UUID  31280099    104  31279995     0%  /lustre/foobar[OST:5]
>>
>> Regarding my dmesg on the OSS, Heiko pointed out (in a private email)
>> that I may have hit one of the following bottlenecks:
>> - Too little space left on the file system
>> - Performance of ext3/4 on large disks (Note: I am using
>> ext4/lustre1.8.5/centos5)
>> ==> http://jira.whamcloud.com/browse/LU-15.
>>
>> But it still does not explain why I couldn't write anymore.
>>
>> Cheers
>> Thomas
>>
>> Any ide
>