[Lustre-discuss] What's the human translation for: ost_write operation failed with -28
Rappleye, Jason (ARC-TN)[Computer Sciences Corporation]
jason.rappleye at nasa.gov
Mon Dec 5 23:31:24 PST 2011
On Dec 5, 2011, at 10:31 PM, Thomas Guthmann wrote:
> Hi Jason,
>> $ lctl get_param obdfilter.*.tot_granted
>> Units are in bytes.
> Thanks. I wasn't aware of this "grant". I googled it and found some
> information about it, but it's still unclear. Should I understand that the
> values in obdfilter.*.tot_granted are actually 'reserved' space allocated
> by clients but not used?
In a sense, yes. My understanding is that grant space exists so that client applications can perform asynchronous writes without dirtying more data than there is free space on an OST. Otherwise, writes would have to be synchronous to ensure that clients didn't use more space than is available.
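The idea can be illustrated with a toy model. Everything in it is hypothetical (the function and variable names, the 2 MiB grant, the 1 MiB write size), and it ignores how grant is actually requested from and returned to the OST. It only shows the rule described above: a write may be cached asynchronously while the remaining grant covers it, and anything beyond that has to go to the OST synchronously.

```shell
#!/usr/bin/env bash
# Toy model of grant-gated write caching. Hypothetical: it mirrors the idea
# described above and is NOT how the Lustre client is implemented.
GRANT_BYTES=0   # space the OST has promised this client (assumed value)
ASYNC_WRITES=0  # writes cached as dirty pages and flushed later
SYNC_WRITES=0   # writes that must be sent to the OST immediately

# cache_write SIZE: dirty SIZE bytes in the client cache if the remaining
# grant covers them; otherwise count the write as synchronous, so any
# out-of-space error can be reported to the caller right away.
cache_write() {
    local size=$1
    if (( size <= GRANT_BYTES )); then
        GRANT_BYTES=$(( GRANT_BYTES - size ))
        ASYNC_WRITES=$(( ASYNC_WRITES + 1 ))
    else
        SYNC_WRITES=$(( SYNC_WRITES + 1 ))
    fi
}

# Example: 2 MiB of grant covers two 1 MiB writes; the third goes synchronous.
GRANT_BYTES=$(( 2 * 1024 * 1024 ))
for i in 1 2 3; do cache_write $(( 1024 * 1024 )); done
echo "async=$ASYNC_WRITES sync=$SYNC_WRITES grant_left=$GRANT_BYTES"
# prints: async=2 sync=1 grant_left=0
```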
> So REAL_FREESPACE = DF_FREESPACE - TOT_GRANTED ? Correct ?
That's more or less how our monitoring tools interpret it; a knowledgeable Lustre engineer might chime in and say otherwise :-)
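A minimal sketch of that heuristic, assuming the "Available" column from lfs df (1K blocks) and the tot_granted value (bytes) are passed in by hand; the function name real_free_kb is made up for illustration. The only subtlety is that the two numbers are in different units, so the grant has to be converted to KB first:

```shell
#!/usr/bin/env bash
# real_free_kb AVAIL_KB TOT_GRANTED_BYTES   (hypothetical helper)
#   AVAIL_KB:          "Available" column of `lfs df` (1K blocks)
#   TOT_GRANTED_BYTES: value of obdfilter.<OST>.tot_granted (bytes)
# Prints an estimate of the space not yet granted to any client, in KB.
# Heuristic only: REAL_FREESPACE = DF_FREESPACE - TOT_GRANTED.
real_free_kb() {
    local avail_kb=$1 granted_bytes=$2
    # Convert the grant from bytes to KB so both operands share one unit.
    echo $(( avail_kb - granted_bytes / 1024 ))
}

# Example with the OST0005 figures quoted further down in this thread:
real_free_kb 17745736 18136141824   # prints 34660
```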
> FYI, I have the following values on the OSS it couldn't connect/write to:
> and the following from lfs df (as seen from the client):
> foobar-OST0003_UUID 2113787824 1986169192 20244388 93% /lustre/foobar[OST:3]
> foobar-OST0004_UUID 2113787824 1986170884 20242696 93% /lustre/foobar[OST:4]
> foobar-OST0005_UUID 2113787824 1988667844 17745736 94% /lustre/foobar[OST:5]
> So, for instance for OST5 I have 17745736 - (18136141824/1024) = ...
> 17745736 - 17711076 = 34660 KB left
> Am I right ?
Yes, though on our system with ~12,000 clients, those values of tot_granted are obscenely low. A better comparison would be tot_granted on a freshly mounted OST on your filesystem.
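To apply the same estimate to every OST at once, one could join the two outputs by OST name. This is a sketch under a few assumptions: tot_granted was saved on the OSS as `obdfilter.<OST>.tot_granted=<bytes>` lines (the name=value form lctl get_param prints), lfs df was run without -h so sizes are in 1K blocks, and the function name est_free_per_ost is hypothetical. As noted above, the result is only a rough figure and is best compared against a freshly mounted OST.

```shell
#!/usr/bin/env bash
# est_free_per_ost GRANT_FILE DF_FILE   (hypothetical helper)
#   GRANT_FILE: saved `lctl get_param obdfilter.*.tot_granted` output,
#               one "obdfilter.<OST>.tot_granted=<bytes>" line per OST.
#   DF_FILE:    saved `lfs df` output from a client (1K blocks).
# Prints "<OST> <estimated usable KB>" for every OST present in both files.
est_free_per_ost() {
    awk '
        # First file: remember tot_granted (bytes), keyed by OST name.
        FNR == NR {
            split($0, kv, "=")
            name = kv[1]
            sub(/^obdfilter\./, "", name)
            sub(/\.tot_granted$/, "", name)
            granted[name] = kv[2]
            next
        }
        # Second file: rows look like "<fs>-OSTxxxx_UUID blocks used avail ...".
        $1 ~ /_UUID$/ {
            name = $1
            sub(/_UUID$/, "", name)
            if (name in granted)
                printf "%s %d\n", name, $4 - granted[name] / 1024
        }
    ' "$1" "$2"
}

# Usage (file names are hypothetical):
#   lctl get_param obdfilter.*.tot_granted > granted.txt   # on the OSS
#   lfs df > df.txt                                        # on a client
#   est_free_per_ost granted.txt df.txt
```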
>> One grant-related BZ that bit us hard is 22755; in particular the
>> part that caused grant to grow when a user code continued trying to write
>> even after write(2) started returning EDQUOTA :-(
> That's interesting information. I also found the same via the link below, and apparently
> it may not be fixed overall, which may explain why I may have hit it with Lustre
> But, again, my application was writing into sparse files so the space was
> already allocated... and the sparse files haven't grown.
Your specific problem may not be due to a bug. That last bit of the filesystem may not be easily usable due to the grant mechanism. I'll let someone with more knowledge about grants chime in here.
Also, as Heiko alluded to, running with an OST so full is going to increase the chance of exposure to problems described in LU-15.
> : http://firstname.lastname@example.org/msg07565.html
>> On 12/5/11 5:05 PM, "Thomas Guthmann" <tguthmann at iseek.com.au> wrote:
>>>> # grep 28 /usr/include/asm-generic/errno-base.h
>>>> #define ENOSPC 28 /* No space left on device */
>>> Great. So it's really what's happening. But I have free space/inodes...
>>> I cannot remember anything in the documentation talking about 'reserved
>>> free space'.
>>> So based on the following output, is it normal to have no space left on
>>> storage ?
>>> # lfs df -h
>>> UUID bytes Used Available Use% Mounted on
>>> foobar-MDT0000_UUID 4.1G 197.8M 3.7G 4%
>>> foobar-OST0000_UUID 2.0T 1.8T 21.1G 93%
>>> foobar-OST0001_UUID 2.0T 1.8T 23.2G 93%
>>> foobar-OST0002_UUID 2.0T 1.8T 21.4G 93%
>>> foobar-OST0003_UUID 2.0T 1.8T 19.3G 93%
>>> foobar-OST0004_UUID 2.0T 1.8T 19.3G 93%
>>> foobar-OST0005_UUID 2.0T 1.9T 16.9G 94%
>>> # lfs df -i
>>> UUID Inodes IUsed IFree IUse% Mounted on
>>> foobar-MDT0000_UUID 1019403 64 1019339 0%
>>> foobar-OST0000_UUID 32363906 102 32363804 0%
>>> foobar-OST0001_UUID 32920407 99 32920308 0%
>>> foobar-OST0002_UUID 32453038 100 32452938 0%
>>> foobar-OST0003_UUID 31904762 104 31904658 0%
>>> foobar-OST0004_UUID 31904338 103 31904235 0%
>>> foobar-OST0005_UUID 31280099 104 31279995 0%
>>> Regarding the dmesg output on the OSS, Heiko pointed out (in a private email) that I
>>> may have hit one of the following bottlenecks:
>>> - Too little space left on file system
>>> - Performance of ext3/4 on large disks (Note: I am using
>>> ==> http://jira.whamcloud.com/browse/LU-15.
>>> But it still does not explain why I couldn't write anymore.
>>> Any idea?
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org