[Lustre-discuss] What's the human translation for: ost_write operation failed with -28

Johann Lombardi johann at whamcloud.com
Tue Dec 6 00:36:13 PST 2011


On Tue, Dec 06, 2011 at 01:31:24AM -0600, Rappleye, Jason  (ARC-TN)[Computer Sciences Corporation] wrote:
> > So REAL_FREESPACE = DF_FREESPACE - TOT_GRANTED ? Correct ?
> 
> That's more or less how our monitoring tools interpret it; a knowledgeable Lustre engineer might chime in and say otherwise :-)

Please note that the space accounted for in tot_granted is not *totally* unusable, since this space can still be consumed on clients by asynchronous writes.
Actually, the main problem with grant is that there is no callback mechanism yet to reclaim the space granted to clients. In 1.8.1, we introduced a feature called "grant shrinking" which forces idle clients to release grant space after some time. However, this feature was disabled before GA because of some issues in the patch which have never been addressed since.
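
For illustration, the REAL_FREESPACE estimate quoted above could be sketched as below. This is only a rough lower bound, since granted space can still be consumed by asynchronous writes on the clients that hold it. The function name and the sample figures are hypothetical and do not come from a real OST.

```python
def conservative_free_bytes(df_free_bytes, tot_granted_bytes):
    """Pessimistic estimate of space left for new grants.

    Subtracts the space already granted to clients from the free space
    reported by df, and clamps the result at zero.
    """
    return max(df_free_bytes - tot_granted_bytes, 0)


# Hypothetical figures for illustration only.
df_free = 50 * 1024**3   # free space as reported by df, in bytes
granted = 17 * 1024**3   # tot_granted for the same OST, in bytes
print(conservative_free_bytes(df_free, granted))
```

A monitoring tool that alerts on this value will err on the side of warning too early, which is probably the safer direction when clients are hitting -28.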

> > FYI, I have the following values on the OSS it couldn't connect/write to :
> > 
> > obdfilter.foobar-OST0003.tot_granted=17429659648
> > obdfilter.foobar-OST0004.tot_granted=13648875520
> > obdfilter.foobar-OST0005.tot_granted=18136141824

By default, a single OSC should not own more than 32MB of grant space. With 18GB of total granted space, you would need at least ~560 clients to account for it. How many clients are mounting the filesystem?
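
As a sanity check, the arithmetic can be sketched as follows. The sketch assumes the 32MB default is 32 MiB and that each client has exactly one OSC per OST; the helper names are hypothetical. It parses lines in the "name=value" form quoted above and computes the smallest number of clients that could legitimately hold that much grant.

```python
MAX_GRANT_PER_OSC = 32 * 1024 * 1024  # assumption: the 32MB default is 32 MiB


def parse_tot_granted(lines):
    """Map OST name -> tot_granted bytes from 'obdfilter.<ost>.tot_granted=<n>' lines."""
    result = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep or not key.endswith(".tot_granted"):
            continue  # skip anything that is not a tot_granted line
        ost = key.split(".")[1]
        result[ost] = int(value)
    return result


def min_clients(tot_granted_bytes, per_osc=MAX_GRANT_PER_OSC):
    """Smallest client count that can own this much grant (integer ceiling)."""
    return (tot_granted_bytes + per_osc - 1) // per_osc


sample = [
    "obdfilter.foobar-OST0003.tot_granted=17429659648",
    "obdfilter.foobar-OST0004.tot_granted=13648875520",
    "obdfilter.foobar-OST0005.tot_granted=18136141824",
]
for ost, granted in parse_tot_granted(sample).items():
    print(ost, min_clients(granted))
```

With binary units the largest value works out to about 540 clients rather than 560; either way, if far fewer clients mount the filesystem, the grant accounting on the OSS is suspect.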

> >> One grant-related BZ that bit us hard is 22755; in particular the

Indeed, this grant leak issue has unfortunately been hit by many customers.

> >> part that caused grant to grow when user code continued trying to write
> >> even after write(2) started returning EDQUOT :-(
> > That's interesting information. I also found the same via [1] and apparently
> > it may not be fixed overall. Which may explain why I may have hit it with Lustre
> > 1.8.5.

This particular bug is supposed to have been fixed in 1.8.4.

> > But, again, my application was writing into sparse files so the space was
> > already allocated... and the sparse files haven't grown.

Lustre (like most filesystems) does not allocate blocks for "holes" in sparse files.
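
A quick way to see this on any POSIX system is to compare a file's logical size with the blocks actually allocated to it. The sketch below is generic Python, not Lustre-specific; the function name is hypothetical, and it relies on st_blocks, which is reported in 512-byte units on POSIX and is not available on Windows.

```python
import os
import tempfile


def sparse_file_usage(size_bytes):
    """Create a file that is one big hole and return (logical_size, allocated_bytes)."""
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(size_bytes)  # extends the file without writing any data
        f.flush()
        st = os.stat(f.name)
        # st_blocks counts 512-byte units regardless of the filesystem block size.
        return st.st_size, st.st_blocks * 512


logical, allocated = sparse_file_usage(64 * 1024 * 1024)
print("logical size:", logical, "allocated:", allocated)
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the logical size. Writing into the holes later therefore consumes new space and can still fail with -28.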

Cheers,
Johann
-- 
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com


