[Lustre-discuss] full ost

Andreas Dilger adilger at sun.com
Sun Jan 31 10:46:23 PST 2010


On 2010-01-29, at 03:07, Johann Lombardi wrote:
> On Fri, Jan 29, 2010 at 10:32:26AM +0100, gvozden rovina wrote:
>> OST. For instance, I copied a 2.5 GB file to a Lustre filesystem which
>> had 120 GB of storage space (I have 2GB test OSTs) and it didn't
>> automatically recognize the full OST, but simply stopped working with a
>> "No space left on device" error message. There was plenty of space left
>> on the filesystem (approx. 100GB).
>
> The MDS monitors OST disk usage by regularly sending OST_STATFS RPCs,
> and it won't allocate *new* files on OSTs that are full. This means
> that you don't need to take full OSTs offline on the MDS; those OSTs
> will be skipped automatically at file creation time.

Note also that we expect OSTs to be configured with a MUCH larger size
than 2GB.  Typical is 8TB, and in the near future 16TB OSTs will be
possible.  The object allocation policy assumes that individual file
sizes are smaller than a single OST, and for extremely large files
(i.e. multi-TB) the user can set the striping for the file wide enough
to have sufficient space.
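
To set a wide layout explicitly, "lfs setstripe -c -1 <file>" stripes a
new file over all available OSTs.  The same can be done from C through
liblustreapi; the fragment below is only a rough sketch (the file path
is made up, and the header name can differ between Lustre releases;
link with -llustreapi):

     /* Sketch: create an empty file striped over all available OSTs. */
     #include <stdio.h>
     #include <string.h>
     #include <lustre/lustreapi.h> /* older releases: lustre/liblustreapi.h */

     int main(void)
     {
         const char *path = "/mnt/lustre/bigfile";    /* example path */
         int rc;

         /* stripe_size 0 = filesystem default, stripe_offset -1 = any
          * starting OST, stripe_count -1 = all OSTs, pattern 0 = RAID0 */
         rc = llapi_file_create(path, 0, -1, -1, 0);
         if (rc != 0) {
             /* liblustreapi returns a negative errno value on failure */
             fprintf(stderr, "llapi_file_create: %s\n", strerror(-rc));
             return 1;
         }
         return 0;
     }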

For applications that don't know how many stripes to use, it is also  
possible to have the MDS compute this based on the expected file size  
(assuming the application knows this) and the current OST space  
availability:

      mknod({lustre_filename}, S_IFREG, {file_perms});
      truncate({lustre_filename}, {expected_size});
      open({lustre_filename}, {open_mode});

When the file is opened, it will be striped widely enough to allow  
{expected_size} to be written to it, assuming there is enough space on  
each OST such that:

     min_ost_free * num_stripes >= expected_size

This doesn't actually _reserve_ that space, so if multiple nodes are  
writing huge files and there isn't enough space in the filesystem, you  
can still run out of space.
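
For illustration, here is a minimal sketch of that sequence in C (the
path and expected size are examples only; build with
-D_FILE_OFFSET_BITS=64 so that sizes over 2GB fit in off_t):

     #include <sys/types.h>
     #include <sys/stat.h>
     #include <fcntl.h>
     #include <unistd.h>
     #include <stdio.h>

     int main(void)
     {
         const char *path = "/mnt/lustre/output.dat"; /* example path */
         off_t expected_size = (off_t)6 << 30;        /* e.g. 6GB */
         int fd;

         /* 1) create the file on the MDS, no OST objects allocated yet */
         if (mknod(path, S_IFREG | 0644, 0) != 0) {
             perror("mknod");
             return 1;
         }
         /* 2) hint the expected size so the MDS can pick num_stripes with
          *    min_ost_free * num_stripes >= expected_size */
         if (truncate(path, expected_size) != 0) {
             perror("truncate");
             return 1;
         }
         /* 3) open for writing; the striping is chosen at this point */
         fd = open(path, O_WRONLY);
         if (fd < 0) {
             perror("open");
             return 1;
         }
         /* ... write the data, then ... */
         close(fd);
         return 0;
     }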

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



