[Lustre-discuss] OST node filling up and aborting write

Nick Jennings nick at creativemotiondesign.com
Sat Feb 28 06:22:44 PST 2009


Hi Brian,

  (Thanks for pointing out the -1 as opposed to 1, I missed that)

Brian J. Murrell wrote:
> On Sat, 2009-02-28 at 02:34 +0100, Nick Jennings wrote:
>> Hi Everyone,
> 
> Hi Nick,
> 
>>   I've got 4 OSTs (each 2gigs in size) on one lustre file system. I dd a 
>> 4 gig file to the filesystem and after the first OST fills up, the write 
>> fails (not enough space on device):
> 
> Writes do not "cascade" over to another OST when one fills up.

I see. I guess I have a misunderstanding of the way striping works.

If you set stripesize=1MB and stripecount=-1, then I would assume 
this means: split each write process into 1MB chunks and stripe them 
across all OSTs. By "write process" I mean a single file being written 
to disk. I've read over Chapter 25 as well, but it doesn't seem to 
clarify this for me (I'm probably letting something fly over my head).
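For what it's worth, here is a small sketch (plain Python, not Lustre code) of the layout I *think* the above implies: with a 1 MB stripe size and a stripe count covering all 4 OSTs, consecutive 1 MB chunks of one file go round-robin to the OSTs, so a 4 GB file would put roughly 1 GB on each OST instead of filling one:

```python
# Sketch of round-robin striping: 1 MB stripes spread over 4 OSTs.
# These constants mirror the stripesize/stripecount settings above.

STRIPE_SIZE = 1 << 20      # 1 MB stripe size
STRIPE_COUNT = 4           # stripecount=-1 on a 4-OST filesystem

def ost_for_offset(offset):
    """Return the OST index holding the byte at `offset` in the file."""
    stripe_index = offset // STRIPE_SIZE
    return stripe_index % STRIPE_COUNT

def bytes_per_ost(file_size):
    """Total bytes each OST stores for a file of `file_size` bytes."""
    totals = [0] * STRIPE_COUNT
    full_stripes, remainder = divmod(file_size, STRIPE_SIZE)
    for i in range(full_stripes):
        totals[i % STRIPE_COUNT] += STRIPE_SIZE
    if remainder:
        totals[full_stripes % STRIPE_COUNT] += remainder
    return totals

# A 4 GB file striped this way lands as 1 GB per OST, so no single
# 2 GB OST fills up.
print(bytes_per_ost(4 * (1 << 30)))
```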


>> I initially thought this could be solved by enabling striping, but from 
>> HowTo (which doesn't say much on the subject admittedly) I gathered 
>> striping was already enabled?
> 
> No.  By default, stripecount == 1.  In order to get a single file onto
> multiple OSTs you will need to explicitly set a striping policy either
> on the file you are going to write into or the directory the file is in.

Then what is stripecount=-1 used for? (when specified for the 
filesystem, and not a file or a directory). Can you give me an example?
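In case it helps anyone following along, this is the kind of per-directory policy I understand Brian to mean; the option spellings below are from the Lustre 1.6/1.8-era lfs tool (newer releases use -S for stripe size), so check `lfs help setstripe` on your version:

```shell
# Stripe all new files created in this directory across every OST
# (count -1 = all available), with a 1 MB stripe size.
lfs setstripe -s 1M -c -1 /mnt/testfs/striped_dir

# Or set the policy on a not-yet-written file before dd'ing into it:
lfs setstripe -s 1M -c -1 /mnt/testfs/testfile1
```

Files that already exist keep the layout they were created with; the policy only applies to objects allocated after it is set.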



--
  Write Test #2
--

# lctl conf_param testfs-MDT0000.lov.stripecount=-1

/proc/fs/lustre/lov/testfs-clilov-c464c000/stripecount:-1
/proc/fs/lustre/lov/testfs-clilov-c464c000/stripeoffset:0
/proc/fs/lustre/lov/testfs-clilov-c464c000/stripesize:1048576
/proc/fs/lustre/lov/testfs-clilov-c464c000/stripetype:1
/proc/fs/lustre/lov/testfs-mdtlov/stripecount:-1
/proc/fs/lustre/lov/testfs-mdtlov/stripeoffset:0
/proc/fs/lustre/lov/testfs-mdtlov/stripesize:1048576
/proc/fs/lustre/lov/testfs-mdtlov/stripetype:1

# dd if=/dev/zero of=/mnt/testfs/testfile1 bs=4096 count=614400
dd: writing `/mnt/testfs/testfile1': No space left on device
437506+0 records in
437505+0 records out
1792020480 bytes (1.8 GB) copied, 52.5727 seconds, 34.1 MB/s

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1              15G  7.7G  5.9G  57% /
tmpfs                 252M     0  252M   0% /dev/shm
/dev/hda5             4.1G  198M  3.7G   6% /mnt/lustre/mdt
/dev/hda6             1.9G  1.8G   68K 100% /mnt/lustre/ost0
/dev/hda7             1.9G   80M  1.7G   5% /mnt/lustre/ost1
/dev/hda8             1.9G   80M  1.7G   5% /mnt/lustre/ost2
/dev/hda9             1.9G   80M  1.7G   5% /mnt/lustre/ost3
192.168.0.149@tcp0:/testfs
                       7.4G  2.0G  5.1G  29% /mnt/testfs
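To show what actually happened to that file: ost0 is at 100% while the others are nearly empty, which looks like testfile1 was still written with a single stripe despite the conf_param. One way to check the file's real layout and the per-OST space from the client side (assuming the standard lfs utility) is:

```shell
# Show the stripe count, stripe size, and the OST objects backing the file.
lfs getstripe /mnt/testfs/testfile1

# Per-OST usage as seen by the Lustre client.
lfs df -h /mnt/testfs
```

If getstripe reports stripe_count 1 for the file, the new default never applied to it.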



Thanks for your help,
-Nick



