[Lustre-discuss] OST node filling up and aborting write
Nick Jennings
nick@creativemotiondesign.com
Sat Feb 28 05:36:01 PST 2009
I just reformatted everything and started from scratch; here is a start-
to-finish account of the process. I'm following the Lustre Mount Conf
doc found here: http://wiki.lustre.org/index.php?title=Mount_Conf
--
Create MDT / MGS
--
# mkfs.lustre --fsname=testfs --mdt --mgs --reformat /dev/hda5
# mount -t lustre /dev/hda5 /mnt/lustre/mdt/
# cat /proc/fs/lustre/devices
0 UP mgs MGS MGS 5
1 UP mgc MGC192.168.0.149@tcp c047ce37-72dd-346d-b348-19d50416e195 5
2 UP mdt MDS MDS_uuid 3
3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 3
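(Side note: the same device list can be printed with lctl instead of
reading /proc directly; a quick sanity check, assuming lctl is in the
path:

# lctl dl

The output should mirror /proc/fs/lustre/devices line for line.)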
--
Format OSTs
--
# mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.0.149@tcp0 --reformat /dev/hda6
# mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.0.149@tcp0 --reformat /dev/hda7
# mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.0.149@tcp0 --reformat /dev/hda8
# mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.0.149@tcp0 --reformat /dev/hda9
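(If stable OST numbering matters, mkfs.lustre also takes an --index
option; without it the MGS assigns indices in registration order. A
sketch for the first OST, same flags as above:

# mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=192.168.0.149@tcp0 --reformat /dev/hda6
)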
--
Mount OSTs
--
# mount -t lustre /dev/hda6 /mnt/lustre/ost0/
# mount -t lustre /dev/hda7 /mnt/lustre/ost1/
# mount -t lustre /dev/hda8 /mnt/lustre/ost2/
# mount -t lustre /dev/hda9 /mnt/lustre/ost3/
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 15G 7.7G 5.9G 57% /
tmpfs 252M 0 252M 0% /dev/shm
/dev/hda5 4.1G 198M 3.7G 6% /mnt/lustre/mdt
/dev/hda6 1.9G 80M 1.7G 5% /mnt/lustre/ost0
/dev/hda7 1.9G 80M 1.7G 5% /mnt/lustre/ost1
/dev/hda8 1.9G 80M 1.7G 5% /mnt/lustre/ost2
/dev/hda9 1.9G 80M 1.7G 5% /mnt/lustre/ost3
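(To confirm all four OSTs registered with the MGS, the device list from
the first step can be re-read on the server; each OST should now appear
as an obdfilter device, e.g. a line of the form
"N UP obdfilter testfs-OST0000 testfs-OST0000_UUID 3":

# grep obdfilter /proc/fs/lustre/devices
)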
--
Mount Lustre Filesystem
--
# mount -t lustre 192.168.0.149@tcp0:/testfs /mnt/testfs/
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 15G 7.7G 5.9G 57% /
tmpfs 252M 0 252M 0% /dev/shm
/dev/hda5 4.1G 198M 3.7G 6% /mnt/lustre/mdt
/dev/hda6 1.9G 80M 1.7G 5% /mnt/lustre/ost0
/dev/hda7 1.9G 80M 1.7G 5% /mnt/lustre/ost1
/dev/hda8 1.9G 80M 1.7G 5% /mnt/lustre/ost2
/dev/hda9 1.9G 80M 1.7G 5% /mnt/lustre/ost3
192.168.0.149@tcp0:/testfs
7.4G 317M 6.7G 5% /mnt/testfs
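(Per-OST usage is also visible from the client via lfs, which is handy
when the OST backing devices aren't mounted on the local machine. A
minimal check:

# lfs df -h /mnt/testfs

This lists size/used/available for each OST plus a filesystem total, so
a single OST filling up shows immediately.)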
--
Write Test #1
--
# dd if=/dev/zero of=/mnt/testfs/testfile1 bs=4096 count=614400
dd: writing `/mnt/testfs/testfile1': No space left on device
437506+0 records in
437505+0 records out
1792020480 bytes (1.8 GB) copied, 49.9896 seconds, 35.8 MB/s
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 15G 7.7G 5.9G 57% /
tmpfs 252M 0 252M 0% /dev/shm
/dev/hda5 4.1G 198M 3.7G 6% /mnt/lustre/mdt
/dev/hda6 1.9G 1.8G 68K 100% /mnt/lustre/ost0
/dev/hda7 1.9G 80M 1.7G 5% /mnt/lustre/ost1
/dev/hda8 1.9G 80M 1.7G 5% /mnt/lustre/ost2
/dev/hda9 1.9G 80M 1.7G 5% /mnt/lustre/ost3
192.168.0.149@tcp0:/testfs
7.4G 2.0G 5.1G 29% /mnt/testfs
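(This looks like the whole file landed on a single OST. Which OST holds
a file can be confirmed with lfs getstripe; a sketch:

# lfs getstripe /mnt/testfs/testfile1

With a stripe count of 1 the output should show exactly one object, on
the OST that df reports as full, here ost0.)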
# cat /proc/fs/lustre/lov/testfs-*/stripe*
1
0
1048576
1
1
0
1048576
1
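For reference, the glob above reads the stripe* files from both LOV
devices in alphabetical order (stripecount, stripeoffset, stripesize,
stripetype), so both report count=1, offset=0, size=1048576: each new
file is still written to exactly one OST, which is why a single ~2G
target fills up and the write aborts. A sketch of the usual fix,
assuming 1.6-era lfs syntax, is to set a default stripe count of -1
(stripe over all available OSTs) on the mount point:

# lfs setstripe -c -1 /mnt/testfs

After that, a new multi-gigabyte dd should spread its 1 MB stripes
across all four OSTs instead of aborting when the first one fills.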
Nick Jennings wrote:
> Hi Minh,
>
> Yes, stripecount is set to one:
>
> # cat /proc/fs/lustre/lov/*/stripecount
> 1
> 1
>
> -Nick
>
> Minh Diep wrote:
>> Hi Nick,
>>
>> Have you tried setting stripecount=-1 ?
>>
>> Thanks
>> -Minh
>>
>> Nick Jennings wrote:
>>> Hi Everyone,
>>>
>>> I have a small Lustre test machine set up to bring myself back up to
>>> speed, as it's been a few years. This is probably a very basic issue,
>>> but I'm not able to find documentation on it (maybe I'm looking for
>>> the wrong thing).
>>>
>>> I've got 4 OSTs (each 2 GB in size) in one Lustre file system. I dd
>>> a 4 GB file to the filesystem, and after the first OST fills up, the
>>> write fails ("No space left on device"):
>>>
>>>
>>> # dd of=/mnt/testfs/testfile3 if=/dev/zero bs=1048576 count=4024
>>> dd: writing `/mnt/testfs/testfile3': No space left on device
>>> 1710+0 records in
>>> 1709+0 records out
>>> 1792020480 bytes (1.8 GB) copied, 55.1519 seconds, 32.5 MB/s
>>>
>>> # df -h
>>> Filesystem Size Used Avail Use% Mounted on
>>> /dev/hda1 15G 7.7G 5.9G 57% /
>>> tmpfs 252M 0 252M 0% /dev/shm
>>> /dev/hda5 4.1G 198M 3.7G 6% /mnt/test/mdt
>>> /dev/hda6 1.9G 1.1G 686M 62% /mnt/test/ost0
>>> 192.168.0.149@tcp:/testfs
>>> 7.4G 4.7G 2.4G 67% /mnt/testfs
>>> /dev/hda7 1.9G 1.8G 68K 100% /mnt/test/ost1
>>> /dev/hda8 1.9G 80M 1.7G 5% /mnt/test/ost2
>>> /dev/hda9 1.9G 1.8G 68K 100% /mnt/test/ost3
>>>
>>>
>>> I did this twice, which is why both ost1 and ost3 are full. As you
>>> can see, ost0 and ost2 still have space.
>>>
>>> I initially thought this could be solved by enabling striping, but
>>> from the HowTo (which admittedly doesn't say much on the subject) I
>>> gathered that striping was already enabled (4 MB chunks). So shouldn't
>>> these OSTs fill up at a relatively uniform rate?
>>>
>>> # cat /proc/fs/lustre/lov/testfs-clilov-ca5e0000/stripe*
>>> 1
>>> 0
>>> 4194304
>>> 1
>>> [root@andy ~]# cat /proc/fs/lustre/lov/testfs-mdtlov/stripe*
>>> 1
>>> 0
>>> 4194304
>>> 1
>>>
>>>
>>> Thanks for any help,
>>> -Nick