[Lustre-discuss] Problem with striping on lustre 1.8
Lex
lexluthor87 at gmail.com
Sun Aug 8 21:43:58 PDT 2010
Hi list,

After digging deeper into our MDS and OST logs, I found that I may have a
problem with *LAST_ID and lov_objid*. I checked following the instructions in
the Lustre 1.8 manual (821-0035 v1.3, section 23.3.9) and saw that the LAST_ID
on each OST matches the objects actually present there, which means the
problem is an *incorrect lov_objid* file on the MDS.
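
For reference, this is roughly how I did the check on each OSS, following
the manual (here /dev/sdY just stands for the OST block device, so adjust
it to your own layout):

  # read the on-disk LAST_ID without mounting the OST as ldiskfs
  debugfs -c -R 'dump /O/0/LAST_ID /tmp/LAST_ID' /dev/sdY
  od -Ax -td8 /tmp/LAST_ID
  # compare the value with the highest-numbered object under /O/0/d*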
The manual tells me to simply delete the lov_objid file and it will be
re-created from the LAST_ID values on the OSTs. I have had problems before
when changing parameters on the MDS (in the mounted ldiskfs file system),
and I have to be very careful when playing with it. So, could anyone here
help me confirm that *deleting the lov_objid file is harmless and that the
file will be re-created and work properly*?
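
For reference, the sequence I have in mind on the MDS is roughly the
following (here /dev/sdX and /mnt/mdt are just placeholders for our MDT
device and a temporary mount point):

  umount /mnt/mdt                             # stop Lustre on the MDT first
  mount -t ldiskfs /dev/sdX /mnt/mdt          # mount the MDT as plain ldiskfs
  cp /mnt/mdt/lov_objid /root/lov_objid.bak   # keep a backup copy, just in case
  rm /mnt/mdt/lov_objid                       # should be re-created from the OST LAST_IDs
  umount /mnt/mdt
  mount -t lustre /dev/sdX /mnt/mdt           # remount as Lustre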
Any help would be highly appreciated!
On Wed, Aug 4, 2010 at 3:06 PM, Lex <lexluthor87 at gmail.com> wrote:
> Hi list
>
> I have a small Lustre storage system with 12 OSTs. After about one year of
> use, the free space on each OST is as follows:
>
> UUID                  bytes     Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID  189.4G     9.8G     168.8G    5%  /mnt/lustre[MDT:0]
> lustre-OST0001_UUID    6.3T     4.6T       1.3T   73%  /mnt/lustre[OST:1]
> lustre-OST0003_UUID    4.0T     3.8T      22.0M   94%  /mnt/lustre[OST:3]
> lustre-OST0004_UUID    5.4T     4.9T     163.2G   91%  /mnt/lustre[OST:4]
> lustre-OST0005_UUID    5.4T     4.7T     423.6G   87%  /mnt/lustre[OST:5]
> lustre-OST0006_UUID    4.0T     3.8T     356.3M   94%  /mnt/lustre[OST:6]
> lustre-OST0008_UUID    5.4T     5.0T      99.2G   93%  /mnt/lustre[OST:8]
> lustre-OST0009_UUID    5.4T     5.0T     124.2G   92%  /mnt/lustre[OST:9]
> lustre-OST000a_UUID    5.4T     4.6T     540.9G   85%  /mnt/lustre[OST:10]
> lustre-OST000b_UUID    5.4T     4.5T     557.9G   84%  /mnt/lustre[OST:11]
> lustre-OST000c_UUID    6.7T     1.6T       4.7T   24%  /mnt/lustre[OST:12]
> lustre-OST000d_UUID    6.7T   478.3G       5.9T    6%  /mnt/lustre[OST:13]
> As you can see, free space is unbalanced across our OSTs. I tried to
> overcome this by setting up a pool (the creation commands are sketched
> after the listing below):
>
> root at MDS1: ~ # lctl pool_list lustre.para
> Pool: lustre.para
> lustre-OST0004_UUID
> lustre-OST0005_UUID
> lustre-OST000a_UUID
> lustre-OST000b_UUID
> lustre-OST0001_UUID
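>
> For completeness, the pool was created roughly like this (I am writing
> these commands from memory, so take them as a sketch rather than an
> exact transcript):
>
>   lctl pool_new lustre.para
>   lctl pool_add lustre.para lustre-OST0001_UUID
>   lctl pool_add lustre.para lustre-OST0004_UUID lustre-OST0005_UUID
>   lctl pool_add lustre.para lustre-OST000a_UUID lustre-OST000b_UUID
>   lfs setstripe --pool para /mnt/lustre/HD-OST1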
>
> We controlled where writes went in our directories manually, by adding or
> removing pool members based on their free space. Everything worked quite
> well for about two months. But when one of our OSTs ran out of free space,
> many messages like these appeared in the MDS log:
>
> MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) using
> fewer stripes for object 28452745: old 5 new 4
> MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 4
> previous similar messages
> MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) using fewer
> stripes for object 28453405: old 5 new 4
> MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 39
> previous similar messages
>
> And the problem now is:
>
> Nothing is being written to OST1 (lustre-OST0001_UUID); its free space has
> stayed at 1.3T for many days, while the other OSTs fill up quite fast.
>
>
> I also tested by making a brand-new directory in our storage system and
> setting the stripe count to 1 with starting index 1 (i.e. OST1), like this:
>
> mkdir /mnt/lustre/HD-OST1/mv
> lfs setstripe -c 1 -i 1 /mnt/lustre/HD-OST1/mv
>
> Then I touched one file there (touch test), and the result is:
>
> lfs getstripe /mnt/lustre/HD-OST1/mv/test
> OBDS:
> 1: lustre-OST0001_UUID ACTIVE
> 3: lustre-OST0003_UUID ACTIVE
> 4: lustre-OST0004_UUID ACTIVE
> 5: lustre-OST0005_UUID ACTIVE
> 6: lustre-OST0006_UUID ACTIVE
> 8: lustre-OST0008_UUID ACTIVE
> 9: lustre-OST0009_UUID ACTIVE
> 10: lustre-OST000a_UUID ACTIVE
> 11: lustre-OST000b_UUID ACTIVE
> 12: lustre-OST000c_UUID ACTIVE
> 13: lustre-OST000d_UUID ACTIVE
> /mnt/lustre/HD-OST1/mv/test
>         obdidx     objid     objid    group
>              4   6759925  0x6725f5        0
>
> Its obdidx was 4! I also tried changing the index to each of the other
> values (3-13, matching our OST list), and in every case getstripe showed
> the correct obdidx. It only goes wrong with OST1!
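>
> In case it helps with diagnosis, I can also compare the precreate
> counters the MDS keeps for OST1 with the OST's own LAST_ID, along these
> lines (the exact device names are a guess based on our setup):
>
>   # on the MDS
>   lctl get_param osc.lustre-OST0001-osc.prealloc_next_id
>   lctl get_param osc.lustre-OST0001-osc.prealloc_last_id
>   # on the OSS serving OST0001
>   lctl get_param obdfilter.lustre-OST0001.last_id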
>
>
> Could you please explain this, or show me what is wrong with my command?
>
> Many thanks
>