Hi list<br><br>After digging deeper into our MDS and OST logs, I found that I may have a problem with <b>last_id and lov_objid</b>. I checked, following the instructions in the Lustre 1.8 manual (821-0035 v1.3, section 23.3.9), and saw that last_id matches the objects existing on my OST, which means the problem is an <b>incorrect lov_objid</b> file.<br>
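For anyone comparing notes, the decode step can be tried safely like this. This is a hedged sketch, not the manual's exact text: it assumes lov_objid holds one little-endian 64-bit object id per OST index, and it builds a synthetic sample file (/tmp/lov_objid.sample is a placeholder) so nothing touches the real MDT.

```shell
# Hedged sketch: lov_objid is assumed to be an array of little-endian
# 64-bit object ids, one per OST index. Build a synthetic two-entry file
# so the decode can be tried safely, then dump it the same way the real
# file would be inspected on an MDT mounted as ldiskfs.
SAMPLE=/tmp/lov_objid.sample              # placeholder path, NOT the real MDT file
printf '\005\0\0\0\0\0\0\0' >  "$SAMPLE"  # OST index 0: last object id 5
printf '\052\0\0\0\0\0\0\0' >> "$SAMPLE"  # OST index 1: last object id 42 (octal 052)
od -Ad -td8 "$SAMPLE"                     # compare these ids with each OST's last_id
```

On the real system the same `od -Ad -td8` would be run against lov_objid on the ldiskfs-mounted MDT, and each decoded id compared with the matching OST's last_id from /proc.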
<br>The manual tells me to simply delete the lov_objid file, and it will be re-created from the last_id of each OST. I have had problems before when changing parameters on the MDS (in the mounted ldiskfs file system), so I have to be very careful when playing with it. So, could anyone here help me confirm that <b>deleting the lov_objid file is harmless and that the file will be re-created and work properly</b>?<br>
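To make sure we are talking about the same procedure, here is how I read the steps in section 23.3.9. Everything here is a placeholder to review, not to run as-is: /dev/mdtdev and /mnt/mdt stand in for the real MDT device and mount point, and the script only echoes each command unless DO_IT=1 is set.

```shell
# Outline of the lov_objid recovery steps as I read the 1.8 manual.
# $MDTDEV and /mnt/mdt are placeholders; echo-only unless DO_IT=1.
MDTDEV=${MDTDEV:-/dev/mdtdev}   # assumption: substitute the real MDT block device
run() { echo "+ $*"; if [ "${DO_IT:-0}" = "1" ]; then "$@"; fi; }

run umount /mnt/mdt                            # stop the MDS service first
run mount -t ldiskfs "$MDTDEV" /mnt/mdt        # mount the MDT backing fs directly
run cp /mnt/mdt/lov_objid /root/lov_objid.bak  # keep a backup before deleting anything
run rm /mnt/mdt/lov_objid                      # MDS should rebuild it from each OST's last_id
run umount /mnt/mdt
run mount -t lustre "$MDTDEV" /mnt/mdt         # restart the MDS
```

If I have any of these steps wrong, please correct me before I touch the MDT.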
<br>Any help would be highly appreciated!<br><br><div class="gmail_quote">On Wed, Aug 4, 2010 at 3:06 PM, Lex <span dir="ltr"><<a href="mailto:lexluthor87@gmail.com">lexluthor87@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi list<br><br>I have a small Lustre storage system with 12 OSTs. After about a year of use, the free space on each one is as follows:<br>
<br><i>UUID bytes Used Available Use% Mounted on<br>
lustre-MDT0000_UUID 189.4G 9.8G 168.8G 5% /mnt/lustre[MDT:0]<br>lustre-OST0001_UUID 6.3T 4.6T 1.3T 73% /mnt/lustre[OST:1]<br>lustre-OST0003_UUID 4.0T 3.8T 22.0M 94% /mnt/lustre[OST:3]<br>
lustre-OST0004_UUID 5.4T 4.9T 163.2G 91% /mnt/lustre[OST:4]<br>lustre-OST0005_UUID 5.4T 4.7T 423.6G 87% /mnt/lustre[OST:5]<br>lustre-OST0006_UUID 4.0T 3.8T 356.3M 94% /mnt/lustre[OST:6]<br>
lustre-OST0008_UUID 5.4T 5.0T 99.2G 93% /mnt/lustre[OST:8]<br>lustre-OST0009_UUID 5.4T 5.0T 124.2G 92% /mnt/lustre[OST:9]<br>lustre-OST000a_UUID 5.4T 4.6T 540.9G 85% /mnt/lustre[OST:10]<br>
lustre-OST000b_UUID 5.4T 4.5T 557.9G 84% /mnt/lustre[OST:11]<br>lustre-OST000c_UUID 6.7T 1.6T 4.7T 24% /mnt/lustre[OST:12]<br>
lustre-OST000d_UUID 6.7T 478.3G 5.9T 6% /mnt/lustre[OST:13]<br></i><br>As you can see, free space across our OSTs is unbalanced. I tried to overcome this by setting up a pool like this:<br>
<br><i>root@MDS1: ~ # <b>lctl pool_list lustre.para</b><br>Pool: lustre.para<br>lustre-OST0004_UUID<br>lustre-OST0005_UUID<br>lustre-OST000a_UUID<br>lustre-OST000b_UUID<br>lustre-OST0001_UUID</i><br><br>We controlled writes to our directories manually by adding or removing pool members (based on their free space). Everything worked quite well for about two months. But when one of our OSTs ran out of free space, many messages like this appeared in the MDS log:<br>
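For completeness, a pool like the one listed above is built with the standard lctl pool commands. The sketch below is a reconstruction, not our exact command history; it stores the sequence in a variable and only echoes it, so it is safe anywhere, and /mnt/lustre/HD-OST1 is an illustrative path. Run the lines on the MGS/MDS for real.

```shell
# Hedged reconstruction of how a pool like "lustre.para" is built and
# retired from. Echoed rather than executed so the sketch is harmless.
POOL_CMDS='lctl pool_new lustre.para
lctl pool_add lustre.para lustre-OST0004
lctl pool_add lustre.para lustre-OST0005
lctl pool_add lustre.para lustre-OST000a
lctl pool_add lustre.para lustre-OST000b
lctl pool_add lustre.para lustre-OST0001
lctl pool_remove lustre.para lustre-OST0001
lfs setstripe --pool para /mnt/lustre/HD-OST1'
printf '%s\n' "$POOL_CMDS"   # print the command sequence for review
```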
<br><i> MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) using fewer stripes for object 28452745: old 5 new 4<br> MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 4 previous similar messages<br>
MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) using fewer stripes for object 28453405: old 5 new 4<br> MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 39 previous similar messages</i><br>
<br>And the problem now is:<br><br>Nothing gets written to OST1 (<i>lustre-OST0001_UUID</i>); its free space has stayed at 1.3T for many days, while free space on the others goes down quite fast.<br><br>I also tested by making a brand-new directory in our storage system and setting the stripe index to 1, like this:<br>
<br><b><i>mkdir /mnt/lustre/HD-OST1/mv</i><br><i>lfs setstripe -c 1 -i 1 /mnt/lustre/HD-OST1/mv</i></b><br><br>and touched one file: <i><b>touch test</b></i><br>and the result is:<br><br><i><b>lfs getstripe /mnt/lustre/HD-OST1/mv/test</b><br>
OBDS:<br>1: lustre-OST0001_UUID ACTIVE<br>3: lustre-OST0003_UUID ACTIVE<br>4: lustre-OST0004_UUID ACTIVE<br>5: lustre-OST0005_UUID ACTIVE<br>6: lustre-OST0006_UUID ACTIVE<br>8: lustre-OST0008_UUID ACTIVE<br>9: lustre-OST0009_UUID ACTIVE<br>
10: lustre-OST000a_UUID ACTIVE<br>11: lustre-OST000b_UUID ACTIVE<br>12: lustre-OST000c_UUID ACTIVE<br>13: lustre-OST000d_UUID ACTIVE<br>/mnt/lustre/HD-OST1/mv/test<br> <b>obdidx </b> objid objid group<br>
<b> 4</b> 6759925 0x6725f5 0</i><br><br>The obdidx was 4! I also tried changing the stripe index to other values (3 through 13 in our OST list), and it showed the correct obdidx each time. It only goes wrong with my OST1!<br>
<br><br>Could you please explain this to me or show me what is wrong with my command?<br><br>Many thanks<br>
</blockquote></div><br>