[Lustre-discuss] Problem with striping on lustre 1.8
Lex
lexluthor87 at gmail.com
Wed Aug 4 01:06:18 PDT 2010
Hi list,

I have a small Lustre storage system with 12 OSTs. After about a year of use, the free space on each one is as follows:
UUID                   bytes     Used   Available  Use%  Mounted on
lustre-MDT0000_UUID   189.4G     9.8G     168.8G     5%  /mnt/lustre[MDT:0]
lustre-OST0001_UUID     6.3T     4.6T       1.3T    73%  /mnt/lustre[OST:1]
lustre-OST0003_UUID     4.0T     3.8T      22.0M    94%  /mnt/lustre[OST:3]
lustre-OST0004_UUID     5.4T     4.9T     163.2G    91%  /mnt/lustre[OST:4]
lustre-OST0005_UUID     5.4T     4.7T     423.6G    87%  /mnt/lustre[OST:5]
lustre-OST0006_UUID     4.0T     3.8T     356.3M    94%  /mnt/lustre[OST:6]
lustre-OST0008_UUID     5.4T     5.0T      99.2G    93%  /mnt/lustre[OST:8]
lustre-OST0009_UUID     5.4T     5.0T     124.2G    92%  /mnt/lustre[OST:9]
lustre-OST000a_UUID     5.4T     4.6T     540.9G    85%  /mnt/lustre[OST:10]
lustre-OST000b_UUID     5.4T     4.5T     557.9G    84%  /mnt/lustre[OST:11]
lustre-OST000c_UUID     6.7T     1.6T       4.7T    24%  /mnt/lustre[OST:12]
lustre-OST000d_UUID     6.7T   478.3G       5.9T     6%  /mnt/lustre[OST:13]
As you can see, free space is unbalanced across our OSTs. I tried to work around this by setting up a pool:
root@MDS1:~ # lctl pool_list lustre.para
Pool: lustre.para
lustre-OST0004_UUID
lustre-OST0005_UUID
lustre-OST000a_UUID
lustre-OST000b_UUID
lustre-OST0001_UUID
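(For reference, the pool was maintained with commands along these lines. This is a sketch based on the standard lctl pool commands in Lustre 1.8, not a verbatim transcript of what we ran; the OST chosen for removal is just an example.)

```shell
# On the MGS/MDS: create the pool once
lctl pool_new lustre.para

# Add OSTs that still have plenty of free space
lctl pool_add lustre.para lustre-OST0004_UUID lustre-OST0005_UUID
lctl pool_add lustre.para lustre-OST000a_UUID lustre-OST000b_UUID lustre-OST0001_UUID

# Point a directory at the pool so new files allocate objects only there
lfs setstripe --pool para /mnt/lustre/HD-OST1

# When a member OST fills up, drop it from the pool
lctl pool_remove lustre.para lustre-OST0005_UUID
```

These commands require a live Lustre 1.8 filesystem with the MGS mounted, so they cannot be run standalone.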
We managed writes to our directories manually by adding OSTs to, or removing them from, the pool based on their free space. Everything worked quite well for about two months. But when one of our OSTs ran out of free space, many messages like these appeared in the MDS log:
MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) using fewer
stripes for object 28452745: old 5 new 4
MDS1 kernel: Lustre: 12012:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 4
previous similar messages
MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) using fewer
stripes for object 28453405: old 5 new 4
MDS1 kernel: Lustre: 12032:0:(lov_qos.c:460:qos_shrink_lsm()) Skipped 39
previous similar messages
And the problem now is: nothing gets written to OST1 (lustre-OST0001_UUID). Its free space has stayed at 1.3T for many days, while the other OSTs are filling up quite fast.
I also tested by creating a brand-new directory in our storage system and setting a stripe count of 1 with a starting stripe index of 1:

mkdir /mnt/lustre/HD-OST1/mv
lfs setstripe -c 1 -i 1 /mnt/lustre/HD-OST1/mv

then I touched one file:

touch test

and the result is:
lfs getstripe /mnt/lustre/HD-OST1/mv/test
OBDS:
1: lustre-OST0001_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE
8: lustre-OST0008_UUID ACTIVE
9: lustre-OST0009_UUID ACTIVE
10: lustre-OST000a_UUID ACTIVE
11: lustre-OST000b_UUID ACTIVE
12: lustre-OST000c_UUID ACTIVE
13: lustre-OST000d_UUID ACTIVE
/mnt/lustre/HD-OST1/mv/test
        obdidx     objid      objid    group
             4   6759925   0x6725f5        0
The obdidx was 4, not 1! I also tried changing the index to other values (3 through 13 in our OST list), and each time the file landed on the correct obdidx. It only goes wrong with OST1.
Could you please explain this, or show me what's wrong with my command?
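In case it helps with diagnosis, here is roughly how I would check on the MDS whether it still considers OST0001 usable for new objects. This is a sketch: the OSC device name is a guess based on our filesystem name, and the exact parameter paths may differ between 1.8 builds.

```shell
# On the MDS: list configured devices and find the OSC for OST0001
lctl dl | grep OST0001

# The OSC's 'active' flag controls whether the MDS allocates new
# objects on that OST: 1 = active, 0 = deactivated
lctl get_param osc.lustre-OST0001-osc.active

# QOS allocator tunables that weight free space in object placement
# (percentages; qos_threshold_rr = 100 forces round-robin)
lctl get_param lov.lustre-mdtlov.qos_threshold_rr
lctl get_param lov.lustre-mdtlov.qos_prio_free
```

Again, these need to run on the live MDS node, so I cannot show output here.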
Many thanks