[Lustre-discuss] Re-balance the un-balanced OSTs

Andreas Dilger adilger at sun.com
Fri Dec 4 21:35:50 PST 2009


On 2009-12-04, at 00:55, thhsieh wrote:
> We are running Lustre 1.6.6 on a vanilla Linux 2.6.22.19 kernel.
> Recently we have been suffering from a serious OST imbalance. If I run
> the command:
>
> # /opt/lustre-1.8/bin/lfs df
> UUID                  1K-blocks       Used  Available  Use% Mounted on
> cwarp-MDT0000_UUID    119627860     659184  112132220    0% /mnt/src[MDT:0]
> cwarp-OST0000_UUID   1441859112 1343925184   24691744   93% /mnt/src[OST:0]
> cwarp-OST0001_UUID   1441859128  799739136  568869616   55% /mnt/src[OST:1]
> cwarp-OST0002_UUID   1441859128  643666316  724950624   44% /mnt/src[OST:2]
> cwarp-OST0003_UUID   1441859112  745288308  623015556   51% /mnt/src[OST:3]
> cwarp-OST0004_UUID   1441859128  654020352  714567920   45% /mnt/src[OST:4]
> cwarp-OST0005_UUID   1441859128  658416232  709949996   45% /mnt/src[OST:5]
>
> filesystem summary:  8651154736 4845055528 3366045456   56% /mnt/src
>
> It is clear that the cwarp-OST0000_UUID is almost full, but the
> other OSTs are still quite empty.
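To spot an OST like this without reading the table by eye, a small filter over the "lfs df" output can flag every OST at or above a usage threshold. This is a sketch assuming a POSIX shell and awk, the six-column layout shown above with each OST on one line, and OST names ending in "-OSTxxxx_UUID"; the function name full_osts and the 80% default are illustrative choices, not part of Lustre.

```shell
# Sketch: print "UUID Use%" for every OST whose usage is at or above
# a threshold. Reads "lfs df" output on stdin; assumes the column
# order UUID, 1K-blocks, Used, Available, Use%, Mounted on.
full_osts() {
    threshold=${1:-80}    # percent; 80 is only an example default
    awk -v limit="$threshold" '
        # Only OST rows; this skips the header, MDT and summary lines.
        $1 ~ /-OST[0-9a-fA-F]+_UUID$/ {
            use = $5
            sub(/%$/, "", use)    # "93%" -> "93"
            if (use + 0 >= limit + 0)
                print $1, $5
        }'
}

# Usage on a client:
#   lfs df | full_osts 80
```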

Normally this happens when a single large file, like a tarball, is
created on one OST.  You can use "lfs find" to find the few largest
files on the full OST, and either stripe them over all the OSTs or
just distribute them individually over the OSTs.
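A minimal sketch of the "stripe them over all the OSTs" option for one such file. It assumes a POSIX shell on a client, that your lfs accepts "lfs setstripe -c -1" (a stripe count of -1 means all available OSTs), and that nothing has the file open during the copy; restripe_file and the ".restripe.tmp" suffix are hypothetical names. The final mv replaces the file, so hard links to the old file are not carried over.

```shell
# Sketch: rewrite one file so its objects are striped over all OSTs.
# Assumes "lfs setstripe -c -1" (stripe count -1 = all OSTs) is
# supported and that nothing has the file open during the copy.
restripe_file() {
    src=$1
    tmp="$src.restripe.tmp"          # hypothetical temporary name
    # Create the empty target with the wanted layout first; cp then
    # writes into this existing file, so the layout is kept.
    lfs setstripe -c -1 "$tmp" || return 1
    if cp -p "$src" "$tmp"; then     # -p keeps mode and timestamps
        mv "$tmp" "$src"             # replace the original (breaks hard links)
    else
        rm -f "$tmp"                 # clean up the partial copy
        return 1
    fi
}
```

For example, "restripe_file /path/to/lustre/big.tar" rewrites one of the large files that "lfs find" reports.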

> Anyway, we are trying to re-balance the OSTs. Our procedure, which
> follows the Lustre manual, is:
>
> 1. On the MDS, deactivate cwarp-OST0000 so that newly created files
>   will go to the other OSTs:
>
>   echo 0 > /proc/fs/lustre/osc/cwarp-OST0000-osc/active

Depending on how much "turnover" you have of files (deletion of old  
files, creating new files) this will slowly empty OST0000 and fill the  
other OSTs.
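Once "lfs df" shows OST0000 back in line with the others, you will probably want to re-activate it by writing 1 to the same file, otherwise it never receives new objects. A sketch for the MDS, assuming the /proc path from step 1; OSC_ACTIVE, ost_is_active and set_ost_active are hypothetical helper names.

```shell
# Sketch for the MDS. The default path is the one from step 1;
# set OSC_ACTIVE beforehand if your OSC device is named differently.
OSC_ACTIVE=${OSC_ACTIVE:-/proc/fs/lustre/osc/cwarp-OST0000-osc/active}

# Succeeds if the OSC currently allows new objects on the OST (value 1).
ost_is_active() {
    [ "$(cat "$OSC_ACTIVE")" = 1 ]
}

# Write 0 to stop new object creation on the OST, 1 to resume it.
set_ost_active() {
    echo "$1" > "$OSC_ACTIVE"
}

# Example, once usage has evened out:
#   set_ost_active 1
```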

> 2. On one of the client nodes, we copy and rename files, hoping that
>   some of them will be moved off cwarp-OST0000 and onto the other
>   OSTs:
>
>   cp /path/to/some/files /path/to/some/files.tmp
>   mv /path/to/some/files.tmp /path/to/some/files
>
> However, this does not seem to help much. I guess this is because
> the files we copy are not necessarily the ones located on cwarp-OST0000.

You can find files that have stripes on that OST with:

lfs find --obd cwarp-OST0000_UUID /path/to/lustre

You can further limit this to large files with

lfs find --obd cwarp-OST0000_UUID --size +1G /path/to/lustre
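Putting the two steps together: with OST0000 already deactivated on the MDS, the cp/mv trick from step 2 can be limited to the files that "lfs find" reports on that OST, so each copy actually frees space there. A minimal sketch, assuming a POSIX shell on a client, one path per line of "lfs find" output (so no newlines in file names), and that no file is in use while it is copied; migrate_off_ost and the ".migrate.tmp" suffix are hypothetical. The new copy takes the directory's default striping, so with OST0000 inactive its objects should land on the other OSTs.

```shell
# Sketch: copy-and-rename every large file with objects on one OST.
# Assumes that OST is already deactivated on the MDS and that no file
# is open while it is copied. Hard links are not preserved by mv.
migrate_off_ost() {
    ost=$1        # e.g. cwarp-OST0000_UUID
    dir=$2        # e.g. /path/to/lustre
    lfs find --obd "$ost" --size +1G "$dir" |
    while IFS= read -r f; do
        [ -f "$f" ] || continue          # only regular files
        tmp="$f.migrate.tmp"             # hypothetical temporary name
        if cp -p "$f" "$tmp"; then       # new objects avoid the inactive OST
            mv "$tmp" "$f"
        else
            rm -f "$tmp"                 # drop the partial copy
            echo "failed: $f" >&2
        fi
    done
}
```

Usage on a client: "migrate_off_ost cwarp-OST0000_UUID /path/to/lustre", then check progress with "lfs df".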

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.