[Lustre-discuss] Re-balance the un-balanced OSTs
Andreas Dilger
adilger at sun.com
Fri Dec 4 21:35:50 PST 2009
On 2009-12-04, at 00:55, thhsieh wrote:
> We are running Lustre 1.6.6 on a vanilla Linux 2.6.22.19 kernel.
> Recently we have been suffering from a serious OST imbalance. If I run
> the command:
>
> # /opt/lustre-1.8/bin/lfs df
> UUID                  1K-blocks       Used  Available Use% Mounted on
> cwarp-MDT0000_UUID    119627860     659184  112132220   0% /mnt/src[MDT:0]
> cwarp-OST0000_UUID   1441859112 1343925184   24691744  93% /mnt/src[OST:0]
> cwarp-OST0001_UUID   1441859128  799739136  568869616  55% /mnt/src[OST:1]
> cwarp-OST0002_UUID   1441859128  643666316  724950624  44% /mnt/src[OST:2]
> cwarp-OST0003_UUID   1441859112  745288308  623015556  51% /mnt/src[OST:3]
> cwarp-OST0004_UUID   1441859128  654020352  714567920  45% /mnt/src[OST:4]
> cwarp-OST0005_UUID   1441859128  658416232  709949996  45% /mnt/src[OST:5]
>
> filesystem summary:  8651154736 4845055528 3366045456  56% /mnt/src
>
> It is clear that the cwarp-OST0000_UUID is almost full, but the
> other OSTs are still quite empty.
Normally this happens when a single large file, such as a tarball, is
written to one OST. You can use "lfs find" to find the few largest
files on the full OST, and either stripe them over all the OSTs or just
distribute them individually over the OSTs.
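For the "stripe them over all the OSTs" option, here is a minimal POSIX sh sketch. It assumes a 1.8 client where `lfs setstripe -c -1 <file>` creates an empty file striped over every available OST. The function name `restripe_file`, the `.restripe.$$` temp suffix and the `LFS` override variable are hypothetical, made up for this illustration. The function pre-creates the temp file with the wide layout, then copies the data into it. Because cp truncates and reuses an existing destination, the new layout is kept. Finally it renames the copy over the original.

```shell
# Sketch only: rewrite one file so the new copy is striped over all OSTs.
# Assumes "lfs setstripe -c -1" means "use every available OST".
# LFS is a hypothetical override (defaults to "lfs") for testing off-Lustre.
restripe_file() {
    f=$1
    tmp="$f.restripe.$$"            # hypothetical temp-name scheme
    # Pre-create the temp file with a wide layout; cp below writes into
    # this existing file instead of creating a new one, so the layout stays.
    "${LFS:-lfs}" setstripe -c -1 "$tmp" || return 1
    # Copy data plus mode/timestamps, then rename over the original.
    if cp -p "$f" "$tmp" && mv "$tmp" "$f"; then
        return 0
    fi
    rm -f "$tmp"                    # clean up a half-written copy
    return 1
}
```

Usage might look like `restripe_file /path/to/lustre/some/big.tar`. As with any copy-and-rename, this breaks hard links. It should only be run on files that nobody has open for writing.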
> In any case, we are trying
> to re-balance the OSTs. Our procedure, which follows the Lustre
> manual, is:
>
> 1. On the MDS, deactivate cwarp-OST0000 so that newly created files
> will go to the other OSTs:
>
> echo 0 > /proc/fs/lustre/osc/cwarp-OST0000-osc/active
Depending on how much file "turnover" you have (old files being deleted,
new files being created), this will slowly empty OST0000 and fill the
other OSTs.
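To watch OST0000 drain, you can re-check the "lfs df" output from time to time. Below is a small sketch of a filter that prints only the OSTs above a usage threshold. It assumes one line per target with the UUID, 1K-blocks, Used, Available and Use% columns as in the output quoted above. The function name `full_osts` and its 80% default are hypothetical.

```shell
# Sketch only: read "lfs df" output on stdin and print OSTs whose Use%
# is above a threshold (first argument, hypothetical default 80).
full_osts() {
    threshold=${1:-80}
    awk -v t="$threshold" '
        $1 ~ /-OST[0-9a-fA-F]+_UUID$/ {   # OST lines only, skip MDT/summary
            pct = $5
            sub(/%$/, "", pct)            # "93%" -> "93"
            if (pct + 0 > t + 0) print $1, pct "%"
        }'
}
```

Run against the output quoted above, `lfs df /mnt/src | full_osts 80` would report only `cwarp-OST0000_UUID 93%`. Once OST0000 is back under the threshold, it can presumably be re-enabled on the MDS by writing 1 to the same `active` file.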
> 2. On one of the client nodes, we copy and rename files, in the hope
> that some files will be moved off cwarp-OST0000 and onto the other
> OSTs:
>
> cp /path/to/some/files /path/to/some/files.tmp
> mv /path/to/some/files.tmp /path/to/some/files
>
> However, this does not seem to help much. I guess this is because
> we are not specifically moving the files that are located on
> cwarp-OST0000.
You can find files that have stripes on that OST with:
lfs find --obd cwarp-OST0000_UUID /path/to/lustre
You can further limit this to large files with:
lfs find --obd cwarp-OST0000_UUID --size +1G /path/to/lustre
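You can feed the files that "lfs find" reports straight into the copy-and-rename approach from step 2. Here is a minimal sketch. It assumes that paths contain no newlines, that the files are not open for writing, and that breaking hard links is acceptable. The function name `migrate_files` and the `.migrate.$$` suffix are hypothetical. The new copies only avoid OST0000 if it has already been deactivated on the MDS.

```shell
# Sketch only: for each path on stdin (one per line), copy the file to a
# temporary name and rename it back, so the MDS allocates new objects
# for it on the currently active OSTs.
migrate_files() {
    while IFS= read -r f; do
        [ -f "$f" ] || continue         # skip vanished or non-regular files
        tmp="$f.migrate.$$"             # hypothetical temp-name scheme
        if cp -p "$f" "$tmp" && mv "$tmp" "$f"; then
            echo "migrated: $f"
        else
            rm -f "$tmp"                # drop a partial copy
            echo "FAILED: $f" >&2
        fi
    done
}
```

Usage might look like `lfs find --obd cwarp-OST0000_UUID --size +1G /path/to/lustre | migrate_files`. Starting with the largest files gives the most space back per copy.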
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.