[Lustre-discuss] OST load distribution

Wed May 8 15:21:46 PDT 2013

On 2013-05-08, at 7:14, "Jure Pečar" <pegasus at nerv.eu.org> wrote:
> I have a lustre 2.2 environment which looks like this:
> 
> # lfs df -h
> UUID                       bytes        Used   Available Use% Mounted on
> lustre22-MDT0000_UUID      95.0G        9.4G       79.3G  11% /lustre[MDT:0]
> lustre22-OST0000_UUID       5.5T        2.1T        3.3T  39% /lustre[OST:0]
> lustre22-OST0001_UUID       5.5T        1.2T        4.3T  22% /lustre[OST:1]
> lustre22-OST0002_UUID       5.5T     1016.0G        4.5T  18% /lustre[OST:2]
> lustre22-OST0003_UUID       5.5T      948.3G        4.5T  17% /lustre[OST:3]
[snip more OSTs with same usage]
> 
> What else can I do to spread the load from OST0000 evenly among the other OSTs?

Once you have found the source of the problem, it may be best to do nothing if you have a high file turnover rate, since Lustre will eventually balance itself out as old files are deleted and new ones are allocated.

You can proactively find large files on this OST and migrate them to other OSTs.  This will make copies of these files, and will also put a high load on OST0000.

Note that this is currently only safe if you "know" the migrated files are not in use, or are opened read-only. That depends on your workload and users (e.g. users not logged in or running jobs, older files, etc).

client# lfs find /lustre -ost lustre22-OST0000 -mtime +10 -size +1G  > ost0000-list.txt
{edit ost0000-list.txt to only contain known inactive files}
client# lfs_migrate < ost0000-list.txt
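Since the migration copies every listed file and reads all of that data from OST0000, it can be worth totalling the list before starting. A rough sketch, assuming GNU stat on the client; the helper name list_total_bytes is made up for this example.

```shell
# Hypothetical helper: reads file names (one per line) on stdin and prints
# the sum of their sizes in bytes. Entries that are missing or are not
# regular files are skipped. Relies on GNU stat (`stat -c %s`).
list_total_bytes() {
    total=0
    while IFS= read -r f; do
        [ -f "$f" ] || continue
        size=$(stat -c %s -- "$f") || continue
        total=$((total + size))
    done
    echo "$total"
}

# Usage on a client (not run here):
#   list_total_bytes < ost0000-list.txt
```

Note this counts whole file sizes; for files striped over several OSTs only part of each file actually lives on OST0000, so treat the result as an upper bound on what leaves that OST.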

In Lustre 2.4 it will be possible to migrate files that are in use, since it will preserve the inode numbers.

If you can't find the source of the problem, and OST0000 is getting very full, you could mark the OST inactive on the MDS node:

mds# lctl --device %lustre22-OST0000 deactivate

No new objects will be allocated on the OST after that time.

Cheers, Andreas