Jure Pečar
Wed May 8 06:12:52 PDT 2013


I have a Lustre 2.2 environment that looks like this:

# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre22-MDT0000_UUID      95.0G        9.4G       79.3G  11% /lustre[MDT:0]
lustre22-OST0000_UUID       5.5T        2.1T        3.3T  39% /lustre[OST:0]
lustre22-OST0001_UUID       5.5T        1.2T        4.3T  22% /lustre[OST:1]
lustre22-OST0002_UUID       5.5T     1016.0G        4.5T  18% /lustre[OST:2]
lustre22-OST0003_UUID       5.5T      948.3G        4.5T  17% /lustre[OST:3]
lustre22-OST0004_UUID       5.5T      812.3G        4.7T  15% /lustre[OST:4]
lustre22-OST0005_UUID       5.5T      641.4G        4.8T  11% /lustre[OST:5]
lustre22-OST0006_UUID       5.5T      619.4G        4.8T  11% /lustre[OST:6]
lustre22-OST0007_UUID       5.5T      587.0G        4.9T  11% /lustre[OST:7]
lustre22-OST0008_UUID       5.5T      539.7G        4.9T  10% /lustre[OST:8]
OST0009             : inactive device
lustre22-OST000a_UUID       5.5T      531.3G        4.9T  10% /lustre[OST:10]
lustre22-OST000b_UUID       5.5T      488.9G        5.0T   9% /lustre[OST:11]
lustre22-OST000c_UUID       5.5T      451.2G        5.0T   8% /lustre[OST:12]
lustre22-OST000d_UUID       5.5T      450.1G        5.0T   8% /lustre[OST:13]
lustre22-OST000e_UUID       5.5T      448.8G        5.0T   8% /lustre[OST:14]
lustre22-OST000f_UUID       5.5T      444.0G        5.0T   8% /lustre[OST:15]
lustre22-OST0010_UUID       5.5T      422.5G        5.0T   8% /lustre[OST:16]
lustre22-OST0011_UUID       5.5T      414.5G        5.0T   7% /lustre[OST:17]
lustre22-OST0012_UUID       5.5T      406.9G        5.1T   7% /lustre[OST:18]
OST0013             : inactive device

Reading through the documentation, I see that Lustre should prefer the OSTs with the most free disk space (qos_prio_free is set to 91%). However, my monitoring shows that OST0000 is by far the most loaded, with a load average over 300 and network traffic 3-5x higher than the rest.

I raised qos_threshold_rr to 55% and am waiting to see the results. Right now clients are reading from and writing to this filesystem at around 600 MB/s in aggregate, generating hundreds of files per job.
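
For reference, here is a minimal Python sketch of how the free-space imbalance could be estimated from the `lfs df -h` listing, to compare it against qos_threshold_rr. It assumes the allocator looks at the relative gap between the most-free and least-free active OSTs, i.e. (max - min) / max of the Available column; that is my reading of the documentation, not something I have verified for 2.2. The function names and the shortened SAMPLE text are made up for illustration, and the values are the rounded ones printed by `lfs df -h`.

```python
# Sketch: estimate free-space imbalance across OSTs from `lfs df -h` output.
# Assumption: imbalance = (max_available - min_available) / max_available.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def parse_size(text):
    """Convert a human-readable size such as '4.5T' or '1016.0G' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def ost_available(lfs_df_output):
    """Return {uuid: available_bytes} for the active OST lines only."""
    result = {}
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Active targets have six fields: UUID, bytes, Used, Available,
        # Use%, and the mount point with its [OST:n] tag. The header,
        # the MDT line, and "inactive device" lines do not match.
        if len(fields) == 6 and "[OST:" in fields[5]:
            result[fields[0]] = parse_size(fields[3])
    return result


def imbalance(available):
    """Relative gap between the most-free and least-free OST (0.0 to 1.0)."""
    most = max(available.values())
    least = min(available.values())
    return (most - least) / most


# Shortened excerpt of the listing above (hypothetical selection of lines).
SAMPLE = """\
UUID                       bytes        Used   Available Use% Mounted on
lustre22-MDT0000_UUID      95.0G        9.4G       79.3G  11% /lustre[MDT:0]
lustre22-OST0000_UUID       5.5T        2.1T        3.3T  39% /lustre[OST:0]
lustre22-OST0001_UUID       5.5T        1.2T        4.3T  22% /lustre[OST:1]
OST0009             : inactive device
lustre22-OST0012_UUID       5.5T      406.9G        5.1T   7% /lustre[OST:18]
"""

avail = ost_available(SAMPLE)
print(f"free-space imbalance: {imbalance(avail):.0%}")
```

On these rounded numbers the gap between OST0000 (3.3T available) and OST0012 (5.1T available) comes out at roughly 35%, which is below the 55% I set for qos_threshold_rr.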

How soon should I expect to see results?

What else can I do to spread the load from OST0000 evenly among the other OSTs?


Jure Pečar

