[Lustre-discuss] Unbalanced load across OST's

Aaron Everett aeverett at ForteDS.com
Thu Mar 19 17:19:02 PDT 2009


Thanks for the reply.

File sizes are all <1GB and most files are <1MB. For a test, I copied a typical result set from a non-Lustre mount to my Lustre directory. The total size of the test data is 42GB. Before/after results for lfs df and lfs df -i from a client are included below.

Before test:
[root@englogin01 backups]# lfs df
UUID                 1K-blocks      Used Available  Use% Mounted on
fortefs-MDT0000_UUID 1878903960 129326660 1749577300    6% /lustre/work[MDT:0]
fortefs-OST0000_UUID 1264472876 701771484 562701392   55% /lustre/work[OST:0]
fortefs-OST0001_UUID 1264472876 396097912 868374964   31% /lustre/work[OST:1]
fortefs-OST0002_UUID 1264472876 393607384 870865492   31% /lustre/work[OST:2]

filesystem summary:  3793418628 1491476780 2301941848   39% /lustre/work

[root@englogin01 backups]# lfs df -i
UUID                    Inodes     IUsed     IFree IUse% Mounted on
fortefs-MDT0000_UUID 497433511  33195991 464237520    6% /lustre/work[MDT:0]
fortefs-OST0000_UUID  80289792  13585653  66704139   16% /lustre/work[OST:0]
fortefs-OST0001_UUID  80289792   7014185  73275607    8% /lustre/work[OST:1]
fortefs-OST0002_UUID  80289792   7013859  73275933    8% /lustre/work[OST:2]

filesystem summary:  497433511  33195991 464237520    6% /lustre/work


After test:

[aeverett@englogin01 ~]$ lfs df
UUID                 1K-blocks      Used Available  Use% Mounted on
fortefs-MDT0000_UUID 1878903960 129425104 1749478856    6% /lustre/work[MDT:0]
fortefs-OST0000_UUID 1264472876 759191664 505281212   60% /lustre/work[OST:0]
fortefs-OST0001_UUID 1264472876 395929536 868543340   31% /lustre/work[OST:1]
fortefs-OST0002_UUID 1264472876 393392924 871079952   31% /lustre/work[OST:2]

filesystem summary:  3793418628 1548514124 2244904504   40% /lustre/work

[aeverett@englogin01 ~]$ lfs df -i
UUID                    Inodes     IUsed     IFree IUse% Mounted on
fortefs-MDT0000_UUID 497511996  33298931 464213065    6% /lustre/work[MDT:0]
fortefs-OST0000_UUID  80289792  13665028  66624764   17% /lustre/work[OST:0]
fortefs-OST0001_UUID  80289792   7013783  73276009    8% /lustre/work[OST:1]
fortefs-OST0002_UUID  80289792   7013456  73276336    8% /lustre/work[OST:2]

filesystem summary:  497511996  33298931 464213065    6% /lustre/work
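
For reference, the per-OST change between the two snapshots can be worked out directly from the figures above. This is only a minimal Python sketch: the dictionary and function names are illustrative, and the numbers are copied from the Used and IUsed columns of the lfs df and lfs df -i output in this message.

```python
# Used 1K-blocks per OST, copied from the "lfs df" output above.
used_before = {"OST0000": 701771484, "OST0001": 396097912, "OST0002": 393607384}
used_after = {"OST0000": 759191664, "OST0001": 395929536, "OST0002": 393392924}

# IUsed (objects) per OST, copied from the "lfs df -i" output above.
inodes_before = {"OST0000": 13585653, "OST0001": 7014185, "OST0002": 7013859}
inodes_after = {"OST0000": 13665028, "OST0001": 7013783, "OST0002": 7013456}


def deltas(before, after):
    """Return after - before for every key present in `before`."""
    return {name: after[name] - before[name] for name in before}


block_delta = deltas(used_before, used_after)
inode_delta = deltas(inodes_before, inodes_after)

for name in sorted(block_delta):
    gib = block_delta[name] / (1024 * 1024)  # 1K-blocks -> GiB
    print(f"{name}: {block_delta[name]:+d} 1K-blocks ({gib:+.1f} GiB), "
          f"{inode_delta[name]:+d} objects")
```

With these figures, OST0000 grows by 57420180 1K-blocks and 79375 objects, while OST0001 and OST0002 each shrink slightly (by 168376 and 214460 1K-blocks, and by 402 and 403 objects).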






-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Brian J. Murrell
Sent: Thursday, March 19, 2009 3:13 PM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Unbalanced load across OST's

On Thu, 2009-03-19 at 14:33 -0400, Aaron Everett wrote:
> Hello all,

Hi,

> We are running 1.6.6 with a shared mgs/mdt and 3 ost’s. We run a set 
> of tests that write heavily, then we review the results and delete the 
> data. Usually the load is evenly spread across all 3 ost’s. I noticed 
> this afternoon that the load does not seem to be distributed.

Striping, as well as file count and size, affects OST distribution.  Are any of the data involved striped?  Are you writing very few large files before you measure distribution?

> OST0000 has a load of 50+ with iowait of around 10%
> 
> OST0001 has a load of <1 with >99% idle
> 
> OST0002 has a load of <1 with >99% idle

What does lfs df say before and after such a test that produces the above results?  Does it bear out even use amongst the OSTs before and after the test?

> df confirms the lopsided writes:

lfs df [-i] from a client is usually more illustrative of use.  As I say above, if you can quiesce the filesystem for the test above, do an lfs df; lfs df -i before the test and after.  Assuming you were successful in quiescing, you should see the change to the OSTs that your test effected.
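
One way to compare the two snapshots is sketched below in Python. It assumes the lfs df (or lfs df -i) output was saved as text in the six-column layout that appears in this thread; the function names parse_lfs_df and diff_used are made up for illustration and are not part of any Lustre tool.

```python
def parse_lfs_df(text):
    """Map each device UUID to its third column (Used, or IUsed for -i)."""
    used = {}
    for line in text.splitlines():
        fields = line.split()
        # Device rows look like: <name>_UUID total used available pct mountpoint
        # The header row and the "filesystem summary:" row do not match.
        if len(fields) >= 6 and fields[0].endswith("_UUID"):
            used[fields[0]] = int(fields[2])
    return used


def diff_used(before_text, after_text):
    """Return the per-device change in the used column between two snapshots."""
    before = parse_lfs_df(before_text)
    after = parse_lfs_df(after_text)
    return {uuid: after[uuid] - before[uuid] for uuid in before if uuid in after}
```

Redirecting lfs df to one file before the test and to another afterwards (any file names will do) gives the two inputs; reading both files and passing their contents to diff_used then shows how much each OST grew.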

> OST0000:
> 
> Filesystem            Size  Used Avail Use% Mounted on
> 
> /dev/sdb1             1.2T  602G  544G  53% /mnt/fortefs/ost0

What's important is what it looked like before the test too.  Your test could have, for example, written a single object (i.e. file) of nearly 300G for all we can tell from what you've posted so far.

b.



