[Lustre-discuss] OST load distribution
Lee, Brett
brett.lee at intel.com
Wed May 8 07:05:18 PDT 2013
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-
> bounces at lists.lustre.org] On Behalf Of Jure Pecar
> Sent: Wednesday, May 08, 2013 6:13 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] OST load distribution
>
>
> Hello,
>
> I have a lustre 2.2 environment which looks like this:
>
> # lfs df -h
> UUID                   bytes     Used  Available Use% Mounted on
> lustre22-MDT0000_UUID  95.0G     9.4G      79.3G  11% /lustre[MDT:0]
> lustre22-OST0000_UUID   5.5T     2.1T       3.3T  39% /lustre[OST:0]
> lustre22-OST0001_UUID   5.5T     1.2T       4.3T  22% /lustre[OST:1]
> lustre22-OST0002_UUID   5.5T  1016.0G       4.5T  18% /lustre[OST:2]
> lustre22-OST0003_UUID   5.5T   948.3G       4.5T  17% /lustre[OST:3]
> lustre22-OST0004_UUID   5.5T   812.3G       4.7T  15% /lustre[OST:4]
> lustre22-OST0005_UUID   5.5T   641.4G       4.8T  11% /lustre[OST:5]
> lustre22-OST0006_UUID   5.5T   619.4G       4.8T  11% /lustre[OST:6]
> lustre22-OST0007_UUID   5.5T   587.0G       4.9T  11% /lustre[OST:7]
> lustre22-OST0008_UUID   5.5T   539.7G       4.9T  10% /lustre[OST:8]
> OST0009             : inactive device
> lustre22-OST000a_UUID   5.5T   531.3G       4.9T  10% /lustre[OST:10]
> lustre22-OST000b_UUID   5.5T   488.9G       5.0T   9% /lustre[OST:11]
> lustre22-OST000c_UUID   5.5T   451.2G       5.0T   8% /lustre[OST:12]
> lustre22-OST000d_UUID   5.5T   450.1G       5.0T   8% /lustre[OST:13]
> lustre22-OST000e_UUID   5.5T   448.8G       5.0T   8% /lustre[OST:14]
> lustre22-OST000f_UUID   5.5T   444.0G       5.0T   8% /lustre[OST:15]
> lustre22-OST0010_UUID   5.5T   422.5G       5.0T   8% /lustre[OST:16]
> lustre22-OST0011_UUID   5.5T   414.5G       5.0T   7% /lustre[OST:17]
> lustre22-OST0012_UUID   5.5T   406.9G       5.1T   7% /lustre[OST:18]
> OST0013             : inactive device
>
> Reading through documentation I see that lustre should prefer those OSTs
> with most free disk space (qos_prio_free is set to 91%). However my
> monitoring tells me that OST0000 is the most loaded by far, having loadavg
> over 300 and network traffic 3-5x higher than the rest.
Hi Jure,
The qos_prio_free setting only takes effect once the QOS algorithm has been selected; it has no effect while the RR algorithm is in use.
>
> I raised qos_threshold_rr to 55% and am waiting to see the results. Right now
> I have clients reading and writing to this fs at around 600MB/s aggregated,
> generating hundreds of files per job.
The qos_threshold_rr setting determines whether the RR (round-robin) or QOS (weighted) algorithm is used. Setting it to 55% tells the MDS to use QOS only when the difference in OST utilization is greater than 55%. You should probably go back to the default of 17% to keep the OSTs balanced, unless you have a reason to accept less evenly distributed data in exchange for performance.
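To make the threshold concrete, here is a rough sketch in Python. It is not the MDS code. It assumes the imbalance is the relative gap in free space between the emptiest and the fullest active OST, which is a simplification, and the function names are made up for illustration. The input is the "Available" column from your lfs df output.

```python
# Simplified model of how qos_threshold_rr selects an allocator.
# Assumption: imbalance is the relative gap in free space between the
# emptiest and the fullest active OST. The real MDS logic may differ.

def free_space_imbalance(avail):
    """Relative free-space gap between the emptiest and fullest OST."""
    most_free = max(avail)
    least_free = min(avail)
    return (most_free - least_free) / most_free


def pick_allocator(avail, threshold_pct):
    """Return "QOS" if the imbalance exceeds the threshold, else "RR"."""
    imbalance_pct = 100.0 * free_space_imbalance(avail)
    return "QOS" if imbalance_pct > threshold_pct else "RR"


# "Available" column (in TB) for the 18 active OSTs in the lfs df output.
avail_tb = [3.3, 4.3, 4.5, 4.5, 4.7, 4.8, 4.8, 4.9, 4.9,
            4.9, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.1]

imbalance = 100.0 * free_space_imbalance(avail_tb)
print(f"imbalance: {imbalance:.0f}%")
print("threshold 17%:", pick_allocator(avail_tb, 17))
print("threshold 55%:", pick_allocator(avail_tb, 55))
```

Under this model the gap between OST0000 (3.3T free) and OST0012 (5.1T free) is about 35%, so a 17% threshold selects QOS while a 55% threshold falls back to RR.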
>
> How soon am I expected to see the results?
>
> What else can I do to spread the load from OST0000 evenly among the other
> OSTs?
>
>
> --
>
> Jure Pečar
> http://jure.pecar.org
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
Best,
--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division