[Lustre-discuss] Performance versus allocation balance(?)

Frank Heckes f.heckes at fz-juelich.de
Mon Jun 27 02:25:33 PDT 2011

Hi all,

(many thanks to J. Lombardi for directing our attention to this feature)
we noticed that a filesystem based on

  2 x  (4 x OSS + 1 SFA 10000)
  SLES11 + Lustre 1.8.4

slowed down drastically from 19.2 GB/s write throughput to 2.9 GB/s.
Of course the environment wasn't changed. ;-) Measurements were taken on
a nearly empty file system (19.2 GB/s) and in its current state, 70% full
(2.9 GB/s). All measurements were performed during system maintenance,
i.e. no other applications were using the cluster or the storage
devices.

While looking for a way to get back to the 'old' value, we ran a test in
which setting the parameter 'qos_threshold_rr' to 100% led to the desired
result.

Upon checking the performance counters on the OS and the SFA side we
noticed that with the default setting of qos_threshold_rr (16%) the
'distribution' of the objects was not equal. A large number of OSTs were
completely inactive. As expected, we found an equal load distribution for
the 100% setting.
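
To illustrate the behaviour we think we are seeing, here is a minimal
Python sketch of a threshold-based allocator. It is a simplified model and
not Lustre's actual code: the function names, the imbalance formula
(relative gap between the emptiest and the fullest OST) and the
free-space weighting are all assumptions made for illustration only.

```python
import random


def imbalance_percent(free_bytes):
    """Relative free-space gap between the emptiest and fullest OST, in percent.

    Assumption: this formula is a stand-in for whatever the real allocator uses.
    """
    most, least = max(free_bytes), min(free_bytes)
    if most == 0:
        return 0.0
    return 100.0 * (most - least) / most


def choose_osts(free_bytes, stripe_count, threshold_rr, next_rr=0, rng=None):
    """Pick OST indices for a new file; returns (indices, mode).

    Below the threshold the OSTs are used in strict round-robin order.
    Above it, OSTs are sampled without replacement, weighted by free space,
    so nearly full OSTs are rarely chosen.
    """
    n = len(free_bytes)
    if not 0 < stripe_count <= n:
        raise ValueError("stripe_count must be between 1 and the number of OSTs")

    if imbalance_percent(free_bytes) <= threshold_rr:
        return [(next_rr + i) % n for i in range(stripe_count)], "round-robin"

    rng = rng or random.Random()
    candidates = list(range(n))
    chosen = []
    for _ in range(stripe_count):
        weights = [free_bytes[i] for i in candidates]
        if sum(weights) == 0:
            pick = candidates[0]  # nothing free anywhere: fall back to first
        else:
            pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen, "weighted"
```

In this model a threshold of 100 can never be exceeded, so allocation is
always round-robin and every OST receives objects. With a threshold of 16
and a 70% full file system, the weighted branch is taken as soon as the
OSTs drift apart, and the fuller OSTs see few or no new objects.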

Can anyone confirm this observation?

Is this really a 'feature' of Lustre and if so are there plans to
're-design' the object allocation part, so that

-a- full bandwidth can be reached with the help of RR (round-robin)
-b- allocation balancing is done by a 'background' thread that, some
    time later, shifts objects to OSTs with lower object allocation,
    according to the striping policy of the file(s)?
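
To make the idea in -b- a bit more concrete, here is a rough Python sketch
of what such a background pass could compute. It is purely hypothetical
and does not describe any existing Lustre mechanism: the function name,
the fixed object size and the greedy strategy are assumptions, and the
sketch ignores the striping policy of the files entirely.

```python
def plan_rebalance(used, object_size, tolerance):
    """Greedy plan that evens out per-OST usage after the fact.

    used:        used space per OST (integer units)
    object_size: size of one movable object (same units, > 0)
    tolerance:   acceptable gap between fullest and emptiest OST

    Returns (moves, final_usage), where moves is a list of (src, dst) pairs.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")

    used = list(used)  # work on a copy
    moves = []
    while True:
        src = max(range(len(used)), key=used.__getitem__)  # fullest OST
        dst = min(range(len(used)), key=used.__getitem__)  # emptiest OST
        gap = used[src] - used[dst]
        # Stop when balanced enough, or when one more move could not
        # narrow the gap any further.
        if gap <= tolerance or gap <= object_size:
            break
        used[src] -= object_size
        used[dst] += object_size
        moves.append((src, dst))
    return moves, used


moves, final = plan_rebalance([10, 2, 6], object_size=1, tolerance=1)
print(len(moves), final)
```

The point of the sketch is only that writes could always go round-robin at
full bandwidth, while the balancing cost is paid later by a separate pass.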

Many thanks in advance


-Frank Heckes

Forschungszentrum Juelich GmbH
52425 Juelich
