[Lustre-discuss] Bad distribution of files among OSTs

Peter Grandi pg_lus at lus.for.sabi.co.UK
Sat Nov 7 12:48:59 PST 2009


[ ... ]

> Could this situation, 10 full OSTs out of 200, lead to a
> significant drop in performance?

Likely so, the major reasons being:

* If the OST spans a significant percentage of some disks, the
  inner tracks of disks are significantly slower than the outer
  tracks. This applies to any filesystem that fills up a disk.
  My home PC 1TB disk can do about 100-110MB/s through the
  (JFS) filesystem in the outer tracks and around 50-55MB/s on
  the inner ones.

* The "free list" can become significantly scattered, depending
  on the precise allocation patterns of disks. If there are many
  rewrites of small files that can be particularly bad. Even
  extent-based filesystems are affected, and they suffer
  particularly badly, as the same file size has to be split
  into many more extents, increasing metadata overhead.

The two above are likely the reasons why there have been other
reports that speed goes down as filesystems fill up:

 https://www.rz.uni-karlsruhe.de/rz/docs/Lustre/ssck_sfs_isc2007

   «Performance degradation on xc2
          After 6 months of production we lost half of the file
          system performance
              Problem is under investigation by HP
              We had a similar problem on xc1 which was due to
              fragmentation
              Current solution for defragmentation is to
              recreate file systems»

> Before, we could usually get the full 110MB/s or so over the
> 1Gbit/s ethernet lines of the clients.  That had dropped to
> about 50%, but we did not find any other odd thing than the
> filling levels of the OSTs.

It could just be that *all* the OSTs are filling up; it is
impossible to avoid the inner track issue on hard disks (except
by limiting the top performance), and very difficult to avoid
the scattering of the "free list".

If you really care, some solutions are:

* Keep filesystems no more than 60-70% full.

* Periodically reload filesystems from backup after reformatting.

* Use just the outer 1/3 to 1/2 of the disks (which in recent
  years has been called "short stroking").
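
As a rough illustration of the first point, here is a minimal sketch
that flags OSTs above a fill threshold. The function name, the tuple
layout and the sample figures are all made up for the example; in
practice the numbers would have to come from whatever usage report
your site has.

```python
def overfull_osts(usage, threshold=0.70):
    """Return (name, fill_fraction) for targets filled above `threshold`.

    `usage` is an iterable of (name, used, total) tuples in any
    consistent unit. This layout is an assumption for illustration,
    not the output format of any Lustre tool.
    """
    flagged = []
    for name, used, total in usage:
        if total <= 0:
            continue  # skip entries with no usable capacity figure
        fill = used / total
        if fill > threshold:
            flagged.append((name, round(fill, 3)))
    # Fullest targets first, since those are the ones to look at first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data: three targets of equal size.
sample = [
    ("OST0000", 950, 1000),
    ("OST0001", 400, 1000),
    ("OST0002", 720, 1000),
]
print(overfull_osts(sample))
```

The 0.70 default just mirrors the upper end of the 60-70% rule of
thumb above; it is not a hard limit.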

But looking at the absolute numbers there is something really
wrong: 50MB/s out of 200 OSTs is ridiculously low. The problem
is not that it is half of 110MB/s, and lower than it was then,
but that it is very low.

Each OST should be delivering at least 50MB/s with recent
drives, even with mild issues of inner track/fragmentation
of the "free list".

That you are getting 50MB/s may indicate that somehow your files
are not being striped across multiple OSTs. This can have several
different causes; IIRC there are a few discussions in the list
archive on this.
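
The arithmetic behind that suspicion can be sketched as follows. This
is a back-of-the-envelope model only: it assumes each OST delivers the
50MB/s floor mentioned above, that a file spread over several OSTs
scales linearly, and that the client is capped at the roughly 110MB/s
of a 1Gbit/s link. The function name and parameters are hypothetical,
and real systems will not scale this cleanly.

```python
def expected_client_rate(stripe_count, per_ost_mb_s=50.0, link_mb_s=110.0):
    """Idealised MB/s a single client could see for one file.

    Assumes linear scaling with the number of OSTs the file is spread
    over, capped by the client's network link. Both default figures
    are the ones quoted in this thread, not measurements.
    """
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    return min(stripe_count * per_ost_mb_s, link_mb_s)


for n in (1, 2, 4):
    print(n, expected_client_rate(n))
```

Under these assumptions a file living on a single OST tops out at
about 50MB/s, which is the figure reported, while anything spread over
three or more OSTs should already saturate the client's link.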



