[Lustre-discuss] Bad distribution of files among OSTs
Peter Grandi
pg_lus at lus.for.sabi.co.UK
Sat Nov 7 12:48:59 PST 2009
[ ... ]
> Could this situation, 10 full OSTs out of 200, lead to a
> significant drop in performance?
Likely so, the major reasons being:
* If the OST spans a significant percentage of some disks, the
inner tracks of disks are significantly slower than the outer
tracks. This applies to any filesystem that fills up a disk.
My home PC 1TB disk can do about 100-110MB/s through the
(JFS) filesystem in the outer tracks and around 50-55MB/s on
the inner ones.
* The "free list" can become significantly scattered, depending
on the precise allocation patterns of disks. If there are many
rewrites of small files that can be particularly bad. Even
extent-based filesystems suffer particularly badly from
that, as a file of the same size has to be split into many more
extents, increasing metadata overhead.
The two above are likely the reasons why there have been other
reports that speed goes down as filesystems fill up:
https://www.rz.uni-karlsruhe.de/rz/docs/Lustre/ssck_sfs_isc2007
«Performance degradation on xc2
After 6 months of production we lost half of the file
system performance
Problem is under investigation by HP
We had a similar problem on xc1 which was due to
fragmentation
Current solution for defragmentation
is to recreate file systems»
> Before, we could usually get the full 110MB/s or so over the
> 1Gbit/s ethernet lines of the clients. That had dropped to
> about 50%, but we did not find any other odd thing than the
> filling levels of the OSTs.
It could just be that *all* the OSTs are filling up; it is
impossible to avoid the inner track issue on hard disks (except
by limiting the top performance), and very difficult to avoid
the scattering of the "free list".
If you really care, some solutions are:
* Keep filesystems no more than 60-70% full.
* Periodically reload filesystems from backup after reformatting.
* Use just the outer 1/3 to 1/2 of the disks (which in recent
years has been called "short stroking").
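To make the first suggestion concrete, here is a minimal sketch that
flags the OSTs above a chosen fill level. The function name, the OST
names and the usage figures are all made up for illustration; real
per-OST figures would have to come from whatever usage report your
site already collects.

```python
def overfull(usage, limit=0.70):
    """Return the names whose used/total fraction exceeds `limit`, fullest first."""
    fractions = {name: used / total for name, (used, total) in usage.items()}
    return sorted(
        (name for name, frac in fractions.items() if frac > limit),
        key=lambda name: fractions[name],
        reverse=True,
    )


# Hypothetical figures in GB: (used, total) per OST.
sample = {
    "OST0000": (950, 1000),
    "OST0001": (400, 1000),
    "OST0002": (720, 1000),
}

print(overfull(sample))  # prints ['OST0000', 'OST0002']
```

The 0.70 default is just the upper end of the 60-70% range above; a
stricter site might pass limit=0.60.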
But looking at the absolute numbers there is something really
wrong: 50MB/s out of 200 OSTs is ridiculously low. The problem
is not that it is half of 110MB/s, and lower than it was then,
but that it is very low.
Each OST should be delivering at least 50MB/s with recent
drives, even with mild issues of inner track/fragmentation
of the "free list".
That you are getting 50MB/s may indicate that somehow your files
are not being sliced across multiple OSTs. This can have several
different reasons; IIRC there are a few discussions in the list
archive on this.
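A crude way to see why 50MB/s looks like a single-OST number is
sketched below. It assumes (my simplification, not a Lustre formula)
that the rates of the OSTs a file is striped over simply add up, and
that the only other limit is the client's network link. The 50MB/s
per OST and 110MB/s link figures are the ones from this thread; the
function name is made up.

```python
def client_rate(stripe_count, per_ost_mb_s=50.0, link_mb_s=110.0):
    """Estimated client throughput in MB/s under a simple additive model.

    The OSTs holding the file are assumed to contribute per_ost_mb_s each,
    and the total is capped by the client's network link.
    """
    return min(link_mb_s, stripe_count * per_ost_mb_s)


for count in (1, 2, 4):
    print(count, client_rate(count))
# prints:
# 1 50.0
# 2 100.0
# 4 110.0
```

Under these assumptions a file living on one slow-ish OST gives about
what you are seeing, while a file spread over three or more OSTs would
be limited by the 1Gbit/s link again.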