[Lustre-discuss] Using brw_stats to diagnose lustre performance
mark
mark at msi.umn.edu
Mon Jun 14 14:20:36 PDT 2010
Hi Everyone,
I'm trying to diagnose some performance concerns with our
lustre deployment. It seems to be a fairly multifaceted problem
involving how ifort does buffered writes along with how we have lustre
set up.
What I've identified so far is that our RAID stripe size on the OSTs is
768KB (6 * 128KB chunks) and the partitions are not being mounted with
-o stripe. We have 2 LUNs per controller, and each virtual disk has 2
partitions, with the 2nd one being the lustre file system. It is
possible the partitions are not aligned. Most of the client-side
settings are at their defaults (i.e. 8 RPCs in flight, 32MB dirty cache
per OST, etc.). The journals are on separate SSDs. Our OSSes are
probably oversubscribed.
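For what it's worth, once you have a partition's start sector from
`fdisk -lu`, checking alignment against the stripe is just arithmetic.
A sketch (the start sector below is a hypothetical placeholder, not one
of our actual disks; 63 is the classic misaligned DOS start):

```shell
# Hypothetical values -- substitute the real start sector from `fdisk -lu`
# and the real sector size for the virtual disk being checked.
START_SECTOR=63
SECTOR_SIZE=512
STRIPE_BYTES=$((6 * 128 * 1024))   # 6 data disks * 128KB chunk = 768KB

# Byte offset of the partition start within a full stripe.
OFFSET=$(( START_SECTOR * SECTOR_SIZE % STRIPE_BYTES ))
if [ "$OFFSET" -eq 0 ]; then
    echo "partition start is stripe-aligned"
else
    echo "partition start is misaligned by $OFFSET bytes"
fi
# prints: partition start is misaligned by 32256 bytes
```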
What we've noticed is that with certain apps we get *really* bad
performance to the OSTs. As bad as 500-800KB/s to one OST. The best
performance I've seen to an OST is around 300MB/s, with 500MB/s being
more or less the upper bound limited by IB.
Right now I'm trying to verify that fragmentation is happening as I
would expect given the configuration mentioned above. I just learned
about brw_stats, so I tried examining it for one of our OSTs (given how
little data there is, it looks like lustre must have been restarted
recently):
disk fragmented I/Os          read         |        write
                       ios    %  cum %     |   ios    %  cum %
1:                       0    0     0      |   215    9     9
2:                       0    0     0      |  2004   89    98
3:                       0    0     0      |    22    0    99
4:                       0    0     0      |     2    0    99
5:                       0    0     0      |     5    0    99
6:                       0    0     0      |     2    0    99
7:                       1  100   100      |     1    0   100
disk I/O size                 read         |        write
                       ios    %  cum %     |   ios    %  cum %
4K:                      3   42    42      |    17    0     0
8K:                      0    0    42      |    17    0     0
16K:                     0    0    42      |    22    0     1
32K:                     0    0    42      |    73    1     2
64K:                     1   14    57      |   292    6     9
128K:                    0    0    57      |   385    8    18
256K:                    3   42   100      |    88    2    20
512K:                    0    0   100      |  1229   28    48
1M:                      0    0   100      |  2218   51   100
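As a rough sanity check, the write side of the fragmented-I/Os
histogram above works out to nearly two disk I/Os per RPC on average.
The script below is just arithmetic on the counts copied from that
table:

```shell
# Write column of the "disk fragmented I/Os" histogram above,
# as <fragments-per-RPC>:<count> pairs.
total=0; rpcs=0
for pair in 1:215 2:2004 3:22 4:2 5:5 6:2 7:1; do
    frags=${pair%%:*}; count=${pair##*:}
    total=$(( total + frags * count ))   # total disk I/Os issued
    rpcs=$((  rpcs + count ))            # total write RPCs
done
AVG=$(awk -v t="$total" -v r="$rpcs" 'BEGIN { printf "%.2f", t / r }')
echo "avg disk I/Os per write RPC: $AVG"
# prints: avg disk I/Os per write RPC: 1.93
```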
My questions are:
1) Does a disk fragmentation count of "1" mean that those I/Os were
fragmented, or would that be "0"?
2) Does the disk I/O size mean what lustre actually wrote or what it
wanted to write? What does that number mean in the context of our 768KB
stripe size, since it lists so many I/Os at 1M?
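Not an answer to my own question, but a sanity check on why 1MB RPCs
over a 768KB stripe would fragment even in the best (perfectly aligned)
case -- the 1MB chunk necessarily spans a stripe boundary and splits
into a 768KB piece and a 256KB piece:

```shell
STRIPE_BYTES=$(( 768 * 1024 ))   # 6 * 128KB chunks
RPC_BYTES=$(( 1024 * 1024 ))     # 1MB RPC

# An aligned 1MB write covers bytes [0, 1MB), so it touches this many
# full or partial stripes (ceiling division):
STRIPES_TOUCHED=$(( (RPC_BYTES + STRIPE_BYTES - 1) / STRIPE_BYTES ))
FIRST_PIECE=$STRIPE_BYTES
SECOND_PIECE=$(( RPC_BYTES - STRIPE_BYTES ))
echo "stripes touched: $STRIPES_TOUCHED ($((FIRST_PIECE / 1024))KB + $((SECOND_PIECE / 1024))KB)"
# prints: stripes touched: 2 (768KB + 256KB)
```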
Thanks,
Mark