[Lustre-discuss] Slow read performance across OSSes
James Robnett
jrobnett at aoc.nrao.edu
Sat Oct 17 17:58:33 PDT 2009
Many thanks for the reply Andreas.
> You're sure that there isn't some other strange effect here, like you
> are only measuring the speed of a single iozone thread or similar?
I'm just looking at the output from Bonnie++ running on the client.
I see corresponding numbers when examining iostat on each OST. The sum
of all iostats from each OST in use matches the bonnie++ numbers.
Can Bonnie be at fault? I've only been setting the test size. I'll try
iozone to see if it returns similar results.
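For reference, a cross-check run might look something like this (the mount point and file size are placeholders; the size should be at least twice client RAM to defeat caching):

```shell
# Hypothetical mount point and size; adjust for the real client.
MNT=/mnt/lustre
SIZE=16g

# iozone: -i 0 = sequential write test, -i 1 = sequential read test;
# -r 1m matches Lustre's 1 MB RPC size; -e includes flush time.
echo "iozone -e -i 0 -i 1 -s $SIZE -r 1m -f $MNT/iozone.tmp"
```

Using a 1 MB record size keeps the comparison to bonnie++ apples-to-apples with the 256-page bulk RPCs seen in brw_stats.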
> This is definitely NOT expected, and I'm puzzled as to why this might
> be.
Considering how 'stock' this should be, i.e. RHEL 5.3 with Sun-provided
RPMs, I must be doing something wrong or more folks would see it, but I'm
dipped if I know what it is. Everything works, no errors; it's just slow
across multiple OSSes.
> You could check /proc/fs/lustre/obdfilter/*/brw_stats on the
> respective OSTs
> to see if the client is not assembling the RPCs very well for some
> reason.
I ran two instances of bonnie++, the first used OST0000 and OST0001
on OSS1, the second used OST0001 on OSS1 and OST0002 on OSS2. I rebooted
between each run to reset the stats.
The contents of /proc/fs/lustre/obdfilter/lustre-OST0001/brw_stats
look essentially identical in both runs even though the read rate in the
first was 114MB/s and in the second 38MB/s. I've appended the read portion
of both files below.
Not sure exactly what I should be looking for in those stats. I'm also
curious how it could be the OST at fault, since two OSTs on one OSS give
the expected ~115MB/s read rate while two OSTs on two OSSes give ~40MB/s.
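As a quick illustration of reading those stats (not part of the original exchange), the "pages per bulk r/w" section can be summarized with awk; the two sample lines here are the dominant rows from the 40MB/s run quoted below:

```shell
# Report what fraction of read RPCs arrived as full 256-page (1 MB)
# bulk transfers; input mimics brw_stats "pages per bulk r/w" rows.
stats='1: 5003 17 17
256: 24145 82 100'
echo "$stats" | awk '
  NF == 4 { total += $2; if ($1 == "256:") full = $2 }
  END { printf "%.0f%% full-size RPCs\n", 100 * full / total }'
```

In both runs the vast majority of RPCs are full 1 MB transfers, which is why the brw_stats look healthy despite the throughput difference.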
> Alternately, it might be that you have configured the disk storage of
> OSS-1
> and OSS-2 to compete (e.g. different partitions sharing the same disks).
Each OSS has two internal PCI 8 port 3ware 9550sx cards and 16 internal
disks carved into two separate 7+1 RAID 5 groups (one per card). They're
physically distinct where disk storage is concerned.
> No, the client needs to assemble the OST objects itself, regardless of
> whether the OSTs are on the same OSS or not. The file should be striped
> over all of the OSTs involved in the test.
Iostat on each OST confirms the striping: as I change the striping,
reads appear on exactly the OSTs I'd expect. OSTs not in use are
quiescent; OSTs in use show uniform read rates between them, and those
rates are relatively constant per second. No starvation is apparent.
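For the record, pinning a test file's layout to specific OSTs can be sketched with lfs setstripe (file names here are hypothetical; -i only fixes the first OST index, and with default round-robin allocation the next stripe lands on the following OST):

```shell
MNT=/mnt/lustre   # hypothetical mount point
# -c sets the stripe count, -i the index of the first OST.
echo "lfs setstripe -c 2 -i 0 $MNT/same-oss.dat"    # OST0000 + OST0001, one OSS
echo "lfs setstripe -c 2 -i 1 $MNT/cross-oss.dat"   # OST0001 + OST0002, two OSSes
echo "lfs getstripe $MNT/cross-oss.dat"             # verify the resulting layout
```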
It sure seems like an issue with the client being unable to handle
multiple streams from multiple OSSes, while it handles multiple streams
from a single OSS just fine.
I've tried to think of some way the switch could be at fault but
haven't come up with anything. It's a Cisco 2960 gigabit switch, and
while it can block, it shouldn't be in this case. I have no problem
obtaining 115MB/s reads and writes as long as I avoid reading across
two OSSes.
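One way to rule the network in or out would be raw TCP tests with iperf, first one OSS at a time and then both concurrently, to see whether two simultaneous senders to one client collapse (hostnames are hypothetical; run "iperf -s" on each OSS first):

```shell
# One stream at a time: each should be close to GigE line rate.
for oss in oss1 oss2; do
  echo "iperf -c $oss -t 10"
done
# Both streams at once: if the aggregate drops well below line
# rate, the switch or client NIC is suspect rather than Lustre.
echo "iperf -c oss1 -t 10 & iperf -c oss2 -t 10 & wait"
```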
Again, many thanks for the reply. If nothing else, knowing it really
is wrong will make me keep digging. If you can think of any output I
could show or any test I could run to help isolate the problem, I'm
all ears.
James Robnett
NRAO/NM
Below is the read portion of brw_stats for OST0001 from the 40MB/s
run (left) and 115MB/s run (right), I removed the write portion for
clarity.
read (40MB/s) | read (115MB/s)
pages per bulk r/w rpcs % cum % | rpcs % cum %
1: 5003 17 17 | 5256 18 18
2: 13 0 17 | 23 0 18
4: 11 0 17 | 1 0 18
8: 19 0 17 | 1 0 18
16: 14 0 17 | 11 0 18
32: 53 0 17 | 18 0 18
64: 47 0 17 | 11 0 18
128: 74 0 17 | 35 0 18
256: 24145 82 100 | 23415 81 100
read | read
discontiguous pages rpcs % cum % | rpcs % cum %
0: 29261 99 99 | 28735 99 99
1: 61 0 99 | 34 0 99
2: 18 0 99 | 2 0 100
3: 15 0 99 | 0 0 100
4: 9 0 99 | 0 0 100
5: 7 0 99 | 0 0 100
6: 4 0 99 | 0 0 100
7: 3 0 99 | 0 0 100
8: 0 0 99 | 0 0 100
9: 1 0 100 | 0 0 100
10: 0 0 100 | 0 0 100
11: 0 0 100 | 0 0 100
12: 0 0 100 | 0 0 100
13: 0 0 100 |
read | read
discontiguous blocks rpcs % cum % | rpcs % cum %
0: 29261 99 99 | 28735 99 99
1: 61 0 99 | 34 0 99
2: 18 0 99 | 2 0 100
3: 15 0 99 | 0 0 100
4: 9 0 99 | 0 0 100
5: 7 0 99 | 0 0 100
6: 4 0 99 | 0 0 100
7: 3 0 99 | 0 0 100
8: 0 0 99 | 0 0 100
9: 1 0 100 | 0 0 100
10: 0 0 100 | 0 0 100
11: 0 0 100 | 0 0 100
12: 0 0 100 | 0 0 100
13: 0 0 100 |
read | read
disk fragmented I/Os ios % cum % | ios % cum %
0: 1 0 0 | 5308 18 18
1: 5084 17 17 | 12 0 18
2: 44 0 17 | 18 0 18
3: 46 0 17 | 17 0 18
4: 38 0 17 | 10 0 18
5: 31 0 17 | 20 0 18
6: 30 0 17 | 12 0 18
7: 29 0 18 | 23353 81 99
8: 24034 81 99 | 21 0 100
9: 27 0 99 | 0 0 100
10: 8 0 99 | 0 0 100
11: 3 0 99 | 0 0 100
12: 3 0 99 | 0 0 100
13: 0 0 99 |
14: 1 0 100 |
read | read
disk I/Os in flight ios % cum % | ios % cum %
1: 15990 8 8 | 14821 7 7
2: 16817 8 16 | 16105 8 16
3: 15968 8 24 | 14930 7 23
4: 15761 7 32 | 14260 7 31
5: 16390 8 40 | 14644 7 38
6: 17131 8 49 | 15039 7 46
7: 17786 8 58 | 15383 7 54
8: 18551 9 67 | 15887 8 62
9: 7313 3 71 | 7218 3 66
10: 7100 3 74 | 7006 3 70
11: 6755 3 78 | 6824 3 73
12: 6416 3 81 | 6738 3 77
13: 5931 2 84 | 6438 3 80
14: 5386 2 87 | 6209 3 83
15: 4831 2 89 | 5983 3 86
16: 4287 2 91 | 5540 2 89
17: 2146 1 92 | 2314 1 90
18: 1928 0 93 | 2213 1 92
19: 1703 0 94 | 2046 1 93
20: 1531 0 95 | 1911 0 94
21: 1376 0 96 | 1772 0 95
22: 1202 0 96 | 1602 0 95
23: 1011 0 97 | 1398 0 96
24: 749 0 97 | 1190 0 97
25: 435 0 97 | 640 0 97
26: 383 0 98 | 584 0 97
27: 358 0 98 | 526 0 98
28: 328 0 98 | 477 0 98
29: 298 0 98 | 434 0 98
30: 258 0 98 | 365 0 98
31: 2559 1 100 | 2224 1 100
read | read
I/O time (1/1000s) ios % cum % | ios % cum %
1: 1079 3 3 | 339 1 1
2: 5565 18 22 | 3228 11 12
4: 5672 19 41 | 6847 23 36
8: 2649 9 50 | 4393 15 51
16: 5967 20 71 | 8461 29 80
32: 7243 24 95 | 4243 14 95
64: 1073 3 99 | 1176 4 99
128: 126 0 99 | 84 0 100
256: 5 0 100 | 0 0 100
512: 0 0 100 | 0 0 100
read | read
disk I/O size ios % cum % | ios % cum %
4K: 5147 2 2 | 5263 2 2
8K: 94 0 2 | 28 0 2
16K: 18 0 2 | 11 0 2
32K: 45 0 2 | 20 0 2
64K: 98 0 2 | 48 0 2
128K: 193276 97 100 | 187351 97 100