[Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey)

Cliff White cliffw at whamcloud.com
Wed Jul 6 13:37:43 PDT 2011


The case=network mode of obdfilter_survey has effectively been replaced by
lnet_selftest, and I don't think it has been maintained in a while.

It would be best to repeat the network-only test with lnet_selftest; this is
likely an issue with the script.
cliffw
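Following the suggestion above, here is a minimal sketch of an lnet_selftest bulk read/write run, modelled on the lnet_selftest example in the Lustre Operations Manual. The Python helper only assembles the shell lines for the console node and does not run anything. The session, group, and batch names and the NIDs in the usage example are hypothetical, and the `lst` option spellings should be checked against the manual for your Lustre release.

```python
from typing import List


def lst_bulk_script(clients: str, servers: str, op: str = "write",
                    size: str = "1M", concurrency: int = 1,
                    stat_seconds: int = 30) -> List[str]:
    """Return shell lines for a one-shot lnet_selftest bulk read/write run.

    `clients` and `servers` are LNet NID expressions such as
    "192.168.1.[2-5]@o2ib" (hypothetical values, adjust to your fabric).
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    return [
        # the module must also be loaded on every test node, not only here
        "modprobe lnet_selftest",
        "export LST_SESSION=$$",
        "lst new_session bulk_test",
        f"lst add_group clients {clients}",
        f"lst add_group servers {servers}",
        "lst add_batch bulk_rw",
        # --concurrency = number of requests kept active at one time
        (f"lst add_test --batch bulk_rw --concurrency {concurrency} "
         f"--from clients --to servers brw {op} check=simple size={size}"),
        "lst run bulk_rw",
        # sample server-side rates for a while, then stop the sampler
        f"lst stat servers & sleep {stat_seconds}; kill $!",
        "lst end_session",
    ]
```

For example, `print("\n".join(lst_bulk_script("192.168.1.[2-5]@o2ib", "192.168.1.1@o2ib")))` prints a script for the console node; raising `concurrency` keeps more RPCs in flight per client, which is the lnet_selftest counterpart of adding survey threads.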

On Wed, Jul 6, 2011 at 1:04 PM, lior amar <liororama at gmail.com> wrote:

> Hi,
>
> I am installing a Lustre system and wanted to measure the OSS
> performance.
> I used obdfilter_survey and got very low performance at low
> thread counts when using the case=network option.
>
>
> System Configuration:
> * Lustre 1.8.6-wc (compiled from the Whamcloud git repository)
> * CentOS 5.6
> * InfiniBand (Mellanox cards), OpenIB stack from CentOS 5.6
> * OSS - 2 quad-core E5620 CPUs
> * OSS - 48GB memory
> * LSI 2965 RAID card with 18 disks in RAID 6 (16 data + 2 parity). Raw
> performance is good both when testing the block device directly and over
> a file system with Bonnie++.
>
> * The OSS uses ext4, and the mkfs parameters were set to match the stripe
> size (-E stride=...).
>
> The performance tests I ran:
>
> 1) obdfilter_survey case=disk:
>    OSS performance is OK (similar to raw disk performance).
>    With 1 thread and 1 object I get 966 MB/sec.
>
> 2) obdfilter_survey case=network:
>    OSS performance is bad at low thread counts and improves as the
>    number of threads increases.
>    With 1 thread and 1 object I get 88 MB/sec.
>
> 3) obdfilter_survey case=netdisk: same as the network case.
>
> 4) ost-survey also gives low performance:
>    Read = 156 MB/sec, Write = ~350 MB/sec
>
> 5) lnet_selftest gives much higher numbers.
>    Numbers obtained with concurrency = 1:
>
>  [LNet Rates of servers]
>  [R] Avg: 3556     RPC/s Min: 3556     RPC/s Max: 3556     RPC/s
>  [W] Avg: 4742     RPC/s Min: 4742     RPC/s Max: 4742     RPC/s
>  [LNet Bandwidth of servers]
>  [R] Avg: 1185.72  MB/s  Min: 1185.72  MB/s  Max: 1185.72  MB/s
>  [W] Avg: 1185.72  MB/s  Min: 1185.72  MB/s  Max: 1185.72  MB/s
>
> Any ideas why a single thread over the network gets 88 MB/sec while the
> same test run locally gets 966 MB/sec?
>
> What else should I test/read/try?
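A back-of-the-envelope reading of the two single-thread numbers, assuming each survey thread keeps exactly one 1 MB request (rsz 1024K) in flight at a time: throughput is then the request size divided by the time per request. This is an illustrative calculation under that assumption, not something the survey script reports.

```python
# Assumption: one request in flight per thread, each request rsz = 1024K = 1 MB.
RSZ_MB = 1.0


def ms_per_request(mb_per_s: float, rsz_mb: float = RSZ_MB) -> float:
    """Average milliseconds per request implied by a single-thread rate."""
    return 1000.0 * rsz_mb / mb_per_s


local = ms_per_request(966.90)   # case=disk, 1 thread, 1 object
remote = ms_per_request(87.99)   # case=network, 1 thread, 1 object
print(f"local : {local:5.2f} ms per 1 MB request")
print(f"remote: {remote:5.2f} ms per 1 MB request")
print(f"ratio : {remote / local:4.1f}x")
```

The local case works out to about 1 ms per request and the network case to about 11 ms, so under this assumption the gap looks like a fixed per-request delay rather than a bandwidth ceiling.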
>
> Thanks,
>
> Below are the actual numbers:
>
> ===== obdfilter_survey case = disk ======
> Wed Jul  6 13:24:57 IDT 2011 Obdfilter-survey for case=disk from oss1
> ost  1 sz 16777216K rsz 1024K obj    1 thr    1 write  966.90 [ 644.40,1030.02] rewrite 1286.23 [1300.78,1315.77] read  8474.33 SHORT
> ost  1 sz 16777216K rsz 1024K obj    1 thr    2 write 1577.95 [1533.57,1681.43] rewrite 1548.29 [1244.83,1718.42] read 11003.26 SHORT
> ost  1 sz 16777216K rsz 1024K obj    1 thr    4 write 1465.68 [1354.73,1600.50] rewrite 1484.98 [1271.54,1584.52] read 16464.13 SHORT
> ost  1 sz 16777216K rsz 1024K obj    1 thr    8 write 1267.39 [ 797.25,1476.48] rewrite 1350.28 [1283.80,1387.70] read 15353.69 SHORT
> ost  1 sz 16777216K rsz 1024K obj    1 thr   16 write 1295.35 [1266.82,1408.70] rewrite 1332.59 [1315.61,1429.66] read 15001.67 SHORT
> ost  1 sz 16777216K rsz 1024K obj    2 thr    2 write 1467.80 [1472.62,1691.42] rewrite 1218.88 [ 821.23,1338.74] read 13538.41 SHORT
> ost  1 sz 16777216K rsz 1024K obj    2 thr    4 write 1561.09 [1521.57,1682.75] rewrite 1183.31 [ 959.10,1372.52] read 15955.31 SHORT
> ost  1 sz 16777216K rsz 1024K obj    2 thr    8 write 1498.74 [1543.58,1704.41] rewrite 1116.19 [1001.06,1163.91] read 15523.22 SHORT
> ost  1 sz 16777216K rsz 1024K obj    2 thr   16 write 1462.54 [ 985.08,1615.48] rewrite 1244.29 [1100.97,1444.80] read 15174.56 SHORT
> ost  1 sz 16777216K rsz 1024K obj    4 thr    4 write 1483.42 [1497.88,1648.45] rewrite 1042.92 [ 801.25,1192.69] read 15997.30 SHORT
> ost  1 sz 16777216K rsz 1024K obj    4 thr    8 write 1494.63 [1458.85,1624.13] rewrite 1041.81 [ 806.25,1183.89] read 15450.18 SHORT
> ost  1 sz 16777216K rsz 1024K obj    4 thr   16 write 1469.96 [1450.65,1647.45] rewrite 1027.06 [ 645.50,1215.86] read 15543.46 SHORT
> ost  1 sz 16777216K rsz 1024K obj    8 thr    8 write 1417.93 [1250.85,1520.58] rewrite 1007.45 [ 905.15,1130.82] read 15789.66 SHORT
> ost  1 sz 16777216K rsz 1024K obj    8 thr   16 write 1324.28 [ 951.87,1518.26] rewrite  986.48 [ 855.21,1079.99] read 15510.70 SHORT
> ost  1 sz 16777216K rsz 1024K obj   16 thr   16 write 1237.22 [ 989.07,1345.17] rewrite  915.56 [ 749.08,1033.03] read 15415.75 SHORT
>
> ==============================
>
> ====== obdfilter_survey case = network ========================
> Wed Jul  6 16:29:38 IDT 2011 Obdfilter-survey for case=network from oss6
> ost  1 sz 16777216K rsz 1024K obj    1 thr    1 write   87.99 [  86.92,  88.92] rewrite   87.98 [  86.83,  88.92] read   88.09 [  86.92,  88.92]
> ost  1 sz 16777216K rsz 1024K obj    1 thr    2 write  175.76 [ 173.84, 176.83] rewrite  175.75 [ 174.84, 176.83] read  172.76 [ 171.67, 174.84]
> ost  1 sz 16777216K rsz 1024K obj    1 thr    4 write  343.13 [ 327.69, 347.67] rewrite  344.64 [ 342.34, 347.67] read  331.20 [ 327.69, 337.77]
> ost  1 sz 16777216K rsz 1024K obj    1 thr    8 write  638.44 [ 638.10, 653.39] rewrite  639.07 [ 627.75, 654.74] read  605.36 [ 598.84, 626.71]
> ost  1 sz 16777216K rsz 1024K obj    1 thr   16 write 1257.67 [1216.88,1424.42] rewrite 1231.61 [1200.67,1316.77] read 1122.70 [1095.04,1187.64]
> ost  1 sz 16777216K rsz 1024K obj    2 thr    2 write  175.69 [ 174.49, 176.83] rewrite  175.82 [ 174.79, 176.83] read  172.06 [ 169.67, 173.84]
> ost  1 sz 16777216K rsz 1024K obj    2 thr    4 write  345.38 [ 343.68, 348.67] rewrite  344.40 [ 342.66, 348.32] read  331.19 [ 328.62, 337.68]
> ost  1 sz 16777216K rsz 1024K obj    2 thr    8 write  638.29 [ 625.16, 676.37] rewrite  632.57 [ 619.43, 672.38] read  604.72 [ 601.69, 625.41]
> ost  1 sz 16777216K rsz 1024K obj    2 thr   16 write 1247.19 [1212.38,1377.73] rewrite 1265.31 [1220.56,1396.71] read 1127.87 [1099.97,1187.67]
> ost  1 sz 16777216K rsz 1024K obj    4 thr    4 write  343.96 [ 341.68, 347.67] rewrite  337.98 [ 324.70, 348.67] read  332.27 [ 327.69, 337.68]
> ost  1 sz 16777216K rsz 1024K obj    4 thr    8 write  637.15 [ 626.89, 673.38] rewrite  636.47 [ 624.42, 675.37] read  605.98 [ 604.43, 620.64]
> ost  1 sz 16777216K rsz 1024K obj    4 thr   16 write 1260.31 [1198.30,1419.70] rewrite 1289.95 [1235.05,1486.35] read 1119.08 [1081.16,1159.77]
> ost  1 sz 16777216K rsz 1024K obj    8 thr    8 write  636.82 [ 628.41, 678.37] rewrite  634.36 [ 622.41, 671.38] read  607.59 [ 601.23, 627.79]
> ost  1 sz 16777216K rsz 1024K obj    8 thr   16 write 1257.81 [1207.65,1405.00] rewrite 1267.45 [1233.43,1372.72] read 1125.58 [1114.65,1163.67]
> ost  1 sz 16777216K rsz 1024K obj   16 thr   16 write 1247.34 [1215.70,1418.69] rewrite 1249.45 [1194.92,1372.73] read 1118.77 [1082.07,1171.94]
>
> ============================
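A small sketch for post-processing obdfilter_survey output: it pulls the object count, thread count, and write rate out of each record and divides the rate by the thread count. The sample lines are the obj=1 rows of the case=network results above; the regular expression assumes each record sits on a single line, as in the sample.

```python
import re

# obj=1 rows of the case=network results, one record per line.
SAMPLE = """\
ost  1 sz 16777216K rsz 1024K obj    1 thr    1 write   87.99 [  86.92,  88.92] rewrite   87.98 [  86.83,  88.92] read   88.09 [  86.92,  88.92]
ost  1 sz 16777216K rsz 1024K obj    1 thr    2 write  175.76 [ 173.84, 176.83] rewrite  175.75 [ 174.84, 176.83] read  172.76 [ 171.67, 174.84]
ost  1 sz 16777216K rsz 1024K obj    1 thr    4 write  343.13 [ 327.69, 347.67] rewrite  344.64 [ 342.34, 347.67] read  331.20 [ 327.69, 337.77]
ost  1 sz 16777216K rsz 1024K obj    1 thr    8 write  638.44 [ 638.10, 653.39] rewrite  639.07 [ 627.75, 654.74] read  605.36 [ 598.84, 626.71]
ost  1 sz 16777216K rsz 1024K obj    1 thr   16 write 1257.67 [1216.88,1424.42] rewrite 1231.61 [1200.67,1316.77] read 1122.70 [1095.04,1187.64]
"""

# Captures: object count, thread count, average write rate (MB/s).
LINE_RE = re.compile(r"obj\s+(\d+)\s+thr\s+(\d+)\s+write\s+([\d.]+)")


def per_thread_write(text):
    """Yield (objects, threads, write MB/s, write MB/s per thread) per record."""
    for m in LINE_RE.finditer(text):
        obj, thr, write = int(m.group(1)), int(m.group(2)), float(m.group(3))
        yield obj, thr, write, write / thr


for obj, thr, write, per in per_thread_write(SAMPLE):
    print(f"obj {obj:2d} thr {thr:2d}: {write:8.2f} MB/s total, "
          f"{per:6.2f} MB/s per thread")
```

Total write bandwidth grows almost linearly from 1 to 16 threads while the per-thread rate stays between roughly 79 and 88 MB/s, so in this test each thread appears to be capped on its own rather than by the link.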
>
> ======= OST Survey ==========
> ost-survey -s 10000
>
> Worst  Read OST indx: 0 speed: 156.223264
> Best   Read OST indx: 4 speed: 172.706590
> Read Average: 163.681117 +/- 5.299526 MB/s
> Worst  Write OST indx: 4 speed: 307.893338
> Best   Write OST indx: 2 speed: 370.923486
> Write Average: 346.664793 +/- 20.849197 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     156.223       354.215        64.011      28.231
> 1     164.394       349.652        60.830      28.600
> 2     162.195       370.923        61.654      26.960
> 3     162.887       350.640        61.392      28.519
> 4     172.707       307.893        57.902      32.479
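The ost-survey summary lines can be reproduced from the per-OST table. A quick check, assuming `-s 10000` is the per-OST transfer size in MB: the averages are the arithmetic means, the "+/-" figures agree with the population standard deviation, and each time column is about 10000 divided by the corresponding rate, which suggests the times are in seconds.

```python
from statistics import mean, pstdev

SIZE_MB = 10000  # assumption: "ost-survey -s 10000" means 10000 MB per OST

# Per-OST rows copied from the table: (ost, read MB/s, write MB/s, read time, write time)
ROWS = [
    (0, 156.223, 354.215, 64.011, 28.231),
    (1, 164.394, 349.652, 60.830, 28.600),
    (2, 162.195, 370.923, 61.654, 26.960),
    (3, 162.887, 350.640, 61.392, 28.519),
    (4, 172.707, 307.893, 57.902, 32.479),
]

reads = [r[1] for r in ROWS]
writes = [r[2] for r in ROWS]
print(f"Read  average: {mean(reads):.3f} +/- {pstdev(reads):.3f} MB/s")
print(f"Write average: {mean(writes):.3f} +/- {pstdev(writes):.3f} MB/s")

# Each time column should be close to SIZE_MB / rate.
for ost, rd, wr, rt, wt in ROWS:
    print(f"OST {ost}: read {SIZE_MB / rd:6.2f}s (reported {rt}), "
          f"write {SIZE_MB / wr:6.2f}s (reported {wt})")
```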
>
> Thanks,
>
> --lior
> --
> ----------------------oo--o(:-:)o--oo----------------
> Lior Amar, Ph.D.
> Cluster Logic Ltd --> The Art of HPC
> www.clusterlogic.net
> ----------------------------------------------------------
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>


-- 
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com