[Lustre-discuss] iozone slow read for 64k record size 2.4 vs. 1.8.9

JS Landry jean-sebastien.landry at calculquebec.ca
Tue Sep 3 12:50:12 PDT 2013


Hi, thanks for the patch, but the error is still present in collectl 3.6.7.
JS
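
For reference, the matching problem that Grégoire's patch below fixes can be sketched in Python (collectl itself is Perl; this is only an illustration of the two regexes applied to the stats lines he quotes, not collectl code):

```python
import re

# Two lines taken from the /proc/fs/lustre/llite/*/stats excerpt
# quoted further down in this thread.
stats_lines = [
    "read_bytes                3402752 samples [bytes] 1048576 16777216 4037269258240",
    "readdir                   240 samples [regs]",
]

# Buggy pattern (collectl's original /^read/): it matches both
# "read_bytes" and "readdir", so readdir register counts get
# recorded as read bandwidth.
buggy = [l for l in stats_lines if re.match(r"^read", l)]

# Patched pattern (/^read_/): the trailing underscore excludes
# "readdir", leaving only the real bandwidth line.
fixed = [l for l in stats_lines if re.match(r"^read_", l)]

print(len(buggy))  # 2 - both lines match the buggy pattern
print(len(fixed))  # 1 - only read_bytes matches after the patch
```
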

On 30/08/13 02:59 AM, Grégoire Pichon wrote:
> Hi,
>
> I found a small bug in collectl (I am using version V3.6.3-2) when it harvests read bandwidth.
> Maybe you are facing the same issue.
>
> Here is the patch
> $ diff -Nraup ~/bin/collectl/collectl.orig ~/bin/collectl/collectl
> --- /home_nfs/pichong/bin/collectl/collectl.orig	2012-10-23 17:53:22.000000000 +0200
> +++ /home_nfs/pichong/bin/collectl/collectl	2012-10-23 17:53:27.000000000 +0200
> @@ -3754,7 +3754,7 @@ sub getProc
>       elsif ($type==11)
>       {
>         if ($line=~/^dirty/)      { record(2, "$tag $line"); next; }
> -      if ($line=~/^read/)       { record(2, "$tag $line"); next; }
> +      if ($line=~/^read_/)       { record(2, "$tag $line"); next; }
>         if ($line=~/^write_/)     { record(2, "$tag $line"); next; }
>         if ($line=~/^open/)       { record(2, "$tag $line"); next; }
>         if ($line=~/^close/)      { record(2, "$tag $line"); next; }
>
> The stats file (/proc/fs/lustre/llite/fs-*/stats) contains two lines starting with the string "read".
>      # grep "^read" /proc/fs/lustre/llite/fs1-ffff880472973400/stats
>      read_bytes                3402752 samples [bytes] 1048576 16777216 4037269258240
>      readdir                   240 samples [regs]
>
>
> Regards,
> Grégoire.
> --
> Grégoire PICHON
> Parallel File Systems Engineer
> Bull Extreme Computing R&D
>
>
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of JS Landry
> Sent: Friday, August 30, 2013 02:19
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] iozone slow read for 64k record size 2.4 vs. 1.8.9
>
>
> On 29/08/13 07:03 PM, JS Landry wrote:
>> Hi, I'm testing Lustre 2.4 with iozone and I can't figure out why the
>> read with a 64k record size (1GB file) is so slow compared to the 1.8.9 client.
>>
>> 1.8.9 client
>>                  KB  reclen   write rewrite    read    reread
>>             1048576      64  677521  794456  6130161  6204552
>>             1048576    1024  709112  862278  7165733  7152088
>>
>> 2.4.0 client
>>                  KB  reclen   write rewrite    read    reread
>>             1048576      64  682344  897808  2334044  2331080
>>             1048576    1024  868466 1217273  4599784  4610098
>>
>
> I ran collectl -scml while running iozone, and I don't know what is
> going on with the Lustre "KBRead/Reads" columns being
> stuck at 4G on the 2.4.0 client. (The KBRead/Reads columns returned to 0
> when I unmounted Lustre.)
> collectl works fine on 1.8.9 (same OS, same hardware).
>
>
> 2.6.32-358.6.2.el6.x86_64
> collectl-3.6.7-1.el6.noarch
> lustre: 2.4.0
> kernel: patchless_client
> build: 2.4.0-RC2-gd3f91c4-PRISTINE-2.6.32-358.6.2.el6.x86_64
>
>
>
> #<--------CPU--------><-----------Memory-----------><--------Lustre Client-------->
> #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead  Reads KBWrite Writes
>     6   6  2687   6132  20G    0   2G 171M 153M 322M  4096M     4G  175232   2738
>    21  21  5939  16537  19G    0   3G 812M 329M 322M  4096M     4G  656704  10261
>     6   6  2380   5751  19G    0   3G   1G 386M 322M  4096M     4G  216640   3385
>     3   3  1329   2773  19G    0   3G 915M 384M 322M  4096M     4G  111168   1737
>    24  24  6678  21347  19G    0   3G  74M 389M 322M  4096M     4G  861376  13459
>     2   2  1107   2082  19G    0   3G 272K 390M 322M  4096M     4G   76032   1188
>     0   0   500    141  19G    0   3G 272K 390M 322M  4096M     4G       0      0
>     6   6  1253    161  19G    0   3G 272K 390M 322M  4096M     4G       0      0
>     5   5  1132    139  19G    0   3G 272K 390M 322M  4096M     4G       0      0
>     5   5   987    360  19G    0   2G 272K 206M 322M  4096M     4G       0      0
>     2   2  1031    353  19G    0   2G 272K 115M 322M  4096M     4G       0      0
>    12  12  3490  10766  19G    0   2G 417M 220M 322M  4096M     4G  427008    417
>    16  16  4741  15475  19G    0   3G   1G 383M 322M  4096M     4G  621568    607
>     0   0   537    155  19G    0   3G   1G 384M 322M  4096M     4G       0      0
>    24  24  7451  24128  19G    0   3G 440K 386M 322M  4096M     4G 1048576   1024
>     0   0   640    176  19G    0   3G 272K 387M 322M  4096M     4G       0      0
>     2   2   752    133  19G    0   3G 272K 387M 322M  4096M     4G       0      0
>
>
> on the 1.8.9 client
>
> 2.6.32-358.6.2.el6.x86_64
> collectl-3.6.7-1.el6.noarch
> lustre: 1.8.9
> kernel: patchless_client
> build:  jenkins-wc1--PRISTINE-2.6.32-358.6.2.el6.x86_64
>
>
> #<--------CPU--------><-----------Memory-----------><--------Lustre Client-------->
> #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead  Reads KBWrite Writes
>     3   3  1585   4267  19G    0   2G 824M 318M 231M       0      0  205824    201
>    14  14  5653  17881  19G    0   2G   1M 323M 231M       0      0  842752    823
>     0   0   745    181  19G    0   2G   1M 324M 231M       0      0       0      0
>     2   2   645    112  19G    0   2G   1M 323M 231M 1048580   1025       0      0
>     2   2   724    124  19G    0   2G   1M 323M 231M 1048580   1025       0      0
>    12  11  2044    859  20G    0   1G   1M 223M 233M       1      3       0      0
>    14  14  4807  13121  20G    0   2G 644M 282M 230M       0      0  658112  10283
>     8   8  3245   8196  19G    0   2G   1G 320M 230M       0      0  390464   6101
>     0   0   570    164  19G    0   2G   1G 320M 230M       0      0       0      0
>    11  11  4061  12627  19G    0   2G 421M 317M 230M       0      0  618496   9664
>     8   8  2953   9239  19G    0   2G   1M 321M 230M       0      0  430080   6720
>     2   2   699    127  19G    0   2G   1M 321M 230M 1048580    16K       0      0
>     0   0   576    148  19G    0   2G   1M 321M 230M       0      0       0      0
>     3   3   783    210  19G    0   2G   1M 282M 230M 1048580    16K       0      0
>     2   2   725    597  20G    0   1G  21M 219M 231M       0      0   20480     20
>    14  14  5253  14347  20G    0   2G 714M 287M 231M       0      0  709632    693
>
>
> Is this a known bug?
> JS
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

-- 
Jean-Sébastien Landry
Calcul Québec, Université Laval
Jean-Sebastien.Landry at calculquebec.ca



