[Lustre-discuss] iozone slow read for 64k record size 2.4 vs. 1.8.9
Grégoire Pichon
gregoire.pichon at bull.net
Thu Aug 29 23:59:48 PDT 2013
Hi,
I found a small bug in collectl (I am using version V3.6.3-2) in the way it harvests read bandwidth.
Maybe you are facing the same issue.
Here is the patch:
$ diff -Nraup ~/bin/collectl/collectl.orig ~/bin/collectl/collectl
--- /home_nfs/pichong/bin/collectl/collectl.orig 2012-10-23 17:53:22.000000000 +0200
+++ /home_nfs/pichong/bin/collectl/collectl 2012-10-23 17:53:27.000000000 +0200
@@ -3754,7 +3754,7 @@ sub getProc
elsif ($type==11)
{
if ($line=~/^dirty/) { record(2, "$tag $line"); next; }
- if ($line=~/^read/) { record(2, "$tag $line"); next; }
+ if ($line=~/^read_/) { record(2, "$tag $line"); next; }
if ($line=~/^write_/) { record(2, "$tag $line"); next; }
if ($line=~/^open/) { record(2, "$tag $line"); next; }
if ($line=~/^close/) { record(2, "$tag $line"); next; }
The stats file (/proc/fs/lustre/llite/fs-*/stats) contains two lines starting with the string "read":
# grep "^read" /proc/fs/lustre/llite/fs1-ffff880472973400/stats
read_bytes 3402752 samples [bytes] 1048576 16777216 4037269258240
readdir 240 samples [regs]
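The effect of the one-character patch can be illustrated with a quick sketch (Python here for brevity; collectl itself is Perl, but the anchored-regex semantics are the same):

```python
import re

# The two lines from /proc/fs/lustre/llite/*/stats that begin with "read"
lines = [
    "read_bytes 3402752 samples [bytes] 1048576 16777216 4037269258240",
    "readdir 240 samples [regs]",
]

# The original pattern /^read/ matches both lines, so collectl also
# records the readdir counters as read bandwidth.
old = [l for l in lines if re.match(r"^read", l)]

# The patched pattern /^read_/ matches only the read_bytes line.
new = [l for l in lines if re.match(r"^read_", l)]

print(len(old))  # 2
print(len(new))  # 1
```

Because the `readdir` line is recorded under the same tag, the read statistics get polluted, which is consistent with the bogus KBRead/Reads values reported below.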
Regards,
Grégoire.
--
Grégoire PICHON
Parallel File Systems Engineer
Bull Extreme Computing R&D
-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On behalf of JS Landry
Sent: Friday, 30 August 2013 02:19
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] iozone slow read for 64k record size 2.4 vs. 1.8.9
On 29/08/13 07:03 PM, JS Landry wrote:
> Hi, I'm testing Lustre 2.4 with iozone and I can't find why the read with a
> 64k record size (1GB file) is so slow compared to the 1.8.9 client.
>
> 1.8.9 client
> KB reclen write rewrite read reread
> 1048576 64 677521 794456 6130161 6204552
> 1048576 1024 709112 862278 7165733 7152088
>
> 2.4.0 client
> KB reclen write rewrite read reread
> 1048576 64 682344 897808 2334044 2331080
> 1048576 1024 868466 1217273 4599784 4610098
>
I ran collectl -scml while running iozone, and I can't figure out what is
going on with the Lustre "KBRead/Reads" columns being stuck at 4G on the
2.4.0 client. (The KBRead/Reads columns returned to 0 when I unmounted
Lustre.) collectl works fine on 1.8.9 (same OS, same hardware).
2.6.32-358.6.2.el6.x86_64
collectl-3.6.7-1.el6.noarch
lustre: 2.4.0
kernel: patchless_client
build: 2.4.0-RC2-gd3f91c4-PRISTINE-2.6.32-358.6.2.el6.x86_64
#<--------CPU--------><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads KBWrite Writes
6 6 2687 6132 20G 0 2G 171M 153M 322M 4096M 4G 175232 2738
21 21 5939 16537 19G 0 3G 812M 329M 322M 4096M 4G 656704 10261
6 6 2380 5751 19G 0 3G 1G 386M 322M 4096M 4G 216640 3385
3 3 1329 2773 19G 0 3G 915M 384M 322M 4096M 4G 111168 1737
24 24 6678 21347 19G 0 3G 74M 389M 322M 4096M 4G 861376 13459
2 2 1107 2082 19G 0 3G 272K 390M 322M 4096M 4G 76032 1188
0 0 500 141 19G 0 3G 272K 390M 322M 4096M 4G 0 0
6 6 1253 161 19G 0 3G 272K 390M 322M 4096M 4G 0 0
5 5 1132 139 19G 0 3G 272K 390M 322M 4096M 4G 0 0
5 5 987 360 19G 0 2G 272K 206M 322M 4096M 4G 0 0
2 2 1031 353 19G 0 2G 272K 115M 322M 4096M 4G 0 0
12 12 3490 10766 19G 0 2G 417M 220M 322M 4096M 4G 427008 417
16 16 4741 15475 19G 0 3G 1G 383M 322M 4096M 4G 621568 607
0 0 537 155 19G 0 3G 1G 384M 322M 4096M 4G 0 0
24 24 7451 24128 19G 0 3G 440K 386M 322M 4096M 4G 1048576 1024
0 0 640 176 19G 0 3G 272K 387M 322M 4096M 4G 0 0
2 2 752 133 19G 0 3G 272K 387M 322M 4096M 4G 0 0
On the 1.8.9 client:
2.6.32-358.6.2.el6.x86_64
collectl-3.6.7-1.el6.noarch
lustre: 1.8.9
kernel: patchless_client
build: jenkins-wc1--PRISTINE-2.6.32-358.6.2.el6.x86_64
#<--------CPU--------><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads KBWrite Writes
3 3 1585 4267 19G 0 2G 824M 318M 231M 0 0 205824 201
14 14 5653 17881 19G 0 2G 1M 323M 231M 0 0 842752 823
0 0 745 181 19G 0 2G 1M 324M 231M 0 0 0 0
2 2 645 112 19G 0 2G 1M 323M 231M 1048580 1025 0 0
2 2 724 124 19G 0 2G 1M 323M 231M 1048580 1025 0 0
12 11 2044 859 20G 0 1G 1M 223M 233M 1 3 0 0
14 14 4807 13121 20G 0 2G 644M 282M 230M 0 0 658112 10283
8 8 3245 8196 19G 0 2G 1G 320M 230M 0 0 390464 6101
0 0 570 164 19G 0 2G 1G 320M 230M 0 0 0 0
11 11 4061 12627 19G 0 2G 421M 317M 230M 0 0 618496 9664
8 8 2953 9239 19G 0 2G 1M 321M 230M 0 0 430080 6720
2 2 699 127 19G 0 2G 1M 321M 230M 1048580 16K 0 0
0 0 576 148 19G 0 2G 1M 321M 230M 0 0 0 0
3 3 783 210 19G 0 2G 1M 282M 230M 1048580 16K 0 0
2 2 725 597 20G 0 1G 21M 219M 231M 0 0 20480 20
14 14 5253 14347 20G 0 2G 714M 287M 231M 0 0 709632 693
Is this a known bug?
JS
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss