<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.32.2">
</HEAD>
<BODY>
Hi Oleg,<BR>
<BR>
Thank you for your suggestions. I've upgraded from GigE to IB and moved from Lustre 1.8.6 to 2.1 on my test cluster. The new version includes the LU-144 fix for xattrs, but that doesn't seem to help. I've eliminated Windows for now and am using a RHEL 5.7 machine as the CIFS client, since Windows 7 network tuning has its own challenges.<BR>
<BR>
Typical write speed is 44 MB/sec, but from time to time it doubles to the low 90s (MB/sec) and then slows back down (see the collectl logs from the Lustre client/Samba export server below). This cycle can happen twice during a 10 GB write test for no reason I can currently see. I've also noticed that when I break out of a dd write with Ctrl-C, the performance jumps to 96 MB/sec until a cache somewhere runs out. The same thing happens right before the end of a write: the last few hundred MB are written at full speed (seen at the end of the collectl log below). <BR>
<BR>
I've tried tuning vfs_cache_pressure, max_rpcs_in_flight and max_dirty_mb on the Lustre client exporting Samba. Tweaking cache pressure seems to make no difference, but setting max_rpcs_in_flight and max_dirty_mb to 256 seems optimal. FWIW, NFS performance is rock solid at 100 MB/sec. What else might I try tuning?<BR>
<BR>
<BR>
Thank you,<BR>
<BR>
Dan<BR>
<BR>
[root@test2 ~]# obdfilter-survey <BR>
Wed Oct 5 14:06:50 PDT 2011 Obdfilter-survey for case=disk from test2<BR>
ost 1 sz 16777216K rsz 1024K obj 1 thr 1 write 216.00 [ 101.89, 301.66] rewrite 285.53 [ 23.97, 400.56] read 194.76 [ 94.82, 249.72] <BR>
ost 1 sz 16777216K rsz 1024K obj 1 thr 2 write 457.09 [ 234.73, 658.27] rewrite 601.92 [ 214.76, 814.08] read 1436.60 [ 295.60,3624.32] <BR>
ost 1 sz 16777216K rsz 1024K obj 1 thr 4 write 598.33 [ 390.56, 835.21] rewrite 616.84 [ 360.60, 886.00] read 2361.10 [ 639.82,5046.86] <BR>
ost 1 sz 16777216K rsz 1024K obj 1 thr 8 write 638.57 [ 192.59, 985.76] rewrite 667.73 [ 114.87, 970.75] read 2026.85 [ 451.50,2823.11] <BR>
ost 1 sz 16777216K rsz 1024K obj 1 thr 16 write 646.61 [ 485.50, 862.04] rewrite 641.01 [ 0.00,1040.83] read 1872.22 [ 716.21,4353.40] <BR>
ost 1 sz 16777216K rsz 1024K obj 2 thr 2 write 529.88 [ 288.68, 774.13] rewrite 505.26 [ 230.75, 824.10] read 1072.79 [ 8.99,4203.22] <BR>
ost 1 sz 16777216K rsz 1024K obj 2 thr 4 write 644.92 [ 277.41, 782.12] rewrite 604.37 [ 367.59, 907.99] read 1508.34 [ 12.99,5049.31] <BR>
ost 1 sz 16777216K rsz 1024K obj 2 thr 8 write 688.28 [ 63.93,1012.86] rewrite 675.73 [ 159.82, 996.89] read 1678.51 [ 19.00,3427.55] <BR>
ost 1 sz 16777216K rsz 1024K obj 2 thr 16 write 677.00 [ 48.95, 968.92] rewrite 630.08 [ 324.63,1030.83] read 1595.35 [ 42.95,2839.49] <BR>
ost 1 sz 16777216K rsz 1024K obj 4 thr 4 write 652.07 [ 92.90, 936.98] rewrite 580.03 [ 337.61, 938.83] read 1363.86 [ 7.99,4020.78] <BR>
ost 1 sz 16777216K rsz 1024K obj 4 thr 8 write 684.45 [ 220.75, 997.88] rewrite 577.64 [ 206.83,1009.07] read 1357.85 [ 18.98,3411.98] <BR>
ost 1 sz 16777216K rsz 1024K obj 4 thr 16 write 711.37 [ 138.86,1109.76] rewrite 579.74 [ 257.71, 904.99] read 1528.18 [ 33.96,4054.37] <BR>
ost 1 sz 16777216K rsz 1024K obj 8 thr 8 write 617.76 [ 0.00,1075.80] rewrite 515.14 [ 315.65, 815.08] read 1383.29 [ 19.98,4909.61] <BR>
ost 1 sz 16777216K rsz 1024K obj 8 thr 16 write 531.77 [ 0.00,1173.02] rewrite 605.71 [ 170.81,1195.53] read 1602.59 [ 32.96,3676.41] <BR>
ost 1 sz 16777216K rsz 1024K obj 16 thr 16 write 579.66 [ 210.54,1203.26] rewrite 523.99 [ 95.89, 962.91] read 1484.40 [ 31.96,2635.00] <BR>
done!<BR>
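<BR>
As a rough way to compare the backend numbers above with the 44 MB/sec seen over CIFS, the survey output can be reduced to its slowest and fastest write results. This is only a sketch: it assumes the write column is in MB/s, as obdfilter-survey normally reports, and the names SURVEY, LINE_RE and parse_survey are made up for illustration. Only three of the lines above are pasted in.<BR>
<PRE>
```python
import re

# Three lines copied from the obdfilter-survey output above.
SURVEY = """\
ost 1 sz 16777216K rsz 1024K obj 1 thr 1 write 216.00 [ 101.89, 301.66] rewrite 285.53 [ 23.97, 400.56] read 194.76 [ 94.82, 249.72]
ost 1 sz 16777216K rsz 1024K obj 4 thr 16 write 711.37 [ 138.86,1109.76] rewrite 579.74 [ 257.71, 904.99] read 1528.18 [ 33.96,4054.37]
ost 1 sz 16777216K rsz 1024K obj 16 thr 16 write 579.66 [ 210.54,1203.26] rewrite 523.99 [ 95.89, 962.91] read 1484.40 [ 31.96,2635.00]
"""

# Hypothetical pattern: object count, thread count, mean write rate.
LINE_RE = re.compile(r"obj\s+(\d+)\s+thr\s+(\d+)\s+write\s+([\d.]+)")


def parse_survey(text):
    """Return a list of (obj, thr, write_mb_per_sec) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return rows


rows = parse_survey(SURVEY)
slowest = min(rows, key=lambda r: r[2])
fastest = max(rows, key=lambda r: r[2])
print("slowest write: obj=%d thr=%d %.2f MB/s" % slowest)
print("fastest write: obj=%d thr=%d %.2f MB/s" % fastest)
```
</PRE>
Even the single-thread case in this sample is well above the CIFS write rate, so the comparison is between the export path and the disks, not a statement about where the bottleneck actually is.<BR>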
<BR>
#<----------Disks-----------><----------Network----------><-----------InfiniBand-----------><--------Lustre Client--------><BR>
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut KBIn PktIn KBOut PktOut Errs KBRead Reads KBWrite Writes<BR>
56 10 0 0 42151 28993 669 12001 431 10439 41629 20639 1 0 0 40544 724<BR>
60 13 0 0 42209 29045 684 12099 421 10180 40588 20125 1 0 0 40656 726<BR>
48 11 0 0 42212 29030 664 11907 431 10440 41629 20640 1 0 0 40600 725<BR>
44 11 0 0 42247 29063 684 12109 432 10442 41629 20642 1 0 0 40712 727<BR>
56 11 436 3 41238 28390 671 12057 420 10188 40629 20143 1 0 0 39632 708<BR>
48 12 1340 46 42529 29260 710 12593 431 10440 41629 20640 1 0 0 40992 732<BR>
40 9 0 0 42353 29134 684 12279 421 10180 40588 20125 1 0 0 40712 727<BR>
56 14 0 0 36304 24977 601 10640 377 9134 36425 18059 1 0 0 35000 625<BR>
60 12 0 0 41730 28701 671 12048 420 10169 40548 20104 1 0 0 40168 717<BR>
36 7 444 5 42703 29378 709 12587 432 10441 41629 20641 1 0 0 41104 734<BR>
48 11 796 123 41875 28797 665 11921 420 10179 40588 20124 1 0 0 40264 719<BR>
32 8 0 0 41784 28746 670 11861 420 10179 40588 20124 1 0 0 40264 719<BR>
64 9 0 0 41670 28662 661 11861 431 10439 41629 20639 1 0 0 40096 716<BR>
88 11 0 0 41948 28861 677 11975 421 10180 40588 20125 1 0 0 40376 721<BR>
36 9 432 5 41914 28833 668 11991 421 10189 40629 20144 1 0 0 40360 721<BR>
40 7 0 0 48113 33099 865 15455 525 12228 48303 24070 1 0 0 46312 827<BR>
56 10 0 0 47996 33011 771 13828 577 12177 46872 23685 1 0 0 46200 825<BR>
76 14 0 0 48159 33128 779 13824 575 12263 47323 23883 1 0 0 46312 827<BR>
64 14 0 0 48708 33522 810 14567 575 12338 47716 24056 1 0 0 46872 837<BR>
36 6 440 3 47734 32838 774 13723 560 12085 46811 23580 1 0 0 45986 821<BR>
52 11 0 0 47396 32595 761 13656 573 12074 46442 23476 1 0 0 45640 815<BR>
52 12 20 3 47473 32657 770 13651 578 11990 45891 23253 1 0 0 45696 816<BR>
28 6 4 1 63469 43650 840 14886 657 15921 63484 31475 1 0 0 60984 1089<BR>
28 7 0 0 88140 60619 928 15999 889 21463 85610 42448 1 0 0 84896 1516<BR>
#<----------Disks-----------><----------Network----------><-----------InfiniBand-----------><--------Lustre Client--------><BR>
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut KBIn PktIn KBOut PktOut Errs KBRead Reads KBWrite Writes<BR>
76 10 372 4 93116 64047 1184 20928 942 22855 91104 45169 1 0 0 89600 1600<BR>
36 9 0 0 92165 63392 1162 20365 939 22730 90633 44937 1 0 0 88737 1585<BR>
72 15 0 0 90408 62177 1027 18007 906 21925 87421 43345 1 0 0 87024 1554<BR>
60 12 0 0 90731 62406 1101 19224 928 22451 89504 44381 1 0 0 87024 1554<BR>
20 3 0 0 86886 59746 976 17095 874 21142 84298 41797 1 0 0 83944 1499<BR>
0 0 404 3 88630 60957 1035 18011 906 21924 87421 43344 1 0 0 85288 1523<BR>
0 0 0 0 92254 63439 1127 19864 936 22661 90362 44801 1 0 0 88751 1585<BR>
0 0 0 0 90122 61993 1037 18027 917 22207 88550 43904 1 0 0 86775 1550<BR>
0 0 0 0 90662 62365 1084 19077 916 22185 88461 43860 1 0 0 87304 1559<BR>
4 1 0 0 87496 60176 930 16037 957 21686 85416 42596 1 0 0 84168 1503<BR>
120 26 184 3 89070 61251 1024 17969 905 21923 87421 43343 1 0 0 85736 1531<BR>
136 28 0 0 93832 64533 1254 22082 949 22969 91584 45409 1 0 0 90328 1613<BR>
108 25 0 0 88874 61114 983 17193 896 21685 86466 42871 1 0 0 85542 1528<BR>
120 26 12 2 85982 59130 904 15576 876 21144 84299 41798 1 0 0 82768 1478<BR>
112 24 0 0 95007 65339 1193 21067 965 23444 93492 46352 1 0 0 91448 1633<BR>
108 24 392 5 98034 67418 1286 22632 982 23753 94706 46957 1 0 0 94360 1685<BR>
112 27 0 0 92812 63822 1105 19430 948 22966 91583 45406 1 0 0 89376 1596<BR>
108 27 0 0 95831 65913 1195 21106 970 23468 93572 46395 1 0 0 92196 1646<BR>
144 27 0 0 94472 64971 1230 21623 959 23227 92624 45922 1 0 0 90944 1624<BR>
116 26 0 0 91589 62983 1094 19259 927 22446 89502 44376 1 0 0 88200 1575<BR>
116 25 380 4 87707 60325 1008 17527 884 21402 85339 42312 1 0 0 84392 1507<BR>
156 26 0 0 89071 61254 1009 17684 906 21925 87421 43344 1 0 0 85736 1531<BR>
104 24 0 0 88192 60656 968 16752 895 21663 86380 42828 1 0 0 84952 1517<BR>
112 26 0 0 95460 65643 1183 20874 974 23541 93822 46529 1 0 0 91820 1640<BR>
#<----------Disks-----------><----------Network----------><-----------InfiniBand-----------><--------Lustre Client--------><BR>
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut KBIn PktIn KBOut PktOut Errs KBRead Reads KBWrite Writes<BR>
112 26 0 0 92606 63693 914 15654 999 22998 90541 45197 1 0 0 89096 1591<BR>
64 13 360 3 98450 67697 1170 20575 1065 24594 96847 48327 1 0 0 94864 1694<BR>
68 9 0 0 101979 70135 1255 21960 1091 25245 99496 49629 1 0 0 98112 1752<BR>
64 10 0 0 95772 65854 1048 18314 1029 23780 93685 46742 1 0 0 92196 1646<BR>
52 10 0 0 100530 69147 1333 23471 1059 24827 98158 48882 1 0 0 96768 1728<BR>
48 10 0 0 99672 68550 1147 20132 1087 24841 97625 48778 1 0 0 95928 1713<BR>
52 12 420 3 96497 66371 1129 19678 1051 23968 94140 47049 1 0 0 92904 1659<BR>
72 13 0 0 93024 63972 883 15220 1017 23275 91480 45700 1 0 0 89600 1600<BR>
88 12 844 106 93913 64589 1112 19382 969 23100 91772 45600 1 0 0 90328 1613<BR>
48 8 12 2 92992 63951 1114 19606 948 22967 91584 45417 1 0 0 89544 1599<BR>
52 11 0 0 91506 62935 1086 18929 918 22208 88550 43914 1 0 0 88064 1573<BR>
32 7 384 4 92093 63329 1173 20741 937 22706 90543 44901 1 0 0 88704 1584<BR>
60 15 0 0 88196 60659 909 15625 891 21521 85844 42572 1 0 0 84896 1516<BR>
136 30 0 0 89004 61201 946 16490 898 21751 86691 42992 1 0 0 85624 1529<BR>
128 30 0 0 88558 60917 928 15990 905 21923 87421 43353 1 0 0 85232 1522<BR>
132 30 0 0 91930 63222 1019 17829 927 22426 89414 44343 1 0 0 88504 1580<BR>
124 31 264 3 96038 66051 1289 22707 981 23749 94705 46964 1 0 0 92456 1651<BR>
124 31 0 0 94318 64857 1129 19878 949 22969 91584 45419 1 0 0 90776 1621<BR>
148 30 0 0 90540 62266 1018 17669 916 22184 88462 43868 1 0 0 87136 1556<BR>
132 30 0 0 90719 62380 1069 18791 927 22446 89502 44386 1 0 0 87360 1560<BR>
136 31 0 0 94233 64813 1289 22738 949 22969 91584 45419 1 0 0 90720 1620<BR>
124 31 160 3 95711 65814 1265 22435 971 23514 93759 46496 1 0 0 92100 1645<BR>
196 34 0 0 92808 63832 1111 19377 938 22707 90543 44901 1 0 0 89320 1595<BR>
128 31 0 0 94191 64768 1153 20335 959 23228 92625 45933 1 0 0 90664 1619<BR>
#<----------Disks-----------><----------Network----------><-----------InfiniBand-----------><--------Lustre Client--------><BR>
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut KBIn PktIn KBOut PktOut Errs KBRead Reads KBWrite Writes<BR>
104 26 12 2 93648 64405 1196 20987 939 22708 90543 44903 1 0 0 90216 1611<BR>
124 24 0 0 92767 63796 1115 19624 947 22943 91492 45371 1 0 0 89231 1593<BR>
112 22 332 5 91806 63141 1117 19516 928 22447 89502 44386 1 0 0 88368 1578<BR>
112 22 0 0 90513 62240 1097 19321 916 22185 88462 43870 1 0 0 87136 1556<BR>
104 26 0 0 92454 63586 1212 21303 938 22707 90543 44901 1 0 0 88984 1589<BR>
108 23 0 0 96394 66287 1236 21875 981 23750 94705 46966 1 0 0 92792 1657<BR>
88 21 0 0 59498 40932 915 16210 593 14358 57241 28389 1 0 0 57344 1024<BR>
116 21 452 3 41957 28859 671 12040 421 10190 40629 20149 1 0 0 40416 722<BR>
124 24 0 0 42094 28958 681 12065 431 10439 41629 20643 1 0 0 40488 723<BR>
96 21 0 0 42365 29138 684 12275 421 10181 40589 20131 1 0 0 40824 729<BR>
96 21 0 0 46792 32194 759 13456 545 12034 46890 23543 1 0 0 45024 804<BR>
96 22 0 0 48329 33239 775 13903 569 12234 47340 23860 1 0 0 46536 831<BR>
96 22 416 3 48720 33517 793 14072 569 12350 47946 24131 1 0 0 46881 837<BR>
108 26 0 0 47621 32761 765 13721 579 12107 46448 23509 1 0 0 45808 818<BR>
80 20 0 0 47541 32708 775 13753 590 12242 46840 23739 1 0 0 45752 817<BR>
104 22 0 0 47716 32819 765 13719 584 12182 46710 23648 1 0 0 45976 821<BR>
100 22 0 0 44963 30936 730 12937 484 10964 43000 21511 1 0 0 43232 772<BR>
92 20 404 5 42537 29255 683 12261 431 10440 41629 20645 1 0 0 40992 732<BR>
100 24 0 0 42514 29253 689 12208 432 10450 41671 20665 1 0 0 40921 731<BR>
124 26 0 0 42689 29361 683 12259 431 10440 41629 20645 1 0 0 41048 733<BR>
84 19 0 0 42591 29307 693 12270 432 10442 41630 20646 1 0 0 40992 732<BR>
120 27 0 0 42659 29343 690 12380 431 10440 41629 20645 1 0 0 41104 734<BR>
88 22 964 95 42590 29323 699 12398 441 10700 42670 21159 1 0 0 40992 732<BR>
80 19 0 0 42485 29219 680 12200 421 10171 40548 20110 1 0 0 40895 730<BR>
#<----------Disks-----------><----------Network----------><-----------InfiniBand-----------><--------Lustre Client--------><BR>
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut KBIn PktIn KBOut PktOut Errs KBRead Reads KBWrite Writes<BR>
112 24 0 0 42509 29246 686 12152 428 10428 41574 20615 1 0 0 40880 730<BR>
128 25 0 0 42473 29212 678 12149 431 10440 41629 20644 1 0 0 40880 730<BR>
148 21 0 0 42568 29315 698 12364 431 10440 41629 20645 1 0 0 40992 732<BR>
116 21 404 4 42679 29361 680 12198 436 10476 41796 20728 1 0 0 41048 733<BR>
96 23 0 0 42420 29185 687 12160 426 10345 41218 20440 1 0 0 40880 730<BR>
108 23 0 0 42162 29000 674 12099 432 10450 41671 20665 1 0 0 40585 725<BR>
216 36 348 10 41016 28221 669 11840 410 9918 39548 19612 1 0 0 39480 705<BR>
104 23 0 0 42535 29287 692 12422 432 10441 41629 20646 1 0 0 40936 731<BR>
112 25 12 3 42429 29194 690 12223 431 10440 41629 20645 1 0 0 40824 729<BR>
108 25 0 0 42395 29160 681 12217 431 10441 41629 20645 1 0 0 40824 729<BR>
100 20 0 0 42660 29348 708 12564 431 10430 41587 20623 1 0 0 41063 733<BR>
100 21 444 4 45535 31315 770 13849 464 11223 44751 22193 1 0 0 43792 782<BR>
76 19 0 0 42415 29183 690 12220 431 10440 41629 20644 1 0 0 40824 729<BR>
84 20 12 2 51770 35606 742 13233 528 12789 50996 25290 1 0 0 49784 889<BR>
100 18 0 0 97679 67181 1210 21179 999 24143 96302 47757 1 0 0 94024 1679<BR>
100 24 0 0 97904 67331 1245 22013 994 24083 95995 47605 1 0 0 94192 1682<BR>
96 22 436 4 95940 65983 1272 22382 972 23515 93759 46497 1 0 0 92436 1651<BR>
104 25 0 0 27413 18867 338 5956 250 6010 23938 11877 1 0 0 26496 474<BR>
124 24 0 0 3 22 10 13 11 261 1041 516 1 0 0 0 0<BR>
84 21 0 0 1 10 0 2 0 0 0 0 1 0 0 0 0<BR>
112 24 0 0 2 16 10 14 0 0 0 0 1 0 0 0 0<BR>
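<BR>
The slow/fast cycle in the log above can be summarized by grouping consecutive samples by their Lustre client KBWrite value. The sketch below is only illustrative: it assumes collectl's default one-second interval (so KB per sample is KB/sec), the 65000 KB cut-off is an arbitrary value between the two plateaus, and the names SAMPLES, lustre_write_kb and phases are made up. Only five rows from the log are pasted in.<BR>
<PRE>
```python
from itertools import groupby

# A few samples copied from the collectl log above (17 columns each).
SAMPLES = """\
56 10 0 0 42151 28993 669 12001 431 10439 41629 20639 1 0 0 40544 724
60 13 0 0 42209 29045 684 12099 421 10180 40588 20125 1 0 0 40656 726
28 7 0 0 88140 60619 928 15999 889 21463 85610 42448 1 0 0 84896 1516
76 10 372 4 93116 64047 1184 20928 942 22855 91104 45169 1 0 0 89600 1600
116 21 452 3 41957 28859 671 12040 421 10190 40629 20149 1 0 0 40416 722
"""

LUSTRE_KBWRITE_COL = 15   # 0-based index of the Lustre client KBWrite column
THRESHOLD_KB = 65000      # hypothetical cut between "slow" and "fast" samples


def lustre_write_kb(text):
    """Extract the Lustre client KBWrite value from each data row."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the repeated '#' header lines and anything malformed.
        if len(fields) == 17 and not line.startswith("#"):
            values.append(int(fields[LUSTRE_KBWRITE_COL]))
    return values


def phases(values, threshold=THRESHOLD_KB):
    """Group consecutive samples into (label, count, mean_mb_per_sec) runs."""
    out = []
    labelled = [("fast" if v >= threshold else "slow", v) for v in values]
    for label, run in groupby(labelled, key=lambda item: item[0]):
        kb = [v for _, v in run]
        out.append((label, len(kb), sum(kb) / len(kb) / 1024.0))
    return out


for label, count, mean_mb in phases(lustre_write_kb(SAMPLES)):
    print("%-4s %3d samples, mean %.1f MB/s" % (label, count, mean_mb))
```
</PRE>
Run against the full log, this would give the length of each slow and fast phase, which may be easier to line up with Samba or client cache events than the raw rows.<BR>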
<BR>
<BR>
On Mon, 2011-09-26 at 08:06 -0400, Oleg Drokin wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
Hello!
On Sep 22, 2011, at 11:31 AM, Dan wrote:
> After researching this in the archives it seems 85% of Lustre's native performance is reasonable via NFS. Over GigE I'm seeing 114 MB/sec read/write via Lustre native.
> NFS writes at 44 MB/sec via NFS (about 38%) but reads at 95 MB/sec. CIFS is lucky to see 18 MB/sec writes after tuning Samba (12 MB/sec prior). I've tried tuning the Lustre
> client and NFS client and server but nothing impacts the performance except setting noatime on NFS, which gets another 1.5 MB/sec on writes. What are some of you seeing in this configuration? Suggestions?
One issue is likely frequent attempts to use xattrs (there is a patch to speed this up in recent Lustre, LU-144 I think it is) and flocks.
Also I heard that cifs only does small IO - 64kb packets over the wire (and corresponding small reads), if you have a high latency link, that would
probably have a pretty negative impact too.
Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
</PRE>
</BLOCKQUOTE>
<BR>
</BODY>
</HTML>