<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1714233380;
mso-list-template-ids:-2145866180;}
@list l1
{mso-list-id:1903560575;
mso-list-type:hybrid;
mso-list-template-ids:1996394660 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">The read bandwidth on my test setup is 20%-55% of the bandwidth I was able to get from a ldiskfs based Lustre 2.14 on the same VMs running CentOS 8. I’d like feedback on the observations I made during my analysis.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><b>SETUP</b>: <o:p></o:p></p>
<p class="MsoNormal">My test setup has four VMs on a single host where:<o:p></o:p></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">VM1: MGS + MDS<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">VM2: OSS1 (w/ two 40GB OSTs)<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">VM3: OSS2 (w/ one 40GB OST)<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">VM4: POSIX client<o:p></o:p></li></ol>
<p class="MsoNormal">I’m running Lustre 2.12.6 on CentOS 7, kernel 3.10.0-1160.2.1.el7.x86_64. The OSTs use ZFS. The complete install was done using RPMs – no custom builds. <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><b>BENCHMARK</b>:<o:p></o:p></p>
<p class="MsoNormal">While this observation holds true for several of tests I’ve performed, I will only describe one of them here.<o:p></o:p></p>
<p class="MsoNormal">I’m running fio on the client to read 200 imagenet files. 4k block size, ioengine=psync. Iodepth=1. The total size of the data set ranges from is 23.4GB. All machines have 32GB of memory.
<o:p></o:p></p>
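<p class="MsoNormal">The fio invocation was essentially of the following form; the job name and the Lustre mount path are placeholders, and the real job reads the 200 imagenet files:<o:p></o:p></p>
<pre># 4k blocks, psync engine, iodepth 1, as described above
# (/mnt/lustre/imagenet and the job name are illustrative;
#  rw=read is shown for concreteness)
fio --name=imagenet-read --rw=read --bs=4k --ioengine=psync --iodepth=1 \
    --opendir=/mnt/lustre/imagenet</pre>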
<p class="MsoNormal">The files are cached in the OSS. And therefore, they are really being transferred from the OSS memory to the client memory and there is no disk activity to either OST.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><b>OBSERVATIONS:</b><o:p></o:p></p>
<p class="MsoNormal">The following four graphs were plotted based on the data I collected from the four nodes using sar. After the initial blip where the client talks to the MGS, the client-OSS throughput slowly ramps down from an initial peak rate of 25 MBps.<o:p></o:p></p>
<p class="MsoNormal">The client CPU usage peaks at about 55% (not shown) and the run queue size is never over 20 (also not shown) leading me to the believe that we’re not limited by the client CPU at any point during the transfer.<o:p></o:p></p>
<p class="MsoNormal"><img width="877" height="369" style="width:9.1354in;height:3.8437in" id="Picture_x0020_1" src="cid:image001.png@01D787E1.C8057070"><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><img width="870" height="388" style="width:9.0625in;height:4.0416in" id="Picture_x0020_2" src="cid:image002.png@01D787E1.C8057070"><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><img width="874" height="470" style="width:9.1041in;height:4.8958in" id="Picture_x0020_4" src="cid:image003.png@01D787E1.C8057070"><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><img width="863" height="432" style="width:8.9895in;height:4.5in" id="Picture_x0020_5" src="cid:image004.png@01D787E1.C8057070"><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><b>QUESTION:</b><o:p></o:p></p>
<p class="MsoNormal">Why aren’t the two OSSs able to start sending data faster? What is causing the TX rate to climb gradually? Neither OSS nodes are CPU or IO limited. arcstat.py shows memory reads in line with the incoming traffic (see image below).<o:p></o:p></p>
<p class="MsoNormal">For bigger workloads, it just takes longer to hit the same peak bandwidth. I’m never able to achieve network saturation. Where is the performance bottleneck?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><img width="825" height="359" style="width:8.5937in;height:3.7395in" id="Picture_x0020_7" src="cid:image005.png@01D787E1.C8057070"><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Vinayak<o:p></o:p></p>
</div>
</body>
</html>