<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
I’ve been able to determine what is causing the dump_cqe failures, but not why they have suddenly started happening.
<div class=""><br class="">
</div>
<div class="">In Lustre, we pass an IOV of fragments to be RDMA’ed over IB.  The fragments need to be page aligned except that the first fragment does not have to start on a page boundary and the last fragment does not have to end on a page boundary.</div>
<div class=""><br class="">
</div>
<div class="">When we set up the DMA addresses for remote RDMA, we mask off the fragments so the addresses are all on a page boundary.  I guess the original authors believed that all DMA addresses needed to be page aligned for IB hardware.  The mlx5 code (MOFED 4 specific?) does not like that we are not using the actual start address and rejects the request with a dump_cqe error.</div>
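<div class=""><br class="">
</div>
<div class="">To make the masking concrete, here is a minimal Python sketch (not the actual o2iblnd code). It assumes a 4 KiB page size, and the names PAGE_SIZE, PAGE_MASK, page_align_down and describe_fragment are hypothetical, chosen only for illustration. It shows how masking moves a first fragment's address back to the page boundary, so the masked address no longer matches the actual start address:</div>
<pre class="">
```python
PAGE_SIZE = 4096               # assumed page size; real systems may differ
PAGE_MASK = ~(PAGE_SIZE - 1)   # clears the in-page offset bits


def page_align_down(addr):
    """Round an address down to the start of its page."""
    return addr & PAGE_MASK


def describe_fragment(addr, length):
    """Return (masked address, in-page offset lost by masking, length)."""
    masked = page_align_down(addr)
    return masked, addr - masked, length


# Hypothetical first fragment that starts 0x234 bytes into a page.
frag_addr = 0x10000234
masked, offset, length = describe_fragment(frag_addr, 1000)
print(hex(frag_addr), hex(masked), hex(offset), length)
```
</pre>
<div class="">In this sketch the masked address and the real start address differ by the in-page offset, which is the discrepancy described above.</div>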
<div class=""><br class="">
</div>
<div class="">This code does not seem to cause a problem with MOFED 3.x, so has something changed?  Has a page alignment restriction been removed?  I really cannot just turn off this alignment operation, as I have no idea what will break elsewhere in the world of OFED/MOFED.</div>
<div class=""><br class="">
</div>
<div class="">I could use some insight from people who understand IB hardware/firmware.</div>
<div class=""><br class="">
</div>
<div class="">Doug</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On May 11, 2017, at 11:26 AM, Indivar Nair <<a href="mailto:indivar.nair@techterra.in" class="">indivar.nair@techterra.in</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="">Thanks for the advice.<br class="">
I had a hunch that the development would take time.<br class="">
<br class="">
</div>
<div class="">Regards,<br class="">
<br class="">
<br class="">
</div>
<div class="">Indivar Nair<br class="">
</div>
</div>
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Thu, May 11, 2017 at 11:28 PM, Oucharek, Doug S <span dir="ltr" class="">
<<a href="mailto:doug.s.oucharek@intel.com" target="_blank" class="">doug.s.oucharek@intel.com</a>></span> wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word" class="">As I write this, I am banging my head against this wall trying to figure it out.  It is related to the new memory region registration process used by mlx5 cards.  I could really use the help of any Mellanox/RDMA experts out there.  The API has virtually no documentation and, without the source code for MOFED 4, I am really unable to do much more than guess at what is going on.
<div class=""><br class="">
</div>
<div class="">So, expect this to take a long time to resolve and stick with MOFED 3.x.</div>
<span class="HOEnZb"><font color="#888888" class="">
<div class=""><br class="">
</div>
<div class="">Doug</div>
</font></span>
<div class="">
<div class="h5">
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On May 11, 2017, at 10:29 AM, Indivar Nair <<a href="mailto:indivar.nair@techterra.in" target="_blank" class="">indivar.nair@techterra.in</a>> wrote:</div>
<br class="m_5513567619487031098Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="">
<div class="">
<div class="">Thanks a lot, Michael, Andreas, Simon, Doug,<br class="">
</div>
<div class="">I have already installed MLNX OFED 4 :-(<br class="">
I will now have to undo it and install the earlier version.<br class="">
</div>
<div class=""><br class="">
</div>
Roughly when will support for MLNX OFED 4 be available?<br class="">
<br class="">
</div>
Regards,<br class="">
<br class="">
<br class="">
</div>
Indivar Nair<br class="">
<div class="">
<div class="">
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Thu, May 11, 2017 at 9:35 PM, Oucharek, Doug S <span dir="ltr" class="">
<<a href="mailto:doug.s.oucharek@intel.com" target="_blank" class="">doug.s.oucharek@intel.com</a>></span> wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="">The note regarding MOFED 4 not supported by Lustre: I’m working on it. MOFED 4 did not drop support of Lustre, but did make API/behaviour changes which Lustre has not fully adapted to yet.  The ball is in the Lustre community’s court on this one
 now. <span class="m_5513567619487031098gmail-HOEnZb"><font color="#888888" class="">
<div class=""><br class="">
</div>
<div class="">Doug</div>
</font></span>
<div class="">
<div class="m_5513567619487031098gmail-h5">
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On May 11, 2017, at 8:47 AM, Simon Guilbault <<a href="mailto:simon.guilbault@calculquebec.ca" target="_blank" class="">simon.guilbault@calculquebec.<wbr class="">ca</a>> wrote:</div>
<br class="m_5513567619487031098gmail-m_-5778260983002475721Apple-interchange-newline">
<div class="">
<div dir="ltr" style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">
<div class="">
<div class="">Hi, your lnet.conf looks fine. I tested LNet with RoCE v2 a while back on a pair of servers using ConnectX-4 with a single 25Gb interface, and RDMA was working with CentOS 7.3, stock RHEL OFED and Lustre 2.9. The only setting that I had to use in Lustre's config was this one:</div>
<div class=""><br class="">
</div>
<div class="">options lnet networks=o2ib(ens2)</div>
</div>
<div class=""><br class="">
</div>
<div class="">In the LNet self-test, performance without any tuning was about the same (1.9 GB/s), but CPU utilisation was a lot lower with RDMA than with TCP (3% vs 65% of a core). </div>
<div class=""><br class="">
</div>
<div class="">According to the notes I took back then, Lustre needed to be recompiled against MLNX OFED 3.4, and MLNX OFED 4 dropped support for Lustre according to its release notes.</div>
<div class=""><br class="">
</div>
<div class="">Ref 965588</div>
<div class=""><a href="https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_0_1.pdf" target="_blank" class="">https://www.mellanox.com/relat<wbr class="">ed-docs/prod_software/Mellanox<wbr class="">_OFED_Linux_Release_Notes_4_0-<wbr class="">2_0_0_1.pdf</a><br class="">
</div>
<div class=""><a href="https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_2_0.pdf" target="_blank" class="">https://www.mellanox.com/relat<wbr class="">ed-docs/prod_software/Mellanox<wbr class="">_OFED_Linux_Release_Notes_4_0-<wbr class="">2_0_2_0.pdf</a><br class="">
</div>
<div class=""><br class="">
</div>
</div>
<div class="gmail_extra" style="font-family:helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br class="">
<div class="gmail_quote">On Thu, May 11, 2017 at 11:34 AM, Indivar Nair<span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:indivar.nair@techterra.in" target="_blank" class="">indivar.nair@techterra.i<wbr class="">n</a>></span><span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span>wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr" class="">
<div class="">
<div class="">
<div class="">
<div class="">So I should add something like this in lnet.conf -<br class="">
<br class="">
</div>
options lnet networks=o2ib0(p4p1)<br class="">
<br class="">
</div>
That's it, right?<br class="">
<br class="">
</div>
Regards,<br class="">
<br class="">
<br class="">
</div>
Indivar Nair<br class="">
</div>
<div class="m_5513567619487031098gmail-m_-5778260983002475721HOEnZb">
<div class="m_5513567619487031098gmail-m_-5778260983002475721h5">
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Thu, May 11, 2017 at 8:39 PM, Dilger, Andreas<span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:andreas.dilger@intel.com" target="_blank" class="">andreas.dilger@intel.<wbr class="">com</a>></span><span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span>wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
If you have RoCE cards and configure them with OFED, and configure Lustre to use o2iblnd then it should use RDMA for those interfaces. The fact that they are RoCE cards is hidden below OFED.<br class="">
<br class="">
Cheers, Andreas<br class="">
<div class="">
<div class="m_5513567619487031098gmail-m_-5778260983002475721m_-2019842202792000363h5">
<br class="">
> On May 11, 2017, at 08:36, Indivar Nair <<a href="mailto:indivar.nair@techterra.in" target="_blank" class="">indivar.nair@techterra.in</a>> wrote:<br class="">
><br class="">
> Hi ...,<br class="">
><br class="">
> I have read in different forums and blogs that Lustre supports RoCE.<br class="">
> But I can't find any documentation on it.<br class="">
><br class="">
> I have a Lustre setup with 6 OSS and 2 SMB/NFS Gateways.<br class="">
> They are all interconnected using a Mellanox SN2700 100G switch and Mellanox ConnectX-4 100G NICs.<br class="">
> I have installed the Mellanox OFED drivers, but I can't find a way to tell Lustre / LNet to use RoCE.<br class="">
><br class="">
> How do I go about it?<br class="">
><br class="">
> Regards,<br class="">
><br class="">
><br class="">
> Indivar Nair<br class="">
><br class="">
><br class="">
</div>
</div>
> ______________________________<wbr class="">_________________<br class="">
> lustre-discuss mailing list<br class="">
><span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank" class="">lustre-discuss@lists.lustre.<wbr class="">org</a><br class="">
><span class="m_5513567619487031098gmail-m_-5778260983002475721Apple-converted-space"> </span><a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" rel="noreferrer" target="_blank" class="">http://lists.lustre.org/list<wbr class="">info.cgi/lustre-discuss-lustre<wbr class="">.org</a><br class="">
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
<br class="">
<br class="">
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>