<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><style>body { line-height: 1.5; }blockquote { margin-top: 0px; margin-bottom: 0px; margin-left: 0.5em; }ol, ul { margin-top: 0px; margin-bottom: 0px; list-style-position: inside; }p { margin-top: 0px; margin-bottom: 0px; }div.FoxDiv20250303114721336358 { }body { font-size: 14px; font-family: "Microsoft YaHei UI"; color: rgb(0, 0, 0); line-height: 1.5; }</style></head><body>

<div><span></span><br></div><div><span style="color: rgb(6, 6, 7); font-family: -apple-system, BlinkMacSystemFont, "Helvetica Neue", Helvetica, "Segoe UI", Arial, Roboto, "PingFang SC", MIUI, "Hiragino Sans GB", "Microsoft YaHei", sans-serif; letter-spacing: 0.25px; white-space: pre-wrap;">The root cause was an incorrect RoCE configuration on one client: its business traffic ran on priority 0 instead of priority 3, where it should have been, and that affected Corosync.</span></div>
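<div><br></div><div><span>As a rough illustration of why the priority matters, the Python sketch below shows how a DSCP value relates to the IP ToS byte and to a priority. It is not taken from the cluster configuration: the function names are hypothetical, and it assumes the NIC trusts DSCP and uses the common mapping priority = DSCP // 8. The real mapping depends on the NIC and switch settings.</span></div>
<pre style="text-wrap-mode: wrap;"><div><div dir="ltr"><code>def tos_for_dscp(dscp, ecn=0):
    """Return the IP ToS / Traffic Class byte for a DSCP value and ECN bits."""
    if dscp not in range(64) or ecn not in range(4):
        raise ValueError("dscp must be 0..63 and ecn must be 0..3")
    # DSCP occupies the upper six bits of the byte, ECN the lower two.
    return dscp * 4 + ecn


def priority_for_dscp(dscp):
    """Map DSCP to a priority, ASSUMING the common dscp // 8 mapping."""
    if dscp not in range(64):
        raise ValueError("dscp must be 0..63")
    return dscp // 8


# Unmarked traffic (DSCP 0) lands on priority 0, while DSCP 24..31
# would land on priority 3 under the assumed mapping.
for dscp in (0, 26):
    print(dscp, tos_for_dscp(dscp), priority_for_dscp(dscp))
</code></div></div></pre>
<div><span>Under that assumption, a client that sends its RDMA traffic unmarked shares priority 0 with ordinary traffic such as the Corosync heartbeats, which is consistent with the behaviour described above.</span></div>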

<div><br></div><hr style="width: 210px; height: 1px;" color="#b5c4df" size="1" align="left">

<div><span><div style="MARGIN: 10px; FONT-FAMILY: verdana; FONT-SIZE: 10pt"><div>chenzufei@gmail.com</div></div></span></div>

<blockquote style="margin-Top: 0px; margin-Bottom: 0px; margin-Left: 0.5em; margin-Right: inherit"><div> </div><div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm"><div style="PADDING-RIGHT: 8px; PADDING-LEFT: 8px; FONT-SIZE: 12px;FONT-FAMILY:tahoma;COLOR:#000000; BACKGROUND: #efefef; PADDING-BOTTOM: 8px; PADDING-TOP: 8px"><div><b>From:</b> <a href="mailto:chenzufei@gmail.com">zufei chen</a></div><div><b>Date:</b> 2024-11-25 22:42</div><div><b>To:</b> <a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss</a></div><div><b>Subject:</b> Issue with High-Load Write Operations in Lustre Cluster</div></div></div><div><div class="FoxDiv20250303114721336358"><div dir="ltr"><p>Dear Lustre Community,</p><p>I am encountering an issue with the Lustre high-availability component, <strong>Corosync</strong>, which experiences packet loss under high load, triggering fencing and powering down Lustre nodes. I am seeking advice on how to resolve this issue. Below are the details of our environment and the problem:</p><h3>Environment:</h3><ol style="margin-top: 0px;"><li style="margin-left:15px"><strong>Lustre version:</strong> 2.15.5</li><li style="margin-left:15px"><strong>Physical machines:</strong> 11 machines, each with 128 CPU cores and 376GB of memory.</li><li style="margin-left:15px"><strong>Virtualization:</strong> Each physical machine runs a KVM virtual machine with 20 cores and 128GB of memory, using Rocky Linux 8.10.</li><li style="margin-left:15px"><strong>Lustre setup:</strong> Each VM has 2 MDTs (512GB each) and 16 OSTs (670GB each).</li><li style="margin-left:15px"><strong>Configuration</strong> (<code>/etc/modprobe.d/lustre.conf</code>):<pre style="text-wrap-mode: wrap;"><div><div dir="ltr"><code>options lnet networks="o2ib(enp0s5f0np0)"

options libcfs cpu_npartitions=2

options ost oss_num_threads=512

options mds mds_num_threads=512

options ofd adjust_blocks_percent=11

</code></div></div></pre></li><li style="margin-left:15px"><strong>Network:</strong> 100GB RDMA network.</li><li style="margin-left:15px"><strong>Clients:</strong> 11 clients using <strong>vdbench</strong> to perform large file writes (total write bandwidth approximately 50GB).</li></ol><h3>Issue:</h3><p>Under high-load write operations, the <strong>Corosync</strong> component experiences packet loss. Lost heartbeats occasionally trigger <strong>Pacemaker's fencing mechanism</strong>, which powers down the Lustre nodes.</p><h3>Analysis Conducted:</h3><ol style="margin-top: 0px;"><li style="margin-left:15px"><strong>CPU usage:</strong> CPU utilization is not very high, but the CPU load average is very high (reaching around 400).</li><li style="margin-left:15px"><strong>Packet loss:</strong> Packet loss is observed when pinging between Lustre nodes.</li><li style="margin-left:15px"><strong>Tuning <code>oss_num_threads</code> and <code>mds_num_threads</code>:</strong> Lowering these values reduced the system load and significantly reduced packet loss, but it also decreased the vdbench write bandwidth.</li><li style="margin-left:15px"><strong>Network tuning:</strong> After raising <code>net.ipv4.udp_mem</code> to three times the default, packet loss decreased but still persists.<pre style="text-wrap-mode: wrap;"><div><div dir="ltr"><code>sysctl -w net.ipv4.udp_mem="9217055 12289407 18434106"

</code></div></div></pre></li></ol><h3>Assistance Requested:</h3><p>I would appreciate any suggestions from the community on how to resolve this issue effectively. If anyone has faced similar challenges, your insights would be especially valuable.</p><p>Thank you for your time and assistance. I look forward to your responses.</p><p>Best regards</p></div>

</div></div></blockquote>
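<div><br></div><div><span>A minimal Python sketch for quantifying the UDP drops described in the quoted analysis: it parses the Udp counters in the format used by /proc/net/snmp and converts a udp_mem threshold from pages to GiB. The function names are hypothetical, and the 4 KiB page size is an assumption (typical for x86_64). The udp_mem values themselves are expressed in pages, not bytes.</span></div>
<pre style="text-wrap-mode: wrap;"><div><div dir="ltr"><code>def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # The file holds one header line and one value line per protocol;
    # "UdpLite:" lines are skipped because they do not start with "Udp:".
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) != 2:
        raise ValueError("expected one header and one value line for Udp")
    header, values = rows
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def pages_to_gib(pages, page_size=4096):
    """Convert a udp_mem threshold in pages to GiB (page size is ASSUMED)."""
    return pages * page_size / 2**30


# First threshold from the sysctl command in the quoted message.
print(round(pages_to_gib(9217055), 2), "GiB")
</code></div></div></pre>
<div><span>Passing the contents of /proc/net/snmp to parse_udp_counters before and after a vdbench run and comparing RcvbufErrors and InErrors would show whether datagrams are being dropped on the node itself rather than on the network.</span></div>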

</body></html>