[lustre-discuss] Issue with High-Load Write Operations in Lustre Cluster

chenzufei at gmail.com chenzufei at gmail.com
Sun Mar 2 19:50:36 PST 2025


The root cause was an incorrect RoCE configuration on one client: the business (Lustre) traffic was running on priority 0 instead of the intended priority 3, so under load it interfered with the corosync traffic.
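For anyone hitting something similar, a rough sketch of how the priority can be checked and corrected. This assumes a Mellanox/mlx5 NIC with the MLNX_OFED tools installed; the interface name is the one from our lustre.conf, while the RDMA device name mlx5_0 and the ToS/DSCP values are assumptions based on the common defaults, not something confirmed for every fabric:

# Watch the per-priority counters while the workload runs; bytes should
# accumulate on prio 3, not prio 0 (counter names are mlx5-specific).
ethtool -S enp0s5f0np0 | grep -E 'prio[03]_bytes'

# Inspect the port QoS settings (trust mode, dscp2prio mapping, PFC).
mlnx_qos -i enp0s5f0np0

# Typical remediation: trust DSCP on the port and mark RDMA CM traffic with
# ToS 106 (DSCP 26), which maps to priority 3 in the default dscp2prio table.
mlnx_qos -i enp0s5f0np0 --trust dscp
cma_roce_tos -d mlx5_0 -t 106

How the marking actually gets applied can differ between setups, so it is worth re-checking the per-priority counters after any change.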



chenzufei at gmail.com
 
From: zufei chen
Date: 2024-11-25 22:42
To: lustre-discuss
Subject: Issue with High-Load Write Operations in Lustre Cluster
Dear Lustre Community,
I am encountering an issue with Corosync, the high-availability component used alongside Lustre: under high load it experiences packet loss, which triggers fencing and powers down Lustre nodes. I am seeking advice on how to resolve this. Below are the details of our environment and the problem:
Environment:
Lustre version: 2.15.5
Physical machines: 11 machines, each with 128 CPU cores and 376GB of memory.
Virtualization: Each physical machine runs a KVM virtual machine with 20 cores and 128GB of memory, using Rocky Linux 8.10.
Lustre setup: Each VM has 2 MDTs (512GB each) and 16 OSTs (670GB each).
Configuration (/etc/modprobe.d/lustre.conf; a runtime verification sketch follows this list):
options lnet networks="o2ib(enp0s5f0np0)"
options libcfs cpu_npartitions=2
options ost oss_num_threads=512
options mds mds_num_threads=512
options ofd adjust_blocks_percent=11

Network: 100Gb/s RDMA (RoCE) network.
Clients: 11 clients using vdbench to perform large-file writes (aggregate write bandwidth approximately 50GB/s).
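For completeness, a short sketch of how the module options above can be confirmed at runtime (standard Lustre/LNet tools; the parameter paths shown are the usual ones, adjust to your own targets):

# LNet network as configured in lustre.conf (expects o2ib on enp0s5f0np0).
lnetctl net show

# CPU partitions and service-thread limits picked up from the modprobe options.
cat /sys/module/libcfs/parameters/cpu_npartitions
lctl get_param ost.OSS.*.threads_max mds.MDS.*.threads_max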
Issue:
Under high-load write operations, Corosync experiences packet loss. Occasionally the resulting heartbeat loss triggers Pacemaker's fencing mechanism, which powers down Lustre nodes.
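A quick way to see how much headroom corosync has before it declares token loss, and to give it more, is sketched below (assuming the stock corosync/pacemaker stack on Rocky Linux 8; the timeout values in the comments are illustrative only, not recommendations):

# Current totem token settings as corosync is actually running them.
corosync-cmapctl | grep -E 'totem\.token'

# Look for retransmit and membership churn around the fencing events.
journalctl -u corosync | grep -iE 'retransmit|token|new membership'

# To add headroom, raise e.g. the following in the totem{} section of
# /etc/corosync/corosync.conf on every node, then restart the cluster stack:
#   token: 10000
#   token_retransmits_before_loss_const: 10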
Analysis Conducted:
CPU usage: CPU utilization is not especially high, but the load average is very high (around 400).
Packet loss: Packet loss is observed when pinging between Lustre nodes.
Tuning oss_num_threads and mds_num_threads: Reducing these values lowered the system load and significantly improved the packet loss, but it also reduced the vdbench write bandwidth (a runtime thread-tuning sketch follows this list).
Network tuning: After increasing net.ipv4.udp_mem to three times the default, packet loss improved but still persists:
sysctl -w net.ipv4.udp_mem="9217055 12289407 18434106"
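To judge whether UDP buffer pressure is still part of the problem, and to keep the setting across reboots, something like the following can be used (the sysctl.d file name is our own choice; the counters are standard kernel UDP statistics):

# UDP-level drop counters; rising RcvbufErrors/InErrors while corosync loses
# heartbeats points at socket buffer pressure rather than the fabric itself.
nstat -az UdpInErrors UdpRcvbufErrors

# Persist the udp_mem tuning across reboots.
cat > /etc/sysctl.d/90-udp-tuning.conf <<'EOF'
net.ipv4.udp_mem = 9217055 12289407 18434106
EOF
sysctl --system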

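On the thread-count point above, the OSS I/O thread ceiling can also be adjusted at runtime instead of through modprobe options, which makes it easier to try different values (a sketch; the parameter names are the standard Lustre tunables and the value 256 is only an example):

# Current and maximum OSS I/O service threads on an OSS node.
lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max

# Lower the ceiling at runtime; threads already running above the new limit
# are not stopped immediately.
lctl set_param ost.OSS.ost_io.threads_max=256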
Assistance Requested:
I would appreciate any suggestions from the community on how to resolve this issue effectively. If anyone has faced similar challenges, your insights would be especially valuable.
Thank you for your time and assistance. I look forward to your responses.
Best regards

