[lustre-discuss] Lustre RoCE v2 traffic issue

VICTOR MANUEL MINJARES NERIZ victor.minjares at unison.mx
Wed Apr 29 14:52:26 PDT 2026


Hi everyone,

I hope this message finds you well.

I am unable to get Lustre RoCE v2 traffic to carry specific DSCP/TOS tags. While synthetic tests (ib_send_bw) successfully hit the desired hardware priority queues by selecting a specific GID index, Lustre traffic remains stuck at tos 0x1 (ECN enabled, DSCP 0), causing it to be mapped to the default Unicast queue (UC0) rather than the Lossless queue (UC3) on our SONiC switches.

System environment:

OS: Rocky Linux 9.6 (5.14.0-570.17.1.el9_6.x86_64)

Lustre: 2.15.7

NIC:
driver: bnxt_en
version: 1.10.3-233.0.198.0
firmware-version: 233.0.196.0/pkg 23.31.18.10
expansion-rom-version:
bus-info: 0000:21:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Switch:
Software Version  : 4.5.0a-Enterprise_Premium
Product           : Enterprise SONiC Distribution by Dell Technologies
Distribution      : Debian 11.11
Kernel            : 5.10.0-21-amd64
Config DB Version : version_4_5_2

lnet:
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          lnd tunables:
          dev cpt: 0
          CPT: "[0,1,2,3,4,5,6,7]"
    - net type: o2ib1
      local NI(s):
        - nid: 172.16.7.13@o2ib1
          status: up
          interfaces:
              0: ens1f1np1
          statistics:
              send_count: 2314
              recv_count: 4361
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 1
              concurrent_sends: 128
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          dev cpt: 2
          CPT: "[0,1,2,3,4,5,6,7]"

Troubleshooting Steps Already Taken

Manual TOS Overwrite: Attempted cma_roce_tos -d bnxt_re1 -t 104. Command returns successfully, but tcpdump confirms outgoing Lustre packets still carry tos 0x1.

Kernel Mangle Bypass: Applied nftables (mangle table) rules to force DSCP 26 on UDP port 4791. Traffic remains 0x1, suggesting hardware offload bypasses the Linux network stack.

Synthetic Success: Using ib_send_bw -x 3 (selecting GID Index 3) successfully changes the hardware queue and tagging. This proves the hardware is capable, but the Lustre kernel module isn't utilizing the correct GID index or TOS.
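To see which entry ib_send_bw -x 3 actually selects, the GID table can be dumped from sysfs. The following is only a rough sketch: it assumes the usual Linux RDMA sysfs layout (/sys/class/infiniband/<device>/ports/<port>/gids and gid_attrs/types), and it assumes the device name bnxt_re1 from the cma_roce_tos step and port 1. The helper name list_gids is made up for illustration. It only reports the table and says nothing about which index ko2iblnd ends up using.

```python
from pathlib import Path

# All-zero GID: what an unpopulated table slot reads back as.
EMPTY_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def list_gids(device, port=1, sysfs_root="/sys/class/infiniband"):
    """Return (index, gid, type) for every populated GID table entry.

    device, port and sysfs_root are assumptions about the local system;
    list_gids itself is a hypothetical helper, not part of any library.
    """
    port_dir = Path(sysfs_root) / device / "ports" / str(port)
    gid_dir = port_dir / "gids"
    if not gid_dir.is_dir():
        return []  # device or port not present on this host

    entries = []
    slots = [p for p in gid_dir.iterdir() if p.name.isdigit()]
    for gid_file in sorted(slots, key=lambda p: int(p.name)):
        type_file = port_dir / "gid_attrs" / "types" / gid_file.name
        try:
            gid = gid_file.read_text().strip()
            gid_type = type_file.read_text().strip()
        except OSError:
            continue  # reads of unpopulated slots can fail; skip them
        if gid == EMPTY_GID:
            continue
        entries.append((int(gid_file.name), gid, gid_type))
    return entries


if __name__ == "__main__":
    # bnxt_re1 is taken from the cma_roce_tos command in this message.
    for index, gid, gid_type in list_gids("bnxt_re1"):
        print(f"{index:3d}  {gid}  {gid_type}")
```

Comparing the type and address of index 3 with the other populated entries should show what differs between the ib_send_bw test and whatever entry the Lustre connection resolves to.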

From all of this, I believe Lustre (LNet and/or ko2iblnd) is not tagging its packets, and I cannot find a way to make it use ToS 0x69 (DSCP 26 with ECN enabled). On the switch, Lustre traffic still lands in UC0, whereas traffic from ib_send_bw -x 3 goes through UC3, which is why I suspect Lustre rather than the hardware.
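For reference, the TOS values mentioned above are related by simple bit arithmetic: the upper six bits of the TOS byte are the DSCP and the lower two are the ECN field. A small sketch of that split follows; the function names are illustrative only.

```python
def split_tos(tos):
    """Split an IPv4 TOS byte into (dscp, ecn)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2, tos & 0x3


def make_tos(dscp, ecn=0):
    """Build a TOS byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not 0 <= dscp <= 0x3F or not 0 <= ecn <= 0x3:
        raise ValueError("DSCP is 6 bits, ECN is 2 bits")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    cases = (
        ("seen on Lustre traffic", 0x01),
        ("passed to cma_roce_tos", 104),
        ("wanted on the wire", 0x69),
    )
    for label, tos in cases:
        dscp, ecn = split_tos(tos)
        print(f"{label}: tos=0x{tos:02x} dscp={dscp} ecn={ecn}")
```

So 104 (0x68) is DSCP 26 with the ECN bits clear, 0x69 is the same DSCP with ECN value 1, and the observed 0x1 is DSCP 0 with ECN value 1.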

I would appreciate it if someone could give me some guidance to solve this problem.

Thank you in advance.

Warm regards,
Victor
