[lustre-discuss] omnipath and lnet_selftest performance

Michael DiDomenico mdidomenico4 at gmail.com
Fri Jul 5 10:37:56 PDT 2024


i could use a little help with lustre clients over omni-path.  when i
run ib_write_bw tests between two compute nodes i get 10+ GB/sec.
the compute nodes are rhel9.4 with the rhel hw drivers.

however, when i run lnet_selftest between the same two compute nodes
(1m i/o size, concurrency 16) i get noticeably less:

node1-node3
read 1m i/o ~7.1GB/sec
write 1m i/o ~4.7GB/sec

node3-node1
read 1m i/o ~6.6GB/sec
write 1m i/o ~4.9GB/sec
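
to put the gap in perspective, here's a rough sketch that expresses the
lnet_selftest numbers above as a fraction of the ib_write_bw result.  it
assumes a flat 10 GB/sec baseline (i only have "10+" for that figure), so
the percentages are upper bounds, and the names in the script are just
made up for illustration.

```python
# rough comparison of lnet_selftest throughput against the ib_write_bw baseline.
# ASSUMPTION: baseline taken as exactly 10 GB/sec; the real figure is "10+",
# so every fraction computed here is an upper bound.
BASELINE_GBPS = 10.0

# figures from the runs above (GB/sec, 1m i/o size, concurrency 16)
results = {
    ("node1-node3", "read"): 7.1,
    ("node1-node3", "write"): 4.7,
    ("node3-node1", "read"): 6.6,
    ("node3-node1", "write"): 4.9,
}


def fraction_of_baseline(gbps, baseline=BASELINE_GBPS):
    """Return measured throughput as a fraction of the baseline."""
    return gbps / baseline


fractions = {key: fraction_of_baseline(gbps) for key, gbps in results.items()}

for (pair, op), frac in fractions.items():
    gbps = results[(pair, op)]
    print(f"{pair} {op:<5} {gbps:.1f} GB/sec  at most {frac:.0%} of baseline")
```

in both directions the writes come out at under half of the assumed
baseline, while the reads land around two thirds or a bit better.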

varying the i/o size and concurrency changes the numbers, but not
dramatically.  i've gone through the tuning guide for omnipath and my
lnd tunables all match its recommendations, but i can't seem to drive
the bandwidth any higher between nodes.

can anyone suggest where i might be dropping some performance, or is
this as good as it gets?  i feel like there should be more performance
here, but since we recently retooled from rhel7 to rhel9, i'm unsure if
there's a tunable i've missed.  (unfortunately i don't have/can't seem
to find previous numbers to compare)

