[lustre-discuss] omnipath and lnet_selftest performance
Michael DiDomenico
mdidomenico4 at gmail.com
Fri Jul 5 10:37:56 PDT 2024
i could use a little help with lustre clients over omni-path. when i
run ib_write_bw tests between two compute nodes i get 10+ GB/sec.
the compute nodes are rhel 9.4 with the rhel hw drivers.
however, when i run lnet_selftest between the same two compute nodes
with the settings below, i get noticeably less:
  1m i/o size
  16 concurrency

  node1-node3
    read  1m i/o ~7.1GB/sec
    write 1m i/o ~4.7GB/sec
  node3-node1
    read  1m i/o ~6.6GB/sec
    write 1m i/o ~4.9GB/sec
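
for anyone wanting to reproduce a run like the one above, here is a rough sketch (not the exact commands used for these numbers) that builds the lst command lines for a 1M, concurrency-16 brw test. the NIDs, group names, session name and batch name are hypothetical placeholders, and the o2ib network type is an assumption about how the omni-path fabric is configured in lnet. it also assumes the lnet_selftest module is loaded on both nodes and that LST_SESSION is exported in the shell that runs the commands.

```python
# Sketch only: builds lst command lines for one brw test between two nodes.
# The NIDs, group names, session name, and batch name are hypothetical
# placeholders; none of them come from the original post.

def lst_commands(from_nid, to_nid, mode, size="1M", concurrency=16, batch="bulk"):
    """Return the list of lst commands for one brw read or write test."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")
    return [
        "lst new_session brw_test",
        f"lst add_group src {from_nid}",
        f"lst add_group dst {to_nid}",
        f"lst add_batch {batch}",
        f"lst add_test --batch {batch} --concurrency {concurrency} "
        f"--from src --to dst brw {mode} size={size}",
        f"lst run {batch}",
        "lst stat src dst",  # reports bandwidth until interrupted
        "lst end_session",
    ]


if __name__ == "__main__":
    # Hypothetical NIDs standing in for node1 and node3.
    for cmd in lst_commands("10.0.0.1@o2ib", "10.0.0.3@o2ib", "read"):
        print(cmd)
```

swapping the two NIDs gives the reverse direction, and passing "write" instead of "read" gives the write case.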
varying the i/o size and concurrency changes the numbers, but not
dramatically. i've gone through the tuning guide for omni-path and my
lnd tunables all match its recommendations, but i can't seem to drive
the bandwidth any higher between nodes.
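
to put a number on the gap, a small sketch that expresses each lnet_selftest result as a fraction of the ib_write_bw figure. the baseline is taken as exactly 10 GB/sec, which is an assumption: the measured value was somewhat above that, so each percentage is an upper bound. comparing selftest reads against a write-bandwidth test is also only a rough yardstick.

```python
# Sketch only: lnet_selftest results as a fraction of the ib_write_bw baseline.
# BASELINE_GB_S is assumed to be exactly 10.0; the post reports "more than 10",
# so every fraction computed here is an upper bound.

BASELINE_GB_S = 10.0

# Figures as reported in the post (approximate, in GB/sec).
RESULTS_GB_S = {
    ("node1-node3", "read"): 7.1,
    ("node1-node3", "write"): 4.7,
    ("node3-node1", "read"): 6.6,
    ("node3-node1", "write"): 4.9,
}


def fraction_of_baseline(gb_s, baseline=BASELINE_GB_S):
    """Return measured bandwidth divided by the baseline bandwidth."""
    return gb_s / baseline


if __name__ == "__main__":
    for (pair, op), gb_s in RESULTS_GB_S.items():
        frac = fraction_of_baseline(gb_s)
        print(f"{pair} {op:<5} {gb_s:.1f} GB/sec, at most {frac:.0%} of ib_write_bw")
```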
can anyone suggest where i might be dropping some performance, or is
this as good as it gets? i feel like there should be more performance
here, but since we recently retooled from rhel7 to rhel9, i'm unsure
whether there's a tunable i've missed. (unfortunately i don't have,
and can't seem to find, previous numbers to compare against.)