[lustre-discuss] ko2iblnd.conf

Andreas Dilger adilger at whamcloud.com
Fri Apr 12 13:57:10 PDT 2024


The ko2iblnd-opa settings are only used if you have Intel OPA instead of Mellanox cards (which one is picked depends on the ko2iblnd-probe script).  You should still have a ko2iblnd line in the server config, which is what is used for MLX cards, so that the values match on both sides.

As for the actual settings, someone with more LNet IB experience should chime in on what is best to use.  All I know is that they have to be the same on both sides or they get unhappy, and the usable values depend on the card type and MOFED/OFED version.  As a starting point I would just copy the client ko2iblnd options to the server and see if it works.
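
Before copying anything, it can help to list exactly which parameters differ between the two options lines. What follows is only a minimal Python sketch, not an official Lustre tool. It uses the server and client lines quoted further down in this thread, and the helper names parse_options and diff_options are made up for illustration.

```python
def parse_options(line):
    """Parse a modprobe 'options <module> key=value ...' line into (module, dict)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not an options line: %r" % line)
    module = tokens[1]
    # Values are kept as strings; only equality matters for this comparison.
    params = dict(tok.split("=", 1) for tok in tokens[2:])
    return module, params


def diff_options(a, b):
    """Return {param: (value_a, value_b)} for every parameter whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The two lines as posted in this thread.
server_line = (
    "options ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 "
    "concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 "
    "fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4"
)
client_line = (
    "options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 "
    "concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 "
    "fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4"
)

server_module, server_params = parse_options(server_line)
client_module, client_params = parse_options(client_line)

if server_module != client_module:
    print(f"module names differ: server={server_module} client={client_module}")

mismatches = diff_options(server_params, client_params)
for name, (server_value, client_value) in mismatches.items():
    print(f"{name}: server={server_value} client={client_value}")
```

For the lines in this thread it reports the differing module name plus four differing parameters: concurrent_sends, map_on_demand, peer_credits and peer_credits_hiw. It says nothing about which values are the right ones to settle on.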

Cheers, Andreas

On Apr 11, 2024, at 12:02, Daniel Szkola <dszkola at fnal.gov> wrote:

On the server node(s):

options ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

On clients:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

My concern isn’t so much the mismatch, because I know that’s an issue, but rather what numbers we should settle on with a recent lustre build. I also see that the server config uses ko2iblnd-opa; since the server is actually loading ko2iblnd, does that mean the defaults are being used?
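
One way to see which values the loaded module actually ended up with is to read them back from sysfs. This is a minimal Python sketch, assuming the ko2iblnd parameters are exposed under /sys/module/ko2iblnd/parameters, which is where Linux publishes module parameters declared as readable. Whether every ko2iblnd parameter shows up there is an assumption, and read_module_params is a hypothetical helper name.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {name: value} for the parameters a loaded module exposes in sysfs.

    Returns an empty dict if the module is not loaded or exposes no parameters.
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    # Prints nothing on a node where ko2iblnd is not loaded.
    for name, value in read_module_params("ko2iblnd").items():
        print(f"{name}={value}")
```

Running it on a server and on a client and comparing the output shows the values in effect, independent of what the files in /etc/modprobe.d say.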

What made me look was we were seeing lots of:
LNetError: 2961324:0:(o2iblnd_cb.c:2612:kiblnd_passive_connect()) Can't accept conn from xxx.xxx.xxx.xxx at o2ib2, queue depth too large:  42 (<=32 wanted)

—
Dan Szkola
FNAL


On Apr 11, 2024, at 12:36 PM, Andreas Dilger <adilger at whamcloud.com> wrote:


On Apr 11, 2024, at 09:56, Daniel Szkola via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Hello all,

I recently discovered some mismatches in our /etc/modprobe.d/ko2iblnd.conf files between our clients and servers.

Is it now recommended to keep the defaults on this module and run without a config file or are there recommended numbers for lustre-2.15.X?

The only thing I’ve seen that provides any guidance is the Lustre wiki and an HP/Cray doc:

https://www.hpe.com/psnow/resources/ebooks/a00113867en_us_v2/Lustre_Server_Recommended_Tuning_Parameters_4.x.html

Does anyone have any sage advice on what the ko2iblnd.conf (and possibly ko2iblnd-opa.conf and hfi1.conf as well) should contain on modern systems?

It would be useful to know what specific settings are mismatched.  Definitely some of them need to be consistent between peers, others depend on your system.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud