[lustre-discuss] ko2iblnd.conf

Daniel Szkola dszkola at fnal.gov
Thu Apr 11 11:02:01 PDT 2024


On the server node(s):

options ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

On clients:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
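For what it’s worth, a quick way to confirm which values are actually in effect on a given node (standard sysfs paths for any loaded module, nothing Lustre-specific assumed) is to read the module parameters directly:

  # confirm the module is loaded, then dump every parameter and its value
  lsmod | grep ko2iblnd
  grep . /sys/module/ko2iblnd/parameters/*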

My concern isn’t so much the mismatch itself (I know that’s an issue) as what numbers we should settle on with a recent Lustre build. I also see ko2iblnd-opa in the server config; since the server is actually loading ko2iblnd, does that mean those options are ignored and the defaults are used?
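If I’m reading the stock conf that ships with Lustre correctly (my understanding, so take it as a sketch rather than gospel), the -opa options are only applied when the /usr/sbin/ko2iblnd-probe install script detects Omni-Path (hfi1) hardware and loads the module under the ko2iblnd-opa alias. Roughly:

  # sketch of the stock /etc/modprobe.d/ko2iblnd.conf layout
  alias ko2iblnd-opa ko2iblnd
  options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 ...
  install ko2iblnd /usr/sbin/ko2iblnd-probe

If that’s right, an 'options ko2iblnd-opa' line on an InfiniBand-only server is ignored and the module comes up with built-in defaults, which would fit what we’re seeing.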

What made me look into this was that we were seeing lots of:
LNetError: 2961324:0:(o2iblnd_cb.c:2612:kiblnd_passive_connect()) Can't accept conn from xxx.xxx.xxx.xxx at o2ib2, queue depth too large:  42 (<=32 wanted)
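In case it helps anyone comparing sides, lnetctl can dump the LND tunables a node is actually running with (assuming a 2.15-era lnetctl; 'oss01' below is just a placeholder hostname):

  # effective o2iblnd tunables on this node
  lnetctl net show -v
  # same on a server, for a side-by-side diff
  ssh oss01 'lnetctl net show -v'

Diffing the two outputs makes mismatches like the queue-depth one above easy to spot.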

—
Dan Szkola
FNAL


> On Apr 11, 2024, at 12:36 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
> 
> On Apr 11, 2024, at 09:56, Daniel Szkola via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>> 
>> Hello all,
>> 
>> I recently discovered some mismatches in our /etc/modprobe.d/ko2iblnd.conf files between our clients and servers.
>> 
>> Is it now recommended to keep the defaults on this module and run without a config file or are there recommended numbers for lustre-2.15.X?
>> 
>> The only things I’ve seen that provide any guidance are the Lustre wiki and an HPE/Cray doc:
>> 
>> https://www.hpe.com/psnow/resources/ebooks/a00113867en_us_v2/Lustre_Server_Recommended_Tuning_Parameters_4.x.html
>> 
>> Anyone have any sage advice on what the ko2iblnd.conf (and possibly ko2iblnd-opa.conf and hfi1.conf as well) should contain on modern systems?
> 
> It would be useful to know which specific settings are mismatched. Definitely some of them need to be consistent between peers; others depend on your system.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud


