[lustre-discuss] ldlm.lock_limit_mb sizing

Cameron Harr harr1 at llnl.gov
Wed Jul 17 12:58:05 PDT 2024


In 2017, Oleg Drokin gave a talk at ORNL's Lustre conference about LDLM, 
including references to ldlm.lock_limit_mb and 
ldlm.lock_reclaim_threshold_mb. 
https://lustre.ornl.gov/ecosystem-2017/documents/Day-2_Tutorial-4_Drokin.pdf

The apparent defaults back then in Lustre 2.8 for those two parameters 
were 30MB and 20MB, respectively. On my 2.15 servers with 256GB of RAM 
and no changes from us, I'm seeing 77244MB and 51496MB, respectively. 
We recently got ourselves into a situation where a subset of MDTs 
appeared to be entirely overwhelmed trying to cancel locks, with ~500K 
locks in the request queue but a request wait time of ~6000 seconds. 
So we're looking at potentially limiting the number of locks on the 
servers.

What's the formula for appropriately sizing ldlm.lock_limit_mb and 
ldlm.lock_reclaim_threshold_mb in 2.15 (I don't think node memory 
amounts have increased ~2,500X in 7 years)?
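For what it's worth, here's the back-of-the-envelope check behind that 
question, using the numbers above. The percentage interpretation at the 
end is just my observation from our values, not a documented formula:

```python
# Sanity check of the observed LDLM values on our 256GB 2.15 servers.
# (Values taken from this post; treating RAM as 256 * 1024 MB is an
# assumption, and the %-of-RAM reading is a guess, not documented.)

ram_mb = 256 * 1024           # 262144 MB
lock_limit_mb = 77244         # observed ldlm.lock_limit_mb
reclaim_threshold_mb = 51496  # observed ldlm.lock_reclaim_threshold_mb

print(f"lock_limit_mb / RAM        = {lock_limit_mb / ram_mb:.1%}")        # ~29.5%
print(f"reclaim_threshold_mb / RAM = {reclaim_threshold_mb / ram_mb:.1%}") # ~19.6%
print(f"growth vs. 2.8 default     = {lock_limit_mb / 30:.0f}x")           # ~2575x
```

Which is why the 2.8 numbers look suspiciously like they may have been 
percentages rather than fixed MB amounts, but I'd like to know the 
actual formula.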

Thanks!

Cameron Harr


