[lustre-discuss] 2.16.1 ptlrpcd infinite loop when machine runs out of RAM
Laura Hild
lsh at jlab.org
Wed Feb 5 07:21:10 PST 2025
I wanna say 2.15 added those messages (the obd_memory ones, not the spinning ptlrpcd) to every OoM. I remember seeing them when we first had 2.15 clients and looking them up. I take it you're not getting a corresponding OoM for each, though?
It is typical for a host to struggle if OoM conditions are happening regularly. Is there a workload manager with which you could contain individual jobs' memory usage, and limit the total to something that leaves a bigger margin for the system?