[lustre-discuss] High MDS load

Peeples, Heath heathp at HPC.MsState.Edu
Thu May 28 08:37:03 PDT 2020

I have 2 MDSs, and periodically the load on one of them (either one, at different times) peaks above 300, causing the file system to basically stop.  This lasts for a few minutes and then goes away.  We can't identify any one user running jobs at the times we see this, so it's hard to pin this on something a particular user is doing.  Could anyone point me in the direction of how to begin debugging this?  Any help is greatly appreciated.

