[lustre-discuss] High MDS load, but no activity

Kevin M. Hildebrand kevin at umd.edu
Thu Jul 27 05:18:04 PDT 2017


We recently updated to Lustre 2.8 on our cluster, and have started seeing
some unusual load issues.
Last night our MDS load climbed to well over 100, and client performance
dropped to almost zero.
Initially this appeared to be related to a number of jobs that were doing
large numbers of opens/closes, but even after killing those jobs, the MDS
load did not recover.

Looking at stats in /proc/fs/lustre/mdt/scratch-MDT0000/exports showed
little to no activity on the MDS.  Looking at iostat showed almost no disk
activity to the MDT (or to any device, for that matter), and minimal IO
wait.
Memory usage (the machine has 128GB) showed over half of that memory free.
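
One check not covered above: on Linux the load average counts tasks in
uninterruptible sleep (state D) as well as runnable tasks (state R), so a
high load with idle disks can come from threads blocked inside the kernel.
The following is only a rough sketch of how one might tally such tasks by
name from /proc. It assumes a Linux host with Python 3; the function names
are made up for illustration and are not part of Lustre or any Lustre tool.

```python
import glob
from collections import Counter


def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on
    # whitespace.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def tally_load_contributors(stat_lines):
    """Count tasks in state R (runnable) or D (uninterruptible) by name."""
    counts = Counter()
    for line in stat_lines:
        try:
            _, comm, state = parse_stat_line(line)
        except (ValueError, IndexError):
            continue  # skip anything that does not look like a stat line
        if state in ("R", "D"):
            counts[(state, comm)] += 1
    return counts


def read_all_stat_lines():
    """Read the stat line of every thread; tasks may exit during the scan."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue
    return lines


if __name__ == "__main__":
    tally = tally_load_contributors(read_all_stat_lines())
    for (state, comm), n in tally.most_common(20):
        print(f"{n:6d}  {state}  {comm}")
```

Run on the MDS while the load is high, this would list the most common
task names in state R or D. Kernel threads show up in /proc too, so a large
number of identically named threads stuck in D would suggest where to look
next, though it would not by itself explain why they are blocked.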

I eventually ended up unmounting the MDT and failing it over to a backup
MDS, which promptly recovered and now has a load of near zero.

Has anyone seen this before?  Any suggestions for what I should look at if
this happens again?

Thanks!
Kevin

--
Kevin Hildebrand
University of Maryland, College Park
Division of IT