[lustre-discuss] High MDS load, but no activity

Robin Humble rjh+lustre at cita.utoronto.ca
Thu Jul 27 10:38:19 PDT 2017


Hi Kevin,

On Thu, Jul 27, 2017 at 08:18:04AM -0400, Kevin M. Hildebrand wrote:
>We recently updated to Lustre 2.8 on our cluster, and have started seeing
>some unusual load issues.
>Last night our MDS load climbed to well over 100, and client performance
>dropped to almost zero.
>Initially this appeared to be related to a number of jobs that were doing
>large numbers of opens/closes, but even after killing those jobs, the MDS
>load did not recover.
>
>Looking at stats in /proc/fs/lustre/mdt/scratch-MDT0000/exports showed
>little to no activity on the MDS.  Looking at iostat showed almost no disk
>activity to the MDT (or to any device, for that matter), and minimal IO wait.
>Memory usage (the machine has 128GB) showed over half of that memory free.

sounds like VM spinning to me. check /proc/zoneinfo, /proc/vmstat etc.
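as a rough first check (exact counter names vary a bit between kernels),
watch the reclaim counters and see whether they climb quickly while the
filesystem is otherwise idle:

  grep -E 'pgscan|pgsteal|allocstall' /proc/vmstat
  # run it again a few seconds later; counters racing upwards with no
  # real IO going on is the signature of reclaim spinning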

do you have zone_reclaim_mode=0? that's an old one, but it's important to
have it set to zero.
 sysctl vm.zone_reclaim_mode
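if it isn't already zero, something like this sets it now and keeps it
across reboots (the sysctl.d filename below is just a suggestion, use
whatever your distro's convention is):

  sysctl -w vm.zone_reclaim_mode=0
  echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.d/99-lustre.conf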

failing that (and assuming you have a server with 2 or more numa zones) I
would guess it's all the zone affinity stuff in lustre these days.
you can turn most of it off with a modprobe option
  options libcfs cpu_npartitions=1
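for example (filename is just a suggestion), drop that line into a
modprobe.d file so it takes effect the next time the lustre modules are
loaded; on most builds you can then check it stuck via sysfs:

  echo 'options libcfs cpu_npartitions=1' > /etc/modprobe.d/lustre-cpt.conf
  # after a module reload / reboot, verify it took effect:
  cat /sys/module/libcfs/parameters/cpu_npartitions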

what happens by default is that a bunch of lustre threads are bound to
numa zones and preferentially and aggressively allocate kernel ram in
those zones. in practice this usually means that the zone where the IB
card is physically attached fills up, and then the machine is
(essentially) out of ram and spinning hard trying to reclaim, even though
the ram in the other zone(s) is almost entirely unused.
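you can usually see that imbalance directly with the numactl tools (or by
reading the per-node sections of /proc/zoneinfo):

  numactl --hardware    # free memory per numa node
  numastat -m           # per-node memory breakdown, including slab
  # one node sitting near zero free while the other is mostly untouched
  # is exactly the pattern described above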

I tried to talk folks out of having affinity on by default in
  https://jira.hpdd.intel.com/browse/LU-5050
but didn't succeed.

even if having affinity on weren't unstable, IMHO having 2x the ram
available for caching on the MDS and OSSes is #1, and the tiny
performance increase from having that ram next to the IB card is a
distant #2.

cheers,
robin

>I eventually ended up unmounting the MDT and failing it over to a backup
>MDS, which promptly recovered and now has a load of near zero.
>
>Has anyone seen this before?  Any suggestions for what I should look at if
>this happens again?
>
>Thanks!
>Kevin
>
>--
>Kevin Hildebrand
>University of Maryland, College Park
>Division of IT



