[Lustre-discuss] MDT extremely slow after restart

Thomas Roth t.roth at gsi.de
Mon Apr 4 00:07:12 PDT 2011


Hi Cliff,

no, the configuration as such did not change. The hardware is quite different, though. The old box
had a RAID10 on 16 built-in 150 GB Raptor SATA disks, the new one a RAID10 on 24 300 GB Cheetah SAS disks in a
Fibre Channel-attached external enclosure.
Actually, we are more concerned that the 48 AMD cores of the new box might not have been the best idea.

But at the moment, the system is running fine and fast again!
After the last MDT restart, I started several 'ls' jobs crawling through the entire cluster. Obviously, after writeconfing all servers, Lustre really
has to "learn" again about the whereabouts of its files. And I found experimentally that it comes down to knowledge of the OSTs: in fact, we had tried
our very old, repaired hardware as MDT while copying the MDT to yet another, third type of machine. The effect of "first very slow, then very fast
'ls'" was there. Then we shut that down and started the third hardware: tried on new directories - same effect; tried on some already checked directories -
very fast. So using the old hardware had refreshed the OSTs' memory of these directories.
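
For what it's worth, the OST dependency is easy to see per file: a plain 'ls' is satisfied by the MDT alone, while 'ls -l' also has to fetch the object
size from every OST the file is striped over - and that is the part that goes from minutes to milliseconds once cached. A minimal check from a client
(paths are placeholders):

   # show which OSTs hold the objects of a file (layout stored on the MDT)
   lfs getstripe /lustre/some/dir/somefile

   # names only: readdir RPCs to the MDT, no OST traffic
   ls /lustre/some/dir

   # with attributes: additionally a size ("glimpse") RPC per OST object
   ls -l /lustre/some/dir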

All of this is to be expected to some degree, but the difference of minutes vs. milliseconds is quite astonishing.

Ah well, this cluster is also full to the brim, and the last time we had to writeconf the servers, there were certainly 20-30% fewer files.

Cheers,
Thomas

On 04/04/2011 06:11 AM, Cliff White wrote:
> What is the underlying disk? Did that hardware/RAID config change
> when you switched hardware?
> The 'still busy' message is a bug; it may be fixed in 1.8.5.
> cliffw
> 
> 
> On Sat, Apr 2, 2011 at 1:01 AM, Thomas Roth <t.roth at gsi.de> wrote:
> 
>     Hi all,
> 
>     we are suffering from a severe metadata performance degradation on our 1.8.4 cluster and are pretty clueless.
>     - We moved the MDT to a new hardware, since the old one was failing
>     - We increased the size of the MDT with 'resize2fs' (+ mounted it and saw all the files)
>     - We found the performance of the new MDS dreadful
>     - We restarted the MDT on the old hardware with the failed RAID controller replaced, but without doing anything on the OSSes or clients; the machine
>     crashed three minutes after recovery was over
>     - Moved back to the new hardware, but the system was now pretty messed up: persistent "still busy with N RPCs" and some "going back to sleep"
>     messages (by the way, is there no way to find out what these RPCs are, and how to kill them? Of course I wouldn't mind switching off some clients or
>     even rebooting some OSSes if I only knew which ones... see the sketch after this list)
>     - Shut down the entire cluster, writeconf, restart without any client mounts - worked fine
>     - Mounted Lustre and tried to "ls" a directory with 100 files:   takes several minutes(!)
>     - Being patient and then trying the same on a second client:     takes msecs.
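> 
>     (As an aside to the RPC question above: the closest handle I'm aware of is per-client eviction, assuming the 1.8 /proc layout and a single MDS - the
>     NID below is a placeholder:
> 
>        # on the MDS: list the client exports, one directory per client NID
>        ls /proc/fs/lustre/mds/*/exports/
>        # forcibly drop a client's export (and its outstanding RPCs) by NID
>        echo "192.168.1.5@tcp" > /proc/fs/lustre/mds/*/evict_client
> 
>     This evicts the whole client rather than killing individual RPCs.)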
> 
>     I have done complete shutdowns before, most recently to upgrade from 1.6 to 1.8, then without writeconf and without performance loss. Before that, to
>     change the IPs of all servers (moving into a subnet), with writeconf, but I have no recollection of the metadata behavior afterwards.
>     It is clear that after writeconf some information has to be regenerated, but this is really extreme - is that also normal?
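> 
>     (For completeness, the writeconf itself was the standard sequence - sketched here with placeholder device names, assuming a combined MGS/MDT:
> 
>        # with the whole filesystem stopped, regenerate the config logs
>        tunefs.lustre --writeconf /dev/mdtdev    # on the MDS
>        tunefs.lustre --writeconf /dev/ostdev    # on every OSS, once per OST
>        # then remount in order: MDT first, then the OSTs, then the clients
>     )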
> 
>     The MDT now behaves more like an xrootd master, which makes first contact with its file servers and has to read in the entire database (which would
>     be nice to have in Lustre, to regenerate the MDT in case of disaster ;-) ).
>     Which caches are being filled when I 'ls' through the cluster? May I expect the MDT to explode once it has learned about a certain percentage of the
>     system? ;-) I mean, we have 100 million files now, and the current MDT hardware has just 32 GB of memory...
>     In any case this is not the Lustre behavior we are used to.
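> 
>     (A rough way to watch those caches fill, assuming 'lctl get_param' behaves as in 1.8:
> 
>        # on the MDS and OSSes: cached DLM locks and resources per namespace
>        lctl get_param ldlm.namespaces.*.lock_count
>        lctl get_param ldlm.namespaces.*.resource_count
> 
>     These counters grow as the 'ls' jobs walk the tree; the lock LRU should shrink under memory pressure rather than let the MDT explode.)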
> 
>     Thanks for any hints,
>     Thomas
> 
> 
> 
> 
> 
> -- 
> cliffw
> Support Guy
> WhamCloud, Inc. 
> www.whamcloud.com
> 
> 

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de



