What is the underlying disk? Did the hardware/RAID configuration change <div>when you switched hardware? </div><div>The 'still busy' message is a bug; it may be fixed in 1.8.5.</div><div>cliffw</div><div><br><br><div class="gmail_quote">

On Sat, Apr 2, 2011 at 1:01 AM, Thomas Roth <span dir="ltr">&lt;<a href="mailto:t.roth@gsi.de">t.roth@gsi.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi all,<br>

<br>

we are suffering from a severe metadata performance degradation on our 1.8.4 cluster and are pretty clueless.<br>

- We moved the MDT to new hardware, since the old one was failing<br>

- We increased the size of the MDT with 'resize2fs' (then mounted it and saw all the files)<br>

- We found the performance of the new MDS dreadful<br>

- We restarted the MDT on the old hardware with the failed RAID controller replaced, but without doing anything to the OSSs or clients.<br>

The machine crashed three minutes after recovery was over.<br>

- Moved back to the new hardware, but the system was now pretty messed up: persistent "still busy with N RPCs" and some "going back to sleep" messages. (By the way, is there really no way to find out what these RPCs are and how to kill them? Of course I wouldn't mind switching off some clients or even rebooting some OSSs if I only knew which ones...)<br>

- Shut down the entire cluster, writeconf, restart without any client mounts - worked fine<br>

- Mounted Lustre and tried to "ls" a directory with 100 files:   takes several minutes(!)<br>

- Being patient and then trying the same on a second client:     takes msecs.<br>

<br>

I have done complete shutdowns before: most recently to upgrade from 1.6 to 1.8, without writeconf and without any performance loss, and before that to change the IPs of all servers (moving into a subnet), with writeconf, though I don't recall how the metadata performance behaved afterwards.<br>

It is clear that after writeconf some information has to be regenerated, but this is really extreme. Is this also normal?<br>

<br>

The MDT now behaves more like an xrootd master, which makes first contact with its file servers and has to read in the entire database (that would be nice to have in Lustre, to regenerate the MDT in case of disaster ;-) ).<br>

Which caches are being filled now as I ls through the cluster? Should I expect the MDT to explode once it has learned about a certain percentage of the system? ;-) I mean, we have 100 million files now and the current MDT hardware has just 32 GB of memory...<br>

In any case this is not the Lustre behavior we are used to.<br>

<br>

Thanks for any hints,<br>

Thomas<br>

<br>

_______________________________________________<br>

Lustre-discuss mailing list<br>

<a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a><br>

<a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

</blockquote></div><br><br clear="all"><br>-- <br>cliffw<div>Support Guy</div><div>WhamCloud, Inc. </div><div><a href="http://www.whamcloud.com" target="_blank">www.whamcloud.com</a></div><div><br></div><br>

</div>