[Lustre-discuss] MDT crash: ll_mdt at 100%

Mag Gam magawake at gmail.com
Thu Jul 2 21:03:28 PDT 2009


Hi Tom:

There was a known issue with 1.6.7.1. What I did was downgrade to
1.6.6, and everything worked well. Alternatively, you can try upgrading,
but there is definitely something wrong with that version.

If you like, I can help you offline. I should be free this weekend (I
have a long weekend).



On Thu, Jul 2, 2009 at 8:22 AM, Thomas Roth<t.roth at gsi.de> wrote:
> Hi all,
>
> Our MDT gets stuck and becomes unresponsive under very high load (Lustre
> 1.6.7.1, Kernel 2.6.22, 8 cores, 32 GB RAM). The only thing that stands
> out is one ll_mdt_?? process running at 100% CPU. Nothing unusual
> happened on the cluster before that.
> After a reboot, as well as after moving the service to another server, this
> behavior reappears. The initial stages (mounting the MGS, mounting the MDT,
> recovery) work fine, but then the load goes up and the system is
> rendered unusable.
>
> At the moment, I don't know what to do, except shutting down all servers
> and possibly doing a writeconf everywhere.
>
> I see that a similar problem was reported by Mag in March this year, but
> no clues or solutions appeared.
> Any ideas?
>
> Yours,
> Thomas
>


