[Lustre-discuss] MDT crash: ll_mdt at 100%

Mag Gam magawake at gmail.com
Fri Jul 3 08:02:34 PDT 2009


Exactly the symptoms I had. How long were you running this for?  Also,
how easy is it for you to reproduce this error?

This should clear up your doubts. But you said you are running
1.6.7.1, which is bizarre because I was running 1.6.7. Maybe this
could be a different bug?

http://lists.lustre.org/pipermail/lustre-discuss/2009-April/010167.html





On Fri, Jul 3, 2009 at 10:44 AM, Thomas Roth<t.roth at gsi.de> wrote:
>
>
> Mag Gam wrote:
>> http://lists.lustre.org/pipermail/lustre-discuss/2009-March/009928.html
>>
>> Look familiar?
>>
> Yes, I've read the thread - that's why I addressed you in addition to
> the list  ;-)
>
> But I was not aware that this is supposed to be a bug in this particular
> Lustre version.
>
> Right now the MDT stops cooperating without any ll_mdt processes going
> up. Load is 0.5 or so on the MDT, but no connections are possible.
> In the log I only noticed some "still busy with 2 active RPCs" messages.
> I just hope I don't have to writeconf the MDT again - I learned on this
> list that this would be necessary if these RPCs are never finished.
>
> Regards,
> Thomas
>
>
>>
>> On Fri, Jul 3, 2009 at 7:32 AM, Thomas Roth<t.roth at gsi.de> wrote:
>>> Hi,
>>>
>>> I didn't take notice of a discussion of such problems with 1.6.7.1. Do
>>> you know something more specific about it? We won't want to downgrade
>>> since our users are happier after the last upgrade (1.6.5 -> 1.6.7). And
>>> we don't have the 1.6.7.2 (Debian-) packages yet. But I could try to
>>> speed that up and force an upgrade if you told me that 1.6.7.1 wasn't
>>> really reliable.
>>>
>>> For the moment the problem seems to have been fixed by shutdown,
>>> fs-check and writeconf of all servers.
>>> However, I don't want to do that every other week ...
>>>
>>> Thanks a lot for your help,
>>> Thomas
>>>
>>> Mag Gam wrote:
>>>> Hi Tom:
>>>>
>>>> There was a known issue with 1.6.7.1. What I did was downgrade to
>>>> 1.6.6 and everything worked well. Or you can try upgrading, but there
>>>> is something def wrong with that version...
>>>>
>>>> If you like, I can help you offline. I should be free this weekend (I
>>>> have a long weekend)
>>>>
>>>>
>>>>
>>>> On Thu, Jul 2, 2009 at 8:22 AM, Thomas Roth<t.roth at gsi.de> wrote:
>>>>> Hi all,
>>>>>
>>>>> our MDT gets stuck and unresponsive with very high loads (Lustre
>>>>> 1.6.7.1, Kernel 2.6.22, 8 Core, 32GB RAM). The only thing calling
>>>>> attention is one ll_mdt_?? process running with 100% cpu. Nothing unusual
>>>>> happening on the cluster before that.
>>>>> After reboot as well as after moving the service to another server, this
>>>>> behavior reappears. The initial stages - mounting MGS, mounting MDT,
>>>>> recovery - work fine, but then the load goes up and the system is
>>>>> rendered unusable.
>>>>>
>>>>> Atm, I don't know what to do, except shutting down all servers and
>>>>> possibly doing a writeconf everywhere.
>>>>>
>>>>> I see that a similar problem was reported by Mag in March this year, but
>>>>> no clues or solutions appeared.
>>>>> Any ideas?
>>>>>
>>>>> Yours,
>>>>> Thomas
>>>>>
>>> --
>>> --------------------------------------------------------------------
>>> Thomas Roth
>>> Department: Informationstechnologie
>>> Location: SB3 1.262
>>> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>>>
>>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>>> Planckstraße 1
>>> D-64291 Darmstadt
>>> www.gsi.de
>>>
>>> Gesellschaft mit beschränkter Haftung
>>> Sitz der Gesellschaft: Darmstadt
>>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>>
>>> Geschäftsführer: Professor Dr. Horst Stöcker
>>>
>>> Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
>>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>>>


