[Lustre-discuss] quotacheck blows up MDT

Thomas Roth t.roth at gsi.de
Fri Apr 24 08:27:56 PDT 2009


No, in all cases - 2 days ago and today's 2 attempts - I made sure there
were no client mounts except for my one "administration" host. Checked
all exports on the MDS, logged in to potential clients, umounted and
removed Lustre modules, informed the users - and our batch farm was down
anyhow, so the number of potential clients was rather small.

Thomas

Nathan.Dauchy at noaa.gov wrote:
> Thomas,
> 
> Were there other clients mounted when you ran "lfs quotacheck"?  Could
> there have been a difference in client activity between your two attempts?
> 
> -Nathan
> 
> ----- Original Message -----
> From: Thomas Roth <t.roth at gsi.de>
> Date: Friday, April 24, 2009 7:28 am
> Subject: [Lustre-discuss] quotacheck blows up MDT
> 
>> Hi all,
>>
>> during a recent shutdown of our Lustre cluster (network reconfiguration,
>> version upgrade to 1.6.7_patched), I decided to try switching on quotas -
>> this had failed when the cluster went operational last year.
>>
>> Again, I suffered from the same error as last year - failure, and
>> "device/resource busy". This time, I was sure there was no activity at
>> all on the system. But on the MDS, I observed a steep increase of the
>> machine load, up to values of 70. The machine reacted very slowly. It
>> is, however, an 8-core Xeon server with 32 GB RAM and Raptor disks, and in
>> normal operation this machine has never shown any sign of overloading,
>> no matter what our users do.
>> Nevertheless, the Lustre log complained about connection losses to
>> some OSTs (at least one was set inactive); Heartbeat, which controls
>> the IP of the MGS, complained about timeouts, and so did DRBD, which
>> mirrors
>> the MGS and MDT disks to a slave server. Probably the machine simply
>> lost its own eth0/1/2/3/4 network interfaces which are used by these
>> services.
>>
>> After 30 min, the "lfs quotacheck -ug /lustre" command aborted with
>> the errors mentioned above. This happened again today, when we gave it
>> another try. This time, we umounted Lustre, removed all Lustre modules,
>> of course, mounted it again and repeated the quotacheck. Similar
>> behavior on the MDS, but this time the command ran through, the
>> services recovered,
>> Lustre survived and was mountable and - the quotas seem to work.
>>
>> So, after this lengthy intro, my question: Is this extreme loading or
>> overloading of the MDS during quotacheck a "normal" feature?
>>
>> Is there a connection to the fact that the filesystem is already 75%
>> full, with 128 TB used?
>>
>> We have 68 OSTs, half of them 2.3 TB, half of them 2.7 TB.
>> All servers run Debian Etch 64, Kernel 2.6.22.
>>
>> Regards,
>> Thomas
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Limited liability company (GmbH)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528

Managing Director: Professor Dr. Horst Stöcker

Chairwoman of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph,
Deputy: Ministerialdirigent Dr. Rolf Bernhardt


