[Lustre-discuss] quotacheck blows up MDT

Nathan.Dauchy at noaa.gov
Fri Apr 24 06:38:23 PDT 2009


Thomas,

Were there other clients mounted when you ran "lfs quotacheck"?  Could
there have been a difference in client activity between your two attempts?
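
If you do retry, it might be worth confirming from the MDS that no
clients are still connected before starting. Assuming the 1.6 /proc
layout (paths may differ slightly on your build), something like:

  # number of exports (clients and servers) connected to the MDT
  cat /proc/fs/lustre/mds/*/num_exports

  # NIDs of the currently connected exports
  ls /proc/fs/lustre/mds/*/exports/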

-Nathan

----- Original Message -----
From: Thomas Roth <t.roth at gsi.de>
Date: Friday, April 24, 2009 7:28 am
Subject: [Lustre-discuss] quotacheck blows up MDT

> Hi all,
> 
> In a recent shutdown of our Lustre cluster (network reconfig, version
> upgrade to 1.6.7_patched), I decided to try to switch on quotas - this
> had already failed when the cluster went operational last year.
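> 
> For reference, the usual way to switch quotas on under 1.6 is, as far
> as I understand, the quota_type parameter, set on the MGS ("lustre"
> stands in for the filesystem name here):
> 
>   lctl conf_param lustre.mdt.quota_type=ug
>   lctl conf_param lustre.ost.quota_type=ug
> 
> followed by "lfs quotacheck -ug /lustre" on a client.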
> 
> Again, I ran into the same error as last year - failure, and
> "device/resource busy". This time, I was sure there was no activity at
> all on the system. But on the MDS, I observed a steep increase in the
> machine load, up to values of 70. The machine reacted very slowly. It
> is, however, an 8-core Xeon server with 32 GB RAM and Raptor disks,
> and in normal operation this machine has never shown any sign of
> overload, no matter what our users do.
> Nevertheless, the Lustre log complained about connection losses to
> some OSTs (at least one was set inactive); Heartbeat, which controls
> the IP of the MGS, complained about timeouts, and so did DRBD, which
> mirrors the MGS and MDT disks to a slave server. Probably the machine
> simply stopped responding on its eth0/1/2/3/4 network interfaces,
> which these services use.
> 
> After 30 min, the "lfs quotacheck -ug /lustre" command aborted with
> the errors mentioned above. The same thing happened again today, when
> we gave it another try. This time we unmounted Lustre, of course
> removed all Lustre modules, mounted it again and repeated the
> quotacheck. Similar behavior on the MDS, but this time the command ran
> through, the services recovered, Lustre survived and was mountable,
> and - the quotas seem to work.
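> 
> For anyone wanting to reproduce this, the sequence was roughly the
> following (device and user names are placeholders, not our actual
> configuration):
> 
>   # on all clients and servers: unmount Lustre
>   umount /lustre
>   # remove all Lustre kernel modules (lustre_rmmod ships with Lustre)
>   lustre_rmmod
>   # remount the servers, then repeat the check from a client
>   mount -t lustre <mdt-device> /mnt/mdt
>   lfs quotacheck -ug /lustre
>   # verify that quota reporting now works
>   lfs quota -u <username> /lustre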
> 
> So, after this lengthy intro, my question: is this extreme load on the
> MDS during quotacheck a "normal" feature?
> 
> Is there a connection to the fact that the filesystem is already 75%
> full - about 128 TB used out of roughly 170 TB total?
> 
> We have 68 OSTs, half of them 2.3 TB, half of them 2.7 TB.
> All servers run 64-bit Debian Etch with kernel 2.6.22.
> 
> Regards,
> Thomas
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


