[Lustre-discuss] lfsck_start

Andrus, Brian Contractor bdandrus at nps.edu
Mon Jan 12 12:02:16 PST 2015


Rick,

Thanks for the info, that seems to have helped with the quota issue.
The lfsck_start situation, however, has gotten worse. I was able to run 'lctl lfsck_start -M <device> -t namespace', but immediately after I hit enter, EVERY MGS/MDS/OSS kernel panicked!

I have brought the system back up, but now I have one OST that kernel panics the OSS within 20-30 seconds of being mounted. It passes all the e2fsck checks, so I am not sure where to look on that. For now I am bringing things up one OST at a time and hoping for the best...


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




-----Original Message-----
From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr at utk.edu] 
Sent: Monday, January 12, 2015 11:13 AM
To: Andrus, Brian Contractor
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] lfsck_start

Brian,

With the newer versions of Lustre (starting with 2.4 I believe), quotas are automatically enabled for space accounting purposes even if you don't apply quotas to any user accounts.  (Enabling/disabling quota enforcement is handled by running "lfs quota on|off".)

We recently had similar errors on several of our OSTs following a failure of one of our storage controllers.  We did not find any indication that anything else on the file system was inconsistent, so I am still not sure why the quotas would have been affected.  Nevertheless, we decided to just regenerate the quota info to be safe.  The manual doesn't clearly state how to regenerate the info, but when we upgraded to 2.4, we had to run "tunefs.lustre --quota $DEVICE" against the MDT/OSTs.  The manual seems to indicate that this will set the QUOTA flag in the superblock and then invoke e2fsck to generate the disk usage database.  However, when we tried running this, it did not seem to regenerate the quota info.  I am not sure if this was supposed to work or not.  But in any case, we found a Lustre ticket that recommended these commands which ended up working for us:

tune2fs -O ^quota $DEVICE
tune2fs -O quota $DEVICE

The first command truncates the quota databases and the second one forces the info to be recalculated. (On several of our OSTs, we noticed that e2fsck reported errors about some inodes in addition to the quota warnings.  But when we regenerated the quotas, the quota warnings and the errors about the inodes both went away.)
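
A minimal sketch of the two-step sequence above, wrapped in a shell function. It assumes the target is unmounted and that $DEVICE is the backing block device of the MDT/OST. The function name regen_quota and the DRY_RUN switch are made up for illustration and are not part of Lustre or e2fsprogs. By default it only prints the commands, so they can be reviewed before anything touches the device.

```shell
#!/bin/sh
# Hypothetical helper: regenerate quota accounting on one target device.
# Assumption: the MDT/OST is unmounted while this runs.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
regen_quota() {
    device=$1
    if [ -z "$device" ]; then
        echo "usage: regen_quota DEVICE" >&2
        return 1
    fi
    # First drop the quota feature (truncates the quota databases),
    # then re-add it (forces the usage info to be recalculated).
    for feature in '^quota' 'quota'; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "tune2fs -O $feature $device"
        else
            tune2fs -O "$feature" "$device" || return 1
        fi
    done
}
```

Calling "regen_quota /dev/sdX" (with /dev/sdX as a placeholder device) prints the two tune2fs commands in order. Setting DRY_RUN=0 executes them and stops at the first failure.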

As far as running lfsck goes, we ran a namespace check after our 1.8->2.4 upgrade using this command:

  lctl lfsck_start -M medusa-MDT0000 -t namespace

The OI scrub happened automatically.   I know that Lustre 2.6 introduced some more options for lfsck_start so YMMV.
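
A rough sketch of how the start command can be paired with status checks. The two get_param paths (mdd.<target>.lfsck_namespace and osd-ldiskfs.<target>.oi_scrub) are the ones described in the Lustre manual as far as I know, but treat them as an assumption and verify them on your version. The function name lfsck_namespace_cmds is hypothetical. It deliberately prints the commands instead of running them, given the panics reported in this thread after lfsck_start.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to start a namespace LFSCK on an
# MDT and to inspect its progress. Nothing is executed here.
lfsck_namespace_cmds() {
    mdt=$1   # MDT target name, e.g. medusa-MDT0000
    if [ -z "$mdt" ]; then
        echo "usage: lfsck_namespace_cmds MDT_TARGET" >&2
        return 1
    fi
    # Start the namespace check (same form as the command used above).
    echo "lctl lfsck_start -M $mdt -t namespace"
    # Assumed status parameters; confirm they exist on your Lustre version.
    echo "lctl get_param -n mdd.$mdt.lfsck_namespace"
    echo "lctl get_param -n osd-ldiskfs.$mdt.oi_scrub"
}
```

Running "lfsck_namespace_cmds medusa-MDT0000" on the MDS prints three lines: the start command followed by the two status queries, which can then be run by hand one at a time.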

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences http://www.nics.tennessee.edu


On Jan 12, 2015, at 12:56 PM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:

> All,
>  
> I am running the latest lustre rpms (lustre-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64).
> One of our OSTs is throwing errors whenever e2fsck is run:
> 
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (2767 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (1429438464 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [QUOTA WARNING] Usage inconsistent for ID 0:actual (482955264, 8813504) != expected (0, 0)
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (14001 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (103 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (2064384 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (17508 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [QUOTA WARNING] Usage inconsistent for ID 0:actual (482955264, 8813504) != expected (137438953472, 0)
> [QUOTA WARNING] Usage inconsistent for ID 0:actual (482955264, 8813504) != expected (0, 0)
> [QUOTA WARNING] Usage inconsistent for ID 0:actual (482955264, 8813504) != expected (0, 0)
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (12288 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (19053 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (10087 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> [ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (4218437632 >= 54) in user quota file. Quota file is probably corrupted.
> Please run e2fsck (8) to fix it.
> Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x1ff91020
>  
> We are not running quotas on the filesystem.
>  
> I thought it might be a good idea to run lfsck on the system, but things have changed and now I get a message telling me to use lfsck_start.
> I am having great difficulty finding documentation on running it, although I do see quite a few code bits and bug reports about it.
> All I can find is the standard help message reproduced in the documentation, with nothing that goes into detail about any of the options. It also seems that no matter what options I try, I get:
>  
> Fail to start LFSCK: Operation not supported
>  
> 1) What could be causing such an error on an OST?
> 2) Are there any detailed descriptions/examples of running lfsck_start for the 2.6 code?
>  
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>  
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss




