[Lustre-discuss] E2fsck running for a week so far...

Andrus, Brian Contractor bdandrus at nps.edu
Mon May 26 12:36:42 PDT 2014


It is more of a precautionary thing. The MDS/MGS kernel panicked a few times in as many days. The first couple of panics happened under heavy load caused by a user. When I was bringing it back up, I ran e2fsck on all the targets and found some corruption, which was fixed. But then the MGS/MDS kernel panicked as soon as I mounted the MGT and MDT. I hadn't even mounted any OSTs yet.
So to be careful, I have the filesystem offline and started running e2fsck --mdsdb on the MDT.
It is writing to local disk, so the slowness shouldn't be due to that. It's even an SSD.
It is pretty confusing that it is taking so long, though. I see one CPU pegged at >90%, and the mdsdb file does grow, albeit very slowly (something like 6 hours before a few bytes are written to it).

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



-----Original Message-----
From: Dilger, Andreas [mailto:andreas.dilger at intel.com] 
Sent: Monday, May 26, 2014 12:15 PM
To: Andrus, Brian Contractor
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] E2fsck running for a week so far...

On 2014/05/26, 9:03 AM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:

Is it normal for e2fsck running on an MDT with --mdsdb to take over a week?
The entire MDT is only 500GB.

This is limited by the performance of the database that e2fsck is using for the mdsdb.  If this is stored on e.g. NFS, and the database is large, then it will slow to a crawl.

I don't typically recommend that users run the old lfsck unless there is a huge amount of corruption that needs to be fixed. Most of the problems it fixes can also be fixed in other ways.

What problem are you having?

Cheers, Andreas

So far it has only output:


e2fsck 1.42.7.wc2 (07-Nov-2013)
WORK=MDT0000 lustre database creation, check forced.
Pass 1: Checking inodes, blocks, and sizes
MDS: ost_idx 0 max_id 6351370
MDS: ost_idx 1 max_id 5766664
MDS: ost_idx 2 max_id 5821326
MDS: ost_idx 3 max_id 5720490
MDS: ost_idx 4 max_id 2889092
MDS: ost_idx 5 max_id 2654116
MDS: ost_idx 6 max_id 2805220
MDS: ost_idx 7 max_id 2895847
MDS: ost_idx 8 max_id 2932156
MDS: ost_idx 9 max_id 2777382
MDS: ost_idx 10 max_id 2764932
MDS: ost_idx 11 max_id 2655203
MDS: ost_idx 12 max_id 2742542
MDS: ost_idx 13 max_id 2856457
MDS: got 112 bytes = 14 entries in lov_objids
MDS: max_files = 32837426
MDS: num_osts = 14
mds info db file written
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Pass 6: Acquiring MDT information for lfsck


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division


