[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.

Craig Prescott prescott at hpc.ufl.edu
Fri Dec 3 07:06:04 PST 2010


Andreas Dilger wrote:
> On 2010-12-02, at 09:24, Craig Prescott wrote:
>> fsck seems to be spending a lot of time in Pass1D, cloning 
>> multiply-claimed blocks.  But there is no output from fsck in many hours 
>> now,
> 
> Pass 1b-1d have O(n^2) complexity, and require a second pass through all of the metadata, so if there are a large number of duplicate blocks it can take a long time.
> 
>> 1) fsck.ext4 is using 100% of a 2.2GHz core.  The progress of the fsck 
>> seems to be CPU bound for a long time (many hours).  We're not used to 
>> seeing this.
> 
> If there are a limited number of files, you can restart e2fsck with the option "-E shared=delete", which will cause the inodes with shared blocks to be deleted.  It will of course cause that data to be lost, but it will allow e2fsck to complete much more quickly.
>

Well, we restarted fsck with the "-E shared=delete" option.  It has been 
running for about 16 hours at 100% CPU, almost all of it in Pass 1D 
(where it still is), and has deleted 58 files.

For all I know, these are the only 58 files fsck has even considered, so 
we are thinking about giving up on this and reformatting the OST.

Is there any way to estimate, even as a ballpark, the wallclock time 
required by Pass 1D?  Our ~8TB OST had approximately 30k 2GB files on it.
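
The only estimate I can make myself is a naive linear extrapolation from 
the rate seen so far (58 files in about 16 hours).  Below is a minimal 
Python sketch of that arithmetic.  The function name and the list of 
"affected file" counts are hypothetical; fsck has not told us how many 
files actually share blocks, and 30k is only the upper bound from the 
file count on the OST.  Since Pass 1b-1d are O(n^2), a linear guess like 
this may be well off.

```python
def estimate_remaining_hours(elapsed_hours, files_done, files_total):
    """Linear extrapolation of remaining run time.

    Assumes every remaining file costs the same as the ones already
    handled, which is an assumption and not something e2fsck guarantees.
    """
    if elapsed_hours <= 0 or files_done <= 0:
        raise ValueError("need positive elapsed time and processed files")
    rate = files_done / elapsed_hours            # files per hour so far
    remaining = max(files_total - files_done, 0)
    return remaining / rate


if __name__ == "__main__":
    # Observed in this thread: ~16 hours, 58 files deleted.
    # The affected-file counts below are hypothetical scenarios.
    for affected in (58, 100, 1000, 30000):
        hours = estimate_remaining_hours(16, 58, affected)
        print(f"{affected:>6} affected files: "
              f"~{hours:,.0f} h remaining (~{hours / 24:,.1f} days)")
```

If the number of files with shared blocks is anywhere near the total 
file count, this suggests reformatting would be far quicker than waiting.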

Thanks,
Craig Prescott
UF HPC Center


