[Lustre-discuss] Problems running e2fsck on oss - incredibly slow!

Andreas Dilger adilger at whamcloud.com
Thu Jun 7 10:21:52 PDT 2012


On 2012-06-07, at 9:49, Arne Brutschy <arne.brutschy at ulb.ac.be> wrote:

> Answering my own question: I needed to give write permissions not only
> to the ostdb directory and the mdsdb, but also to the parent
> directory. It seems to work now.
> 
> On a side note, does anyone know why e2fsck is so incredibly slow?
> We have only two OSTs per system, each running on a 1TB RAID1. The
> check has now been running for over 6 hours (!)

Note that it is possible to run lfsck while the filesystem is mounted. 

The lfsck code has always been very slow because of the external databases. That, and the difficulty in building and moving these databases, is the reason why we are working on an in-kernel lfsck replacement.

The new lfsck will be available in Lustre 2.4.
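
Regarding the permission fix quoted at the top of this message: a minimal sketch of a pre-flight check that could be run before starting e2fsck with --mdsdb/--ostdb. The function name check_db_writable is hypothetical (it is not part of e2fsprogs or Lustre); it only tests that a database path and its parent directory are writable by the current user, which is what the fix above came down to.

```shell
#!/bin/sh
# Hypothetical helper, not part of e2fsprogs or Lustre: report whether a
# database path and its parent directory are writable by the current user.
check_db_writable() {
    db_path=$1
    parent_dir=$(dirname "$db_path")
    status=0

    # The parent directory must be writable (the fix described in this thread).
    if [ ! -w "$parent_dir" ]; then
        echo "not writable: $parent_dir (parent directory)"
        status=1
    fi

    # If the database file already exists, it must be writable as well.
    if [ -e "$db_path" ] && [ ! -w "$db_path" ]; then
        echo "not writable: $db_path"
        status=1
    fi

    return $status
}
```

With the paths from this thread, it would be called as "check_db_writable /mnt/mdsdb" and "check_db_writable /mnt/osts/ost2/ost2db" on the OSS before running e2fsck. Note that root passes the -w test for any existing path on a local filesystem, so on an NFS export with root squashing the check is only meaningful when run on the NFS client.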

> Maybe there is another cause for the IO problems I observed before
> starting to fix corruption...
> 
> Cheers,
> Arne
> 
> On Thu, 7 Jun 2012 11:13:37 +0200 Arne Brutschy
> <arne.brutschy at ulb.ac.be> wrote:
>> Hey,
>> 
>> we're running Lustre 1.8.5 on CentOS 5.3 (rocks). It seems that we
>> have a corrupted file system, so I followed the steps given in the
>> manual:
>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreRecovery.html#50651260_pgfId-1291230
>> 
>> Steps 1-4 ran smoothly. I shared the files created on the MDS via NFS.
>> 
>>  $ ls -l /mnt/
>>  total 14164076
>>  -rw-r--r-- 1 root root 46471111680 Jun  6 17:51 mdsdb
>>  -rw-r--r-- 1 root root      106496 Jun  6 16:49 mdsdb.mdshdr
>>  drwxrwxrwx 6 root root        4096 Jun  6 18:01 osts
>> 
>> Step 5 fails on all OSTs:
>> 
>>  $ e2fsck -v --mdsdb /mnt/mdsdb --ostdb /mnt/osts/ost2/ost2db /dev/sda3
>>  e2fsck 1.41.10.sun2 (24-Feb-2010)
>>  lustre-OST0002 lustre database creation, check forced.
>>  Pass 1: Checking inodes, blocks, and sizes
>>  Pass 2: Checking directory structure
>>  Pass 3: Checking directory connectivity
>>  Pass 4: Checking reference counts
>>  Pass 5: Checking group summary information
>>  Pass 6: Acquiring information for lfsck
>>  /mnt/mdsdb:mdshdr
>>  : Permission denied
>>  failure to open database mdshdr: Input/output error
>>  e2fsck: aborted
>> 
>> I changed permissions to 666, and tried to create a link mdsdb:mdshdr
>> that points to mdsdb.mdshdr (I figured this might be a bug as the file
>> indicated above has a ':' instead of a '.'). Neither worked.
>> 
>> Any ideas?
>> 
>> Cheers,
>> Arne
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Arne Brutschy
> Ph.D. Student                    Email    arne.brutschy(AT)ulb.ac.be
> IRIDIA CP 194/6                  Web      iridia.ulb.ac.be/~abrutschy
> Universite' Libre de Bruxelles   Tel      +32 2 650 2273
> Avenue Franklin Roosevelt 50     Fax      +32 2 650 2715
> 1050 Bruxelles, Belgium          (Tel and Fax both IRIDIA secretary)