[Lustre-discuss] Data corruption?

Nirmal Seenu nirmal at fnal.gov
Fri Mar 27 08:21:09 PDT 2009


 >> I recently moved the MDT to a different partition.
 >
 > Hrm.  How did you do that?  Did you lose some information in the move
 > perhaps?

The old and new MDTs are different partitions on the same machine. I
first ran the following from an LVM snapshot of the old MDT, which was
mounted as ldiskfs:

(tar --sparse -cf - . | ( cd /mnt/new-mdt; tar -xf -)) &

and then brought the entire Lustre filesystem down and did a final rsync 
after mounting both the old and new MDT as ldiskfs:

rsync -aSv /mnt/mdt/ /mnt/new-mdt

and then I ran getfattr and setfattr, followed by "rm OBJECTS/* CATALOGS".
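
The exact getfattr and setfattr invocations are not given above. As a
rough sketch only, assuming the usual dump-and-restore way of copying
extended attributes between two ldiskfs mounts (the EA_DUMP path and the
function name are hypothetical; the mount points are the ones used
above), the step might look like this:

```shell
#!/bin/sh
# Sketch only; the post does not show the real commands that were run.
OLD_MDT=${OLD_MDT:-/mnt/mdt}        # old MDT, mounted as ldiskfs
NEW_MDT=${NEW_MDT:-/mnt/new-mdt}    # new MDT, mounted as ldiskfs
EA_DUMP=${EA_DUMP:-/tmp/ea.bak}     # hypothetical dump file name

copy_mdt_xattrs() {
    # Dump every extended attribute (all namespaces, hex-encoded),
    # recursively and without following symlinks.
    (cd "$OLD_MDT" && getfattr -R -d -m '.*' -e hex -P . > "$EA_DUMP") || return 1
    # Replay the dump onto the same relative paths under the new MDT.
    (cd "$NEW_MDT" && setfattr --restore="$EA_DUMP") || return 1
}
```

Nothing runs until copy_mdt_xattrs is called, and it would need root on
the MDS with both targets mounted.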

 >> I performed a lfsck
 >> on the file system and got these errors from lfsck:
 >>
 >> /aa/bb/cc/dd/xx/cosmology.c object 8024 not created

 > Like just that one or lots of them?  I'd think you should see one for
 > that file you reference above: cart_wengen.odd_even.txt

cart_wengen.odd_even.txt is a new file that a user created after I ran
lfsck and brought the Lustre filesystem back online. That is, the lfsck
output has no entry for cart_wengen.odd_even.txt, but the "ls -l" output
lists the same file with ?.

During lfsck, 6612 files had the error "object not created".

In the "ls -lR" output on the Lustre client, 6774 files show ? in their
listing.
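
For anyone who wants to reproduce the two counts, here is a minimal
sketch. It assumes the lfsck and "ls -lR" output were each saved to a
file, that the lfsck lines look like the "object 8024 not created"
example quoted above, and that ls marks unreadable entries with a type
character followed by nine ? characters (as GNU ls does). The function
names and file names are hypothetical.

```shell
#!/bin/sh
# Count lfsck lines of the form "<path> object <id> not created".
count_not_created() {
    grep -c 'object [0-9][0-9]* not created' "$1"
}

# Count "ls -l" lines whose mode field is unknown, for example:
#   -????????? ? ? ? ?            ? cart_wengen.odd_even.txt
# In a basic regular expression "?" is a literal character and "."
# matches the leading file-type character.
count_unknown() {
    grep -c '^.?????????' "$1"
}
```

Usage would be along the lines of "count_not_created lfsck.log" and
"count_unknown ls-lR.txt". With the numbers reported here the two counts
differ by 162, which is consistent with files such as
cart_wengen.odd_even.txt having been created after the lfsck run.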

All our OSTs are healthy RAID6 volumes on SATABeasts and there is no
known hardware problem. e2fsck reports no errors when I run it against
each individual filesystem.

Not much I/O has happened since I last ran lfsck (last night).
Would another lfsck actually help fix this problem?

Thanks
Nirmal


