[Lustre-discuss] Files written to an OST are corrupted

Bob Ball ball@umich.edu
Thu Sep 19 07:22:53 PDT 2013


Hi, everyone,

I need some help figuring out what may have happened here, as newly 
created files on an OST are being corrupted.  I don't know whether this 
applies to all files written to this OST, or only to files on the order 
of 2 GB in size, but files are definitely being corrupted, with no 
errors reported by the OSS machine.

Let me describe the situation.  We had been running Lustre 1.8.4 for 
several years.  With our upgrade from SL5 to SL6.4 we also switched to 
Lustre 2.1.6.  The OSTs were left "as is", with no reformatting.  A few 
weeks ago, one of the OSTs on one OSS began throwing I/O errors.  This 
was almost certainly related to an ill-performed replacement of a 
failed disk in the RAID-5 volume.  e2fsck did not help, so the OST was 
set read-only and drained using lfs_migrate, roughly as sketched below.
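This is a minimal sketch of such a drain, assuming the OST is index 35
(hex 0023) in filesystem umt3 and that a client mounts it at
/lustre/umt3; the device number and paths are illustrative:

    # On the MDS: find the OSC device for the failing OST and deactivate
    # it, so the MDS stops allocating new objects on that OST.
    lctl dl | grep osc
    lctl --device <devno> deactivate

    # On a client: move every file with objects on that OST elsewhere.
    lfs find /lustre/umt3 --obd umt3-OST0023_UUID | lfs_migrate -y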

When the drain was complete, the following sequence of commands was used 
to reformat and remount the volume.  This procedure had been used 
successfully under Lustre 1.8.4.  This is a 9-disk, RAID-5, 5.5 TB 
volume on a Dell MD1000 shelf using a PERC-6 controller.  A second 
5-disk RAID-5 volume shares the shelf, with the 15th disk as a hot 
spare, and that second volume is not having issues.

   289  mkdir reformat
   290  cd reformat
   292  mkdir -p /mnt/ost
   293  mount -t ldiskfs /dev/sdc /mnt/ost
   294  mkdir sdc
   295  pushd /mnt/ost
   296  cp -p last_rcvd /root/reformat/sdc
   297  cd O
   298  cd 0
   299  cp -p LAST_ID /root/reformat/sdc
   300  cd ../..
   301  cp -p CONFIGS/* /root/reformat/sdc
   304  umount /mnt/ost
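As a sanity check before wiping the volume, the saved LAST_ID (a
little-endian 64-bit object ID) can be dumped and recorded; a quick
sketch using the copy made above:

    # Print the last allocated object ID in decimal.
    od -Ax -td8 /root/reformat/sdc/LAST_ID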

At this point, the web interface of Dell's OMSA was used to do a complete,
slow initialization of the volume.  No further action was taken until that process
completed.

The index, inode count, and stripe settings below are taken from the
files saved above (their contents are not shown in this email), as
recorded when the volumes were first created.
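If those saved values are ever in doubt, the same information can be
read back from a healthy OST; a sketch, assuming /dev/sdb is the
second, untroubled volume on the shelf (the device name is
illustrative):

    # Lustre target index, fsname, and mount options from the mountdata:
    tunefs.lustre --dryrun /dev/sdb

    # Inode and block counts from the ldiskfs superblock:
    dumpe2fs -h /dev/sdb | egrep -i 'inode count|block count'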

   309  mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0 --fsname=umt3 --reformat --index=35 \
--mkfsoptions="-i 2000000" --mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256" /dev/sdc

The UUID here is taken from /etc/fstab, where the entry had been
commented out until we were ready to use the volume again.

   310  tune2fs -O uninit_bg -m 1 -U 02bcb3d2-ad48-4992-ba71-7b48787defea /dev/sdc
   311  e2fsck -fy /dev/sdc
   312  mount -t ldiskfs /dev/sdc /mnt/ost

Copy back all the identifiers so that the volume can continue from where it left off:

   315  cd /root/reformat/sdc
   316  cp -v /mnt/ost/CONFIGS/mountdata mountdata.new2
   317  cp -fv mountdata /mnt/ost/CONFIGS
   319  cp last_rcvd /mnt/ost
   320  mkdir -p /mnt/ost/O/0
   321  chmod 700 /mnt/ost/O
   322  chmod 700 /mnt/ost/O/0
   323  cp -fv LAST_ID /mnt/ost/O/0
   324  umount /mnt/ost
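After the copy-back, the restored LAST_ID can be checked against the
MDS's view of the last preallocated object ID for this OST; a sketch,
assuming the target is umt3-OST0023 (index 35) — the exact parameter
name on this Lustre version is an assumption:

    # On the OSS, with the OST still mounted as ldiskfs:
    od -Ax -td8 /mnt/ost/O/0/LAST_ID

    # On the MDS: the last object ID the MDS believes was preallocated.
    lctl get_param osc.umt3-OST0023*.prealloc_last_id

A mismatch between the two is a known source of trouble after a manual
restore like this one.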

Add the fstab entry back in, and remount the disk:

   325  vi /etc/fstab
   326  mount -av
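For completeness, the fstab entry is along these lines; the mount point
shown is illustrative:

    /dev/sdc   /mnt/lustre/ost35   lustre   defaults,_netdev   0 0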

I was quite pleasantly surprised by the speed of the reformatted volume
when I used lfs_migrate to repopulate it from the file list saved
earlier.  The volume seemed fine.  Then a user reported that his newly
created files (written with a gridftp variant) were corrupted when they
landed on this volume, whereas copies written to a different volume
were just fine.  "md5sum" shows the copies are indeed different,
despite "ls" reporting the same size.
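The problem can be reproduced without the user's workflow by forcing a
test file onto the suspect OST and comparing checksums; a minimal
sketch, assuming OST index 35 and a client mount at /lustre/umt3 (the
paths are illustrative):

    # Pre-create a single-stripe file pinned to OST index 35.
    lfs setstripe -c 1 -i 35 /lustre/umt3/tmp/ost35.test
    # Write ~2 GB of random data locally, copy it in, and compare.
    dd if=/dev/urandom of=/tmp/ost35.src bs=1M count=2048
    cp /tmp/ost35.src /lustre/umt3/tmp/ost35.test
    md5sum /tmp/ost35.src /lustre/umt3/tmp/ost35.test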

Can anyone tell me what may have gone wrong here?  Is there something I
should have done but did not?  Where should I begin to look?  Neither
the client nor the OSS logged any kind of error for the volume during
this time.  I am truly at a loss.  All help is appreciated.

Thanks much,
bob





