[Lustre-discuss] Files written to an OST are corrupted
Bob Ball
ball at umich.edu
Thu Sep 19 07:22:53 PDT 2013
Hi, everyone,
I need some help figuring out what may have happened here, as newly
created files on an OST are being corrupted. I don't know whether this
affects all files written to this OST or only files on the order of 2 GB,
but files are definitely being corrupted, with no errors reported
by the OSS machine.
Let me describe the situation. We had been running Lustre 1.8.4 for
several years. With our upgrade from SL5 to SL6.4 we also switched to
Lustre 2.1.6. The OSTs were left "as is", with no reformatting. A few
weeks ago, one of the OSTs on one OSS began to throw IO errors.
This was almost certainly related to an ill-performed replacement of a
failed disk in the RAID-5 volume. e2fsck did not help, so the OST was
set read-only and drained using lfs_migrate.
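For reference, the drain followed the usual pattern (a sketch from memory, not the exact transcript; the device number, the OST UUID umt3-OST0023_UUID for index 35, and the /lustre mount point are placeholders for whatever `lctl dl` and your client report):

```shell
# On the MDS: stop new object allocation on the failing OST by
# deactivating its OSC device (number taken from `lctl dl` output).
lctl dl | grep OST0023
lctl --device 23 deactivate

# On a client: find every file with objects on that OST and feed the
# list to lfs_migrate, per the usage pattern in the Lustre manual.
lfs find /lustre --obd umt3-OST0023_UUID | lfs_migrate -y
```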
When the drain was complete, the following sequence of commands was used
to reformat and remount the volume. This procedure was successfully
used under Lustre 1.8.4. This is a 9-disk, RAID-5, 5.5TB volume, on a
Dell MD1000 shelf using a PERC-6 controller. A second 5-disk RAID-5
shares the shelf, with the 15th disk as a hot spare, and that second
volume is not having issues.
289 mkdir reformat
290 cd reformat
292 mkdir -p /mnt/ost
293 mount -t ldiskfs /dev/sdc /mnt/ost
294 mkdir sdc
295 pushd /mnt/ost
296 cp -p last_rcvd /root/reformat/sdc
297 cd O
298 cd 0
299 cp -p LAST_ID /root/reformat/sdc
300 cd ../..
301 cp -p CONFIGS/* /root/reformat/sdc
304 umount /mnt/ost
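As an aside, last_rcvd and LAST_ID are binary files; LAST_ID holds the last allocated object ID as a little-endian 64-bit integer, so it can be inspected with od. A minimal sketch, using a scratch file in place of the real /mnt/ost/O/0/LAST_ID:

```shell
# Simulate a LAST_ID file holding object ID 10 as a little-endian u64
# (on the real OST the path would be /mnt/ost/O/0/LAST_ID).
printf '\012\000\000\000\000\000\000\000' > /tmp/LAST_ID

# Hex addresses, 8-byte signed decimal values: prints the stored ID.
od -Ax -td8 /tmp/LAST_ID
```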
At this point, the web interface of Dell's OMSA was used to do a complete,
slow initialization of the volume. No further action was taken until that process
completed.
The index, inode count, and stripe settings used below were taken from the
files saved above (values not shown in this email), as recorded when the
volume was first created.
309 mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0 --fsname=umt3 --reformat --index=35 \
--mkfsoptions="-i 2000000" --mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256" /dev/sdc
The UUID here is taken from /etc/fstab, where the entry had been commented
out until we were ready to use the volume again.
310 tune2fs -O uninit_bg -m 1 -U 02bcb3d2-ad48-4992-ba71-7b48787defea /dev/sdc
311 e2fsck -fy /dev/sdc
312 mount -t ldiskfs /dev/sdc /mnt/ost
Copy back all identifiers so that the volume can pick up where it left off:
315 cd /root/reformat/sdc
316 cp -v /mnt/ost/CONFIGS/mountdata mountdata.new2
317 cp -fv mountdata /mnt/ost/CONFIGS
319 cp last_rcvd /mnt/ost
320 mkdir -p /mnt/ost/O/0
321 chmod 700 /mnt/ost/O
322 chmod 700 /mnt/ost/O/0
323 cp -fv LAST_ID /mnt/ost/O/0
324 umount /mnt/ost
Add the fstab entry back in again, and remount the disk
325 vi /etc/fstab
326 mount -av
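After the remount I would also sanity-check that the OST re-registered (again a sketch; /lustre, the device number, and the OST name are placeholders):

```shell
# On a client: the reformatted OST (index 35 = OST0023) should appear
# with the expected capacity and no "inactive" flag.
lfs df -h /lustre | grep OST0023

# On the MDS: re-activate the OSC if it was deactivated for the drain.
lctl --device 23 activate
```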
I was quite pleasantly surprised by the speed of the reformatted volume when
I used lfs_migrate to repopulate it from the file list removed earlier. The
volume seemed fine. Then a user reported that his newly created files (written
with a gridftp variant) were corrupted when they landed on this volume, whereas
copies made to a different volume were fine. "md5sum" shows the copies are
indeed different, even though "ls" reports the same size.
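On the corruption check itself: equal sizes from "ls" say nothing about contents, so the checksum comparison is the right test. A generic sketch with scratch files standing in for the two copies (on Lustre you could pin a test copy to this OST with "lfs setstripe -i 35" before writing it):

```shell
# Two files of identical size that differ by a single byte.
printf 'hello world\n' > /tmp/copy_on_bad_ost    # stand-in for the corrupt copy
printf 'hello wor1d\n' > /tmp/copy_on_good_ost   # stand-in for the good copy

ls -l /tmp/copy_on_bad_ost /tmp/copy_on_good_ost   # same size
md5sum /tmp/copy_on_bad_ost /tmp/copy_on_good_ost  # different digests
```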
Can anyone tell me what may have gone wrong here? Is there something I
should have done but did not? Where should I begin to look? Neither the
client nor the OSS logged any kind of error for the volume during this
time. I am truly at a loss here. All help is appreciated.
Thanks much,
bob