[Lustre-discuss] Failed OST Cleanup

Bernd Schubert bs_lists at aakef.fastmail.fm
Wed Jun 2 14:12:45 PDT 2010


On Wednesday 02 June 2010, Andreas Dilger wrote:
> On 2010-06-02, at 11:54, Scott Barber wrote:
> > I'm now trying to get a list of files that are now corrupt. On one of
> > the lustre clients I'm running:
> > lfs find --obd sanvol06-OST0013_UUID  <my lustre mount point>
> >
> > It starts to list files and then a few minutes later it runs into an
> > error and stops:
> > cb_find_init: IOC_LOV_GETINFO on <filename> failed: Input/output error.
> >
> > In dmesg I see:
> > LustreError: 13926:0:(file.c:1053:ll_glimpse_size()) obd_enqueue
> > returned rc -5, returning -EIO
> >
> > The file that gets that "Input/output error" cannot be deleted or
> > removed from the file system. How can I get around this?
> 
> There is a bug in "lfs find" that it tries to get the file size
>  unnecessarily.  You can use "lfs getstripe --obd ..." instead, and it
>  should work even if the OST is down.

Hmm, yes and no. In principle I like the idea that 'lfs find' tries to figure
out the file size. A couple of years ago I had to deal with a three-disk
failure in a RAID6 array, and although we tried to clone the third failing
disk, in the end we lost that OST. A stripe size of 4M and a stripe count of 4
were configured. When I then ran 'lfs find' to find files located on that OST,
it reported lots of files that *would* have had data on that OST if they had
been sufficiently large. But lots of those files were smaller than 1M, so it
would have been wrong to delete them. It turned out that 'lfs find' was
rather useless for us and I simply had to read each file: if the read
succeeded, all was fine; if it failed, I moved the file into a dedicated
subdirectory. The missing OST was later recreated (that was easier back then
with 1.4 than it is nowadays) and we only lost a small fraction of the files,
definitely far fewer than 'lfs find' suggested.
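
The read test described above could be sketched roughly as follows. This is
only an illustration, not the script that was actually used: the function
names, the "readable" parameter and the flattened target file names are all
assumptions made for the example.

```python
import os


def is_readable(path, chunk_size=1 << 20):
    """Return True if the whole file can be read without an I/O error."""
    try:
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except OSError:
        # Covers EIO from a missing OST as well as other open/read failures.
        return False


def quarantine_unreadable(root, quarantine_dir, readable=is_readable):
    """Move every regular file under root that fails the read test.

    Returns the list of original paths that were moved. quarantine_dir
    is assumed to be on the same file system as root, so that
    os.rename() does not need to read the file data.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    quarantine_abs = os.path.abspath(quarantine_dir)
    moved = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the quarantine directory itself.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) != quarantine_abs
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if not readable(path):
                # Flatten the relative path so names stay unique.
                rel = os.path.relpath(path, root)
                target = os.path.join(quarantine_dir,
                                      rel.replace(os.sep, "__"))
                os.rename(path, target)
                moved.append(path)
    return moved
```

Reading every file is slow on a large file system, but it only depends on
what is really readable, not on what the stripe layout suggests.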

So if 'lfs find' now used the file size to determine whether a file really
has data on an OST, that would be an improvement. Of course, if it fails
outright with an I/O error, it is also not useful ;)
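
As a rough sketch of the size check suggested above, assuming a plain
round-robin (RAID0) stripe layout: a file only has data on a given stripe if
it is large enough to reach it. The function names and the example OST
indices below are hypothetical, and for sparse files the size only gives an
upper bound on which stripes hold data.

```python
def stripes_with_data(file_size, stripe_size, stripe_count):
    """Return the stripe indices that can hold data for a file this size.

    Chunk k of the file (bytes k*stripe_size onwards) goes to stripe
    k % stripe_count, so a file with n chunks touches the first
    min(n, stripe_count) stripes.
    """
    if file_size <= 0:
        return []
    chunks = -(-file_size // stripe_size)  # ceiling division
    return list(range(min(chunks, stripe_count)))


def file_touches_ost(file_size, stripe_size, ost_indices, failed_ost):
    """True if a stripe that can hold data lives on failed_ost.

    ost_indices is the file's OST list in stripe order.
    """
    used = stripes_with_data(file_size, stripe_size, len(ost_indices))
    return any(ost_indices[i] == failed_ost for i in used)


MiB = 1 << 20
layout = [17, 18, 19, 20]  # hypothetical OST indices, stripe count 4

# A 1 MiB file with a 4 MiB stripe size only uses the first stripe.
print(file_touches_ost(1 * MiB, 4 * MiB, layout, failed_ost=19))  # False
# A 9 MiB file spans three chunks and so reaches the third stripe.
print(file_touches_ost(9 * MiB, 4 * MiB, layout, failed_ost=19))  # True
```

With a 4M stripe size and a stripe count of 4, any file of 4M or less would
only be reported if the failed OST held its first stripe.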

Cheers,
Bernd


-- 
Bernd Schubert
DataDirect Networks


