[lustre-discuss] dealing with maybe dead OST

Robin Humble rjh+lustre at cita.utoronto.ca
Wed Jun 20 07:20:09 PDT 2018

Hi Malcolm,

thanks for replying.

On Tue, Jun 19, 2018 at 08:54:53PM +0000, Cowe, Malcolm J wrote:
>Would using hard links work, instead of mv?

hmm, interesting idea, but no:
  # ln some_file /lustre/shadow/some_file
  ln: failed to access 'some_file': Cannot send after transport endpoint shutdown

ln is trying to lstat() which fails. I think almost all client
operations are going to fail with a deactivated/down OST.

things like 'lfs getstripe' (pure MDS ops) work ok.

or did you mean doing hard links on the MDT?

unless there's a purely MDS lustre tool to do a mv/rename operation on
the MDT, then I think the only option is to mess around with the low
level stuff on the MDT when it's mounted as ldiskfs and hope I don't
break too much...
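
a rough sketch of what that ldiskfs-level move could look like, as a dry
run first. everything in it is an assumption rather than a tested
procedure: the function names, the DRY_RUN switch and the list file (one
path per line, relative to the filesystem root) are made up for
illustration. the shadow directories are created through a normal client
mount beforehand so that the MDT-side step only renames files. whether
OIs or lfsck need attention afterwards is still an open question.

```shell
# Sketch only, not tested against a real MDT.
# Usage (hypothetical paths):
#   make_shadow_dirs /lustre shadow files.txt       # on a lustre client
#   shadow_move /mnt/mdt0/ROOT shadow files.txt     # on the ldiskfs mount
# files.txt holds one path per line, relative to the filesystem root.
# shadow_move only prints the mv commands unless DRY_RUN=0 is set.

make_shadow_dirs() {
    # $1 = filesystem root, $2 = shadow dir name, $3 = list file
    while IFS= read -r sm_f || [ -n "$sm_f" ]; do
        [ -n "$sm_f" ] || continue
        # create the parent directory of each listed file under the shadow tree
        mkdir -p -- "$(dirname -- "$1/$2/$sm_f")"
    done < "$3"
}

shadow_move() {
    # $1 = root, $2 = shadow dir name, $3 = list file
    while IFS= read -r sm_f || [ -n "$sm_f" ]; do
        [ -n "$sm_f" ] || continue
        sm_src=$1/$sm_f
        sm_dst=$1/$2/$sm_f
        if [ ! -e "$sm_src" ] && [ ! -L "$sm_src" ]; then
            echo "skip, no such source: $sm_src" >&2
            continue
        fi
        # never create directories here: they must already exist
        if [ ! -d "$(dirname -- "$sm_dst")" ]; then
            echo "skip, no shadow dir for: $sm_dst" >&2
            continue
        fi
        # never overwrite something already in the shadow tree
        if [ -e "$sm_dst" ]; then
            echo "skip, destination exists: $sm_dst" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = 0 ]; then
            mv -- "$sm_src" "$sm_dst"
        else
            printf 'mv -- "%s" "%s"\n' "$sm_src" "$sm_dst"
        fi
    done < "$3"
}
```

the dry-run output can be checked by eye (or against a small test
directory) before anything is run with DRY_RUN=0, and swapping source
and destination would give the later "mv them back" step.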

there used to be a 'lfs mv' (now 'lfs migrate') but that isn't quite the
mv operation I'm after.

any advice or war stories (especially "this is a waste of your time -
it will never work because of X,Y,Z") would be much appreciated :)

time to read more of the lustre manual now...


>On 20/6/18, 1:34 am, "lustre-discuss on behalf of Robin Humble" <lustre-discuss-bounces at lists.lustre.org on behalf of rjh+lustre at cita.utoronto.ca> wrote:
>    Hi,
>    so we've maybe lost 1 OST out of a filesystem with 115 OSTs. we may
>    still be able to get the OST back, but it's been a month now so
>    there's pressure to get the cluster back and working and leave the
>    files missing for now...
>    the complication is that because the OST might come back to life we
>    would like to avoid the users rm'ing their broken files and potentially
>    deleting them forever.
>    lustre is 2.5.41 ldiskfs centos6.x x86_64.
>    ideally I think we'd move all the ~2M files on the OST to a root access
>    only "shadow" directory tree in lustre that's populated purely with
>    files from the dead OST.
>    if we manage to revive the OST then these can magically come back to
>    life and we can mv them back into their original locations.
>    but currently
>      mv: cannot stat 'some_file': Cannot send after transport endpoint shutdown
>    the OST is deactivated on the client. the client hangs if the OST isn't
>    deactivated. the OST is still UP & activated on the MDS.
>    is there a way to mv files when their OST is unreachable?
>    seems like mv is an MDT operation so it should be possible somehow?
>    the only thing I've thought of seems pretty out there...
>    mount the MDT as ldiskfs and mv the affected files into the shadow
>    tree at the ldiskfs level.
>    ie. with lustre running and mounted, create an empty shadow tree of
>    all dirs under eg. /lustre/shadow/, and then at the ldiskfs level on
>    the MDT:
>      for f in <list_of_2m_files>; do
>         mv "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
>      done
>    would that work?
>    maybe we'd also have to rebuild OIs and lfsck - something along the
>    lines of the MDT restore procedure in the manual. hopefully that would
>    all work with an OST deactivated.
>    alternatively, should we just unlink all the currently dead files from
>    lustre now, and then if the OST comes back can we reconstruct the paths
>    and filenames from the FID in xattrs on the revived OST?
>    I suspect unlink is final though and this wouldn't work... ?
>    we can also take an lvm snapshot of the MDT and refer to that later I
>    suppose, but I'm not sure how that might help us.
>    as you can probably tell I haven't had to deal with this particular
>    situation before :)
>    thanks for any help.
>    cheers,
>    robin
>    _______________________________________________
>    lustre-discuss mailing list
>    lustre-discuss at lists.lustre.org
>    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org