[lustre-discuss] dealing with maybe dead OST

Robin Humble rjh+lustre at cita.utoronto.ca
Tue Jun 19 08:33:50 PDT 2018


Hi,

so we may have lost 1 OST out of a filesystem with 115 OSTs. we may
still be able to get the OST back, but it's been a month now so
there's pressure to get the cluster back into production and leave the
affected files missing for now...

the complication is that because the OST might come back to life we
would like to avoid the users rm'ing their broken files and potentially
deleting them forever.

lustre is 2.5.41 with ldiskfs, on centos6.x x86_64.

ideally I think we'd move all the ~2M files on the OST to a root access
only "shadow" directory tree in lustre that's populated purely with
files from the dead OST.
if we manage to revive the OST then these can magically come back to
life and we can mv them back into their original locations.
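
for the "mv them back" step, a rough sketch of what I have in mind. the
function name, the "shadow" directory name and the paths are made up for
illustration, and it assumes the OST is healthy again so that stat and
rename work normally from a client. it walks the shadow tree, recreates
each file's original parent directory if needed, and refuses to overwrite
anything a user has created at the original path in the meantime:

```shell
# restore_from_shadow ROOT [SHADOW]
#   ROOT   : top of the filesystem as seen on a client, e.g. /lustre
#   SHADOW : name of the shadow dir under ROOT (hypothetical, default "shadow")
restore_from_shadow() {
    root=$1
    shadow=${2:-shadow}
    # hand batches of file names to a small inline script
    find "$root/$shadow" -type f -exec sh -c '
        root=$1; shadow=$2; shift 2
        for src in "$@"; do
            rel=${src#"$root/$shadow/"}      # path relative to the shadow tree
            dst=$root/$rel                   # original location
            if [ -e "$dst" ] || [ -L "$dst" ]; then
                echo "skip, already exists: $rel" >&2
                continue
            fi
            mkdir -p -- "$(dirname -- "$dst")"
            mv -- "$src" "$dst"              # rename within the same filesystem
        done
    ' sh "$root" "$shadow" {} +
}
```

usage would be something like `restore_from_shadow /lustre`, run as root.
files that are skipped stay in the shadow tree so they can be dealt with
by hand.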

but currently mv on a client fails with
  mv: cannot stat 'some_file': Cannot send after transport endpoint shutdown
the OST is deactivated on the client (the client hangs if the OST isn't
deactivated). the OST is still UP & activated on the MDS.

is there a way to mv files when their OST is unreachable?

seems like mv is an MDT operation so it should be possible somehow?


the only thing I've thought of seems pretty out there...
mount the MDT as ldiskfs and mv the affected files into the shadow
tree at the ldiskfs level.
i.e. with lustre running and mounted, create an empty shadow tree of
all dirs under e.g. /lustre/shadow/, and then at the ldiskfs level on
the MDT:
  while IFS= read -r f; do    # one relative path per line
     mv -- "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
  done < list_of_2m_files

would that work?
maybe we'd also have to rebuild the OIs and run lfsck - something along the
lines of the MDT restore procedure in the manual. hopefully that would
all work with an OST deactivated.
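
the "create an empty shadow tree" step could look roughly like the sketch
below. the function name and the list file are hypothetical; it assumes a
plain text file with one path per line, relative to the top of the
filesystem, and that it's run from a client where mkdir works normally. it
only creates the parent directories, never the files themselves, and makes
the top of the shadow tree accessible to root only:

```shell
# make_shadow_dirs LIST ROOT [SHADOW]
#   LIST   : file with one path per line, relative to ROOT (hypothetical)
#   ROOT   : top of the filesystem on a client, e.g. /lustre
#   SHADOW : name of the shadow dir under ROOT (default "shadow")
make_shadow_dirs() (
    # body runs in a subshell so the variables stay local
    list=$1
    root=$2
    shadow=${3:-shadow}

    mkdir -p -- "$root/$shadow"
    chmod 700 -- "$root/$shadow"        # owner-only; run as root

    while IFS= read -r f; do
        [ -n "$f" ] || continue         # skip blank lines
        case $f in
            */*) mkdir -p -- "$root/$shadow/${f%/*}" ;;  # parent dir only
            *)   ;;                     # file sits directly under ROOT
        esac
    done < "$list"
)
```

e.g. `make_shadow_dirs list_of_2m_files /lustre`. many of the 2M files
will share parent directories, so most of the mkdir -p calls will find
the directory already there.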


alternatively, should we just unlink all the currently dead files from
lustre now, and then if the OST comes back can we reconstruct the paths
and filenames from the FID in the xattrs on the revived OST?
I suspect unlink is final though and this wouldn't work... ?

we can also take an lvm snapshot of the MDT and refer to that later I
suppose, but I'm not sure how that might help us.

as you can probably tell I haven't had to deal with this particular
situation before :)

thanks for any help.

cheers,
robin

