[lustre-discuss] dealing with maybe dead OST
Robin Humble
rjh+lustre at cita.utoronto.ca
Wed Jun 20 07:20:09 PDT 2018
Hi Malcolm,
thanks for replying.
On Tue, Jun 19, 2018 at 08:54:53PM +0000, Cowe, Malcolm J wrote:
>Would using hard links work, instead of mv?
hmm, interesting idea, but no:
# ln some_file /lustre/shadow/some_file
ln: failed to access 'some_file': Cannot send after transport endpoint shutdown
ln is trying to lstat() which fails. I think almost all client
operations are going to fail with a deactivated/down OST.
things like 'lfs getstripe' (pure MDS ops) work ok.
or did you mean doing hard links on the MDT?
unless there's a purely MDS lustre tool to do a mv/rename operation on
the MDT, then I think the only option is to mess around with the low
level stuff on the MDT when it's mounted as ldiskfs and hope I don't
break too much...
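
fwiw, a slightly more careful version of the ldiskfs-level loop from my
first mail might look something like the sketch below. it's untested on a
real MDT. shadow_mv and its arguments are just names made up for
illustration, and it assumes the shadow dirs were already created through
a lustre client, so it never makes directories itself. the optional third
argument only prints what would be moved.

```shell
# sketch only: not tested against a real MDT mounted as ldiskfs.
# usage: shadow_mv /mnt/mdt0/ROOT list_of_files [dry]
#   list_of_files holds one path per line, relative to ROOT
shadow_mv() {
    local root="$1" list="$2" dry="${3:-}"
    local f src dst
    while IFS= read -r f; do
        [ -n "$f" ] || continue                # skip blank lines
        src="$root/$f"
        dst="$root/shadow/$f"
        if [ ! -e "$src" ] && [ ! -L "$src" ]; then
            echo "missing: $f" >&2             # already moved, or bad list entry
            continue
        fi
        if [ ! -d "$(dirname "$dst")" ]; then
            echo "no shadow dir for: $f" >&2   # shadow tree is made via lustre, not here
            continue
        fi
        if [ -n "$dry" ]; then
            echo "mv -- $src $dst"             # dry run: report only
        else
            mv -n -- "$src" "$dst"             # -n: never overwrite an existing target
        fi
    done < "$list"
}
```

doing a dry pass first and checking the stderr output would at least show
which list entries don't line up with the shadow tree before anything on
the MDT gets touched.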
there used to be a 'lfs mv' (now 'lfs migrate') but that isn't quite the
mv operation I'm after.
any advice or war stories (especially "this is a waste of your time -
it will never work because of X,Y,Z") would be much appreciated :)
time to read more of the lustre manual now...
cheers,
robin
>Malcolm.
>
>
>On 20/6/18, 1:34 am, "lustre-discuss on behalf of Robin Humble" <lustre-discuss-bounces at lists.lustre.org on behalf of rjh+lustre at cita.utoronto.ca> wrote:
>
> Hi,
>
> so we've maybe lost 1 OST out of a filesystem with 115 OSTs. we may
> still be able to get the OST back, but it's been a month now so
> there's pressure to get the cluster back and working and leave the
> files missing for now...
>
> the complication is that because the OST might come back to life we
> would like to avoid the users rm'ing their broken files and potentially
> deleting them forever.
>
> lustre is 2.5.41 ldiskfs centos6.x x86_64.
>
> ideally I think we'd move all the ~2M files on the OST to a root access
> only "shadow" directory tree in lustre that's populated purely with
> files from the dead OST.
> if we manage to revive the OST then these can magically come back to
> life and we can mv them back into their original locations.
>
> but currently
> mv: cannot stat 'some_file': Cannot send after transport endpoint shutdown
> the OST is deactivated on the client. the client hangs if the OST isn't
> deactivated. the OST is still UP & activated on the MDS.
>
> is there a way to mv files when their OST is unreachable?
>
> seems like mv is an MDT operation so it should be possible somehow?
>
>
> the only thing I've thought of seems pretty out there...
> mount the MDT as ldiskfs and mv the affected files into the shadow
> tree at the ldiskfs level.
> ie. with lustre running and mounted, create an empty shadow tree of
> all dirs under eg. /lustre/shadow/, and then at the ldiskfs level on
> the MDT:
> for f in <list_of_2m_files>; do
>     mv "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
> done
>
> would that work?
> maybe we'd also have to rebuild the OIs and run lfsck - something along the
> lines of the MDT restore procedure in the manual. hopefully that would
> all work with an OST deactivated.
>
>
> alternatively, should we just unlink all the currently dead files from
> lustre now, and then if the OST comes back can we reconstruct the paths
> and filenames from the FID in xattrs on the revived OST?
> I suspect unlink is final though and this wouldn't work... ?
>
> we can also take an lvm snapshot of the MDT and refer to that later I
> suppose, but I'm not sure how that might help us.
>
> as you can probably tell I haven't had to deal with this particular
> situation before :)
>
> thanks for any help.
>
> cheers,
> robin
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>