[lustre-discuss] dealing with maybe dead OST

Cowe, Malcolm J malcolm.j.cowe at intel.com
Tue Jun 19 13:54:53 PDT 2018


Would using hard links work, instead of mv?
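
A minimal sketch of the hard-link idea, assuming a plain text list of the affected paths (one per line, relative to the filesystem root) and a shadow directory on the same filesystem. The function name `shadow_link` and its three arguments are hypothetical, and this does not settle whether `ln` succeeds on a client while the OST is deactivated, which is the open question:

```shell
#!/bin/sh
# Hypothetical helper: hard-link each listed file into a shadow tree.
#   $1 root   - top of the filesystem tree (e.g. the Lustre mount point)
#   $2 shadow - directory on the same filesystem that will hold the links
#   $3 list   - text file with one path per line, relative to root
shadow_link() {
    root=$1
    shadow=$2
    list=$3
    while IFS= read -r f; do
        # Skip blank lines in the list.
        [ -n "$f" ] || continue
        # Recreate the parent directory inside the shadow tree.
        mkdir -p -- "$shadow/$(dirname -- "$f")" || return 1
        # Add a second name for the same inode; the original name is untouched.
        ln -- "$root/$f" "$shadow/$f" || echo "failed: $f" >&2
    done < "$list"
}
```

It would be called as, for example, `shadow_link /lustre /lustre/shadow /root/dead_ost_files.txt` (all three paths are placeholders). Because the original names stay in place, a user rm'ing a broken file only drops one link, and the name under the shadow tree still refers to the same inode.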

Malcolm.
 

On 20/6/18, 1:34 am, "lustre-discuss on behalf of Robin Humble" <lustre-discuss-bounces at lists.lustre.org on behalf of rjh+lustre at cita.utoronto.ca> wrote:

    Hi,
    
    so we've maybe lost 1 OST out of a filesystem with 115 OSTs. we may
    still be able to get the OST back, but it's been a month now so
    there's pressure to get the cluster back and working and leave the
    files missing for now...
    
    the complication is that because the OST might come back to life we
    would like to avoid the users rm'ing their broken files and potentially
    deleting them forever.
    
    lustre is 2.5.41 ldiskfs centos6.x x86_64.
    
    ideally I think we'd move all the ~2M files on the OST to a root access
    only "shadow" directory tree in lustre that's populated purely with
    files from the dead OST.
    if we manage to revive the OST then these can magically come back to
    life and we can mv them back into their original locations.
    
    but currently
      mv: cannot stat 'some_file': Cannot send after transport endpoint shutdown
    the OST is deactivated on the client. the client hangs if the OST isn't
    deactivated. the OST is still UP & activated on the MDS.
    
    is there a way to mv files when their OST is unreachable?
    
    seems like mv is an MDT operation so it should be possible somehow?
    
    
    the only thing I've thought of seems pretty out there...
    mount the MDT as ldiskfs and mv the affected files into the shadow
    tree at the ldiskfs level.
    ie. with lustre running and mounted, create an empty shadow tree of
    all dirs under eg. /lustre/shadow/, and then at the ldiskfs level on
    the MDT:
      # list_of_2m_files: one path per line, relative to ROOT
      while IFS= read -r f; do
         mv -- "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
      done < list_of_2m_files
    
    would that work?
    maybe we'd also have to rebuild the OIs and run lfsck - something along the
    lines of the MDT restore procedure in the manual. hopefully that would
    all work with an OST deactivated.
    
    
    alternatively, should we just unlink all the currently dead files from
    lustre now, and then if the OST comes back can we reconstruct the paths
    and filenames from the FID in xattrs on the revived OST?
    I suspect unlink is final though and this wouldn't work... ?
    
    we can also take an lvm snapshot of the MDT and refer to that later I
    suppose, but I'm not sure how that might help us.
    
    as you can probably tell I haven't had to deal with this particular
    situation before :)
    
    thanks for any help.
    
    cheers,
    robin