[lustre-discuss] dealing with maybe dead OST

Robin Humble rjh+lustre at cita.utoronto.ca
Tue Jun 26 07:57:06 PDT 2018


Hi Andreas,

On Wed, Jun 20, 2018 at 05:39:33PM +0000, Andreas Dilger wrote:
>On Jun 19, 2018, at 09:33, Robin Humble <rjh+lustre at cita.utoronto.ca> wrote:
>> is there a way to mv files when their OST is unreachable?
>> ...
>> the only thing I've thought of seems pretty out there...
>> mount the MDT as ldiskfs and mv the affected files into the shadow
>> tree at the ldiskfs level.
>> i.e. with lustre running and mounted, create an empty shadow tree of
>> all dirs under e.g. /lustre/shadow/, and then at the ldiskfs level on
>> the MDT:
>>  while IFS= read -r f; do
>>     mv "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
>>  done < list_of_2m_files
>> 
>> would that work?
>
>This would work to some degree, but the "link" xattr on each file
>would not be updated, so "lfs fid2path" would be broken until a
>full LFSCK is run.
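For reference, here is a rough sketch of the LFSCK step Andreas mentions. It assumes a single MDT named `<fsname>-MDT0000` and a Lustre 2.x release that provides `lctl lfsck_start`. It starts only the namespace component, which is the part that checks the `link` xattr. Andreas's "full LFSCK" may cover more (other components or targets). The function name and the `RUN=echo` dry-run switch are hypothetical conveniences, not Lustre tooling.

```shell
# Hypothetical helper: after files were renamed directly on the
# ldiskfs-mounted MDT, their "link" xattr still names the old parent,
# so "lfs fid2path" can return stale paths. A namespace LFSCK, run on
# the MDS with the MDT mounted as Lustre again, checks and repairs it.
# Assumptions: $1 is the filesystem name, single MDT (MDT0000).
# Set RUN=echo to print the commands instead of running them.
run_namespace_lfsck() {
    fsname=$1
    mdt="${fsname}-MDT0000"
    # Start only the namespace component of LFSCK on this MDT.
    ${RUN:-} lctl lfsck_start -M "$mdt" -t namespace || return 1
    # Show progress; the output includes a "status:" line.
    ${RUN:-} lctl get_param -n "mdd.${mdt}.lfsck_namespace"
}
```

For example, `RUN=echo run_namespace_lfsck testfs` just prints the two `lctl` commands for a filesystem hypothetically named `testfs`.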

although as you say, it turns out the rename() approach at the client
level will work fine, it's still good to know that Lustre is flexible
and robust enough for some crazy stuff to work if it had to :)
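The client-level approach can be sketched roughly as follows. Run it as root on a client, moving the affected files into a root-only shadow directory on the same Lustre filesystem. Because source and destination share a filesystem, mv(1) can do each move as a single rename(2) rather than a copy, and on Lustre a rename is handled by the MDT. The function name, its arguments and the newline-separated input list are hypothetical.

```shell
# Hypothetical helper: move each listed file into a root-only shadow
# directory on the SAME filesystem. Within one filesystem mv(1) does a
# rename(2), not a copy, so file data on the unreachable OST is not copied.
# Assumptions:
#   $1    filesystem mount point (e.g. /lustre)
#   $2    shadow directory name under the mount point (e.g. shadow)
#   stdin newline-separated paths relative to the mount point
#         (file names containing newlines are not handled)
move_to_shadow() {
    mnt=$1
    shadow="$mnt/$2"
    # Create the shadow root and restrict it to root (mode 0700) so
    # ordinary users can no longer reach the broken files.
    mkdir -p "$shadow" || return 1
    chmod 700 "$shadow" || return 1
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue
        # Never overwrite something already in the shadow tree.
        if [ -e "$shadow/$rel" ]; then
            echo "skipping, target exists: $shadow/$rel" >&2
            continue
        fi
        # Recreate the parent directory, then rename the file into place.
        mkdir -p "$shadow/$(dirname "$rel")" || return 1
        mv "$mnt/$rel" "$shadow/$rel" || return 1
    done
}
```

On a real system the list might come from `lfs find` with its `--ost` option, e.g. `lfs find /lustre --ost 7 --type f | sed 's|^/lustre/||' | move_to_shadow /lustre shadow`. OST index 7 is made up. The list should be generated once, before the shadow tree holds anything.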

>> alternatively, should we just unlink all the currently dead files from
>> lustre now, and then if the OST comes back can we reconstruct the paths
>> and filenames from the FIDs in the xattrs on the revived OST?
>> I suspect unlink is final though and this wouldn't work... ?
>
>That would be possible, but overly complex, since the inodes would be
>removed from the MDT and you'd need to reconstruct them with LFSCK and
>find the names, as LFSCK would dump them all into $MNT/.lustre/lost+found.
>
>> we can also take an LVM snapshot of the MDT and refer to that later, I
>> suppose, but I'm not sure how that might help us.
>
>It should be possible to copy the unlinked files from the backup MDT
>to the current MDT (via ldiskfs), along with an LFSCK run to rebuild
>the OI files.  It is always a good idea to have an MDT device-level
>backup before you do anything drastic like this.  However, in the
>meantime I think that renaming the broken files to a root-only directory
>is the safest.
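A rough sketch of the snapshot step, assuming the MDT sits on an LVM logical volume. The volume group, volume name, snapshot size, mount point and the `RUN=echo` dry-run switch are all hypothetical. A classic LVM snapshot becomes invalid if its copy-on-write space fills up, so it is a short-term safety net, not a replacement for a device-level copy elsewhere.

```shell
# Hypothetical helper: snapshot the MDT logical volume and mount the
# snapshot read-only as ldiskfs, so unlinked files could later be
# inspected or copied back (followed by an LFSCK run).
# Arguments (hypothetical defaults): volume group, MDT logical volume,
# copy-on-write size. Set RUN=echo for a dry run.
snapshot_mdt() {
    vg=${1:-vg_mds}
    lv=${2:-mdt0}
    size=${3:-20G}
    snap="${lv}_snap"
    # Copy-on-write snapshot of the live MDT volume.
    ${RUN:-} lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv" || return 1
    ${RUN:-} mkdir -p "/mnt/$snap" || return 1
    # Read-only ldiskfs mount of the snapshot, separate from the live MDT.
    ${RUN:-} mount -t ldiskfs -o ro "/dev/$vg/$snap" "/mnt/$snap"
}
```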

thanks (as always) for all the detailed explanations.
much appreciated.

cheers,
robin

