[lustre-discuss] Removing stale files

William D. Colburn wcolburn at nrao.edu
Fri Jun 10 09:05:53 PDT 2022


>OPTION THREE: mounted as ldiskfs remove O/*/[1234567890]*[1234567890]
>and then remount the file system.
>
>This would be one option.  Note that with DNE the OST object names will be
>in hex, so the above regexp would not catch all objects.
>

We don't use DNE, so our filenames were simple.  I was still a little afraid
to brute force it with a full removal, so instead I started by manually
removing the two files that the kernel was generating syslog messages
about, which then led to the kernel complaining about two more.  Using
the newly discovered ll_decode_filter_fid command I had made a list of
all the FIDs for all of the files visible in ldiskfs.  I knew that there
were roughly 8 pairs of files that had the same FID, so I removed all of
those.  Once I did that, Lustre was able to start cleaning the filesystem
on its own.  It took about three hours, and when it was done there were
3 destroys_in_progress reported on the MDS, and the kernel was still
triggering an OI scrub for one FID.  When I looked in the ldiskfs mount
I found 20 files remaining.  I removed those, and the kernel messages
went away.  Somewhere along this journey the file system became marked
dirty at the ext4 level.  An fsck was quick and did almost nothing.
I've double-checked all of the OSTs and nothing is producing kernel
errors about Lustre anywhere.  So I feel like our Lustre is hale and
healthy now.
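
The duplicate-FID search described above can be sketched in Python.  This
is a minimal sketch, not the commands actually used: it assumes the output
of ll_decode_filter_fid was saved as text, one object per line, in the
shape "<path>: ... parent=[<fid>] stripe=<n>".  That line shape, the
function name find_duplicate_fids, and the sample paths are assumptions
made for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (not confirmed by the post):
#   <path>: ... parent=[0x...:0x...:0x...] stripe=N
LINE_RE = re.compile(
    r"^(?P<path>[^:]+):.*?parent=\[(?P<fid>[^\]]+)\]"
    r"(?:.*?stripe=(?P<stripe>\d+))?"
)


def find_duplicate_fids(lines):
    """Return {(parent_fid, stripe): [paths]} for keys seen more than once."""
    by_fid = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Skip anything that does not look like a decoded object line.
            continue
        key = (match.group("fid"), match.group("stripe"))
        by_fid[key].append(match.group("path"))
    # Keep only the keys claimed by two or more object files.
    return {key: sorted(paths) for key, paths in by_fid.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real file system.
    sample = [
        "O/0/d1/33: parent=[0x200000400:0x2:0x0] stripe=0",
        "O/0/d2/34: parent=[0x200000400:0x2:0x0] stripe=0",
        "O/0/d3/35: parent=[0x200000401:0x5:0x0] stripe=1",
    ]
    for (fid, stripe), paths in find_duplicate_fids(sample).items():
        print(fid, stripe, paths)
```

Each key that comes back names one parent FID and stripe index claimed by
more than one object file on the OST.  Those are candidates to inspect by
hand before removing anything.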

We haven't set the write count back up yet, since it is a Friday, but
we will next week.

Thank you for the advice; it was the push we needed to be more
aggressive in trying to fix this.  I had spent a lot of time doing very
minor things because I was afraid of breaking everything and losing two
petabytes.  In the end we lost probably hundreds of files, but they can
all be reproduced, so there is no great hardship here other than making
a list.


--Schlake
  Sysadmin IV, NRAO
  Work: 575-835-7281 (BACK IN THE OFFICE!)
  Cell: 575-517-5668 (out of work hours)

