[lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

Chris Hunter chris.hunter at yale.edu
Fri Sep 11 05:45:35 PDT 2015

On 09/11/2015 03:41 AM, Martin Hecht wrote:
> On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
>> On 2015/09/10, 6:54 PM, "Chris Hunter" <chris.hunter at yale.edu> wrote:
>>> We experienced file corruption on several OSTs. We proceeded through
>>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>>> Following these steps, e2fsck came out clean.
>>> The file corruption did not impact the MDT. The files were still
>>> referenced by the MDT. Accessing such a file on a lustre client
>>> (e.g. ls -l) would report the error "Cannot allocate memory".
>>> Following OST recovery steps, we started removing the corrupt files via
>>> "unlink" command on lustre client (rm command would not remove file).
>>> Now dry-run e2fsck of the OST is reporting errors:
>>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>>> "Unattached inodes" in Pass 4 (checking reference counts)
>>> "free block count wrong" in Pass 5 (checking group summary information).
>>> Are e2fsck errors expected when unlinking files?
>> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
>> by calling "stat()" on the file before trying to unlink it.  This
>> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
>> from the back-end storage.
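
The difference Andreas describes between "rm" and "unlink" can be sketched as follows. This is only an illustration, assuming rm behaves roughly as described above (a stat call followed by the unlink); the function names are hypothetical and are not part of any Lustre or coreutils API.

```python
import os


def remove_like_rm(path):
    """Stat the file first, then unlink it, roughly as rm is described above."""
    # If the stat fails (for example because the backing OST object is
    # missing), the OSError is raised here and unlink is never attempted.
    os.lstat(path)
    os.unlink(path)


def remove_like_unlink(path):
    """Call unlink(2) directly with no prior stat, as the unlink command does."""
    os.unlink(path)
```

On a healthy file both functions behave the same; they differ only when the stat step fails before the unlink is reached.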
> Chris, by "live filesystem" do you mean that you ran a read-only e2fsck
> on a lustre file system while it was mounted and clients were working
> on it? If so, e2fsck is expected to report some errors, because the
> file system contents change while e2fsck is running and the in-memory
> directory structure no longer matches the on-disk data. However, as
> Andreas points out, it might as well be a sign of ongoing corruption on
> the storage, but only an offline e2fsck (i.e. while the OST is
> unmounted, and the journal is played back) can clarify this.
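
A minimal sketch of the check Martin's point implies: find out whether the OST device is still mounted before trusting a read-only e2fsck result. It assumes the device appears verbatim in the first column of /proc/mounts (symlinked or device-mapper names may not match), and the helper names are hypothetical. The -f and -n options are standard e2fsck flags.

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is a mount source in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the mounted device.
        if fields and fields[0] == device:
            return True
    return False


def readonly_check_command(device):
    """Build the argument list for a forced, read-only e2fsck run."""
    # -f forces a check even if the filesystem is marked clean;
    # -n opens it read-only and answers "no" to every question.
    return ["e2fsck", "-f", "-n", device]


def describe_check(device, mounts_text):
    """Say how far the result of a read-only check can be trusted."""
    cmd = " ".join(readonly_check_command(device))
    if is_mounted(device, mounts_text):
        return cmd + ": device is mounted, errors may be artifacts of live changes"
    return cmd + ": device is not mounted, errors reflect on-disk state"
```

For example, describe_check("/dev/sdX", open("/proc/mounts").read()) would report which case applies, where /dev/sdX is a placeholder for the OST device.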
Hi Martin, good point. The filesystem is active (3 clients), so the
e2fsck errors could be due to uncommitted journal transactions.
It would be nice to rule out underlying hardware issues before we do a
full e2fsck.
chris hunter
