[lustre-devel] [EXTERNAL] Re: Direct Modification of Lustre Metadata on Disk

Saisha Kamat skamat1 at charlotte.edu
Wed Jan 31 17:26:14 PST 2024


Hello Andreas,

I sincerely apologize for the delay in responding to your previous
email. I have now had the chance to review it thoroughly, and I am
eager to continue our conversation.

Our primary research objective is fault injection into the xattrs of
the Lustre file system. It would be immensely helpful if you could
point me toward the publications you mentioned on this topic.
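(For context, we drive the same xattr interface that "getfattr" and
"setfattr" wrap, via the Linux syscalls. The sketch below is only
illustrative: it uses a scratch file and a user-namespace xattr rather
than Lustre's trusted.* xattrs, and all names and sizes are made up.)

```python
import os
import tempfile

# Scratch file on a local filesystem that supports user xattrs
# (ext4 and xfs do); on Lustre the interesting attributes live in
# the trusted.* namespace instead.
fd, path = tempfile.mkstemp(dir=".")
os.close(fd)

# Write an xattr, then read it back -- the same get/set interface
# that the getfattr and setfattr tools use.
os.setxattr(path, "user.demo", b"original-value")
assert os.getxattr(path, "user.demo") == b"original-value"

# A multi-KB value no longer fits in the inode's inline xattr area,
# so ext4 places it in a separate external xattr block on disk.
os.setxattr(path, "user.big", b"A" * 3000)
```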

Regarding research ideas related to Lustre, it would indeed be a great
opportunity for us to delve deeper into Lustre-related research and
potentially implement solutions that contribute meaningfully to the
field. I am keen to hear more about the ideas you are considering and
how they might align with our interests and goals. Let's discuss
further and explore potential opportunities.

Regarding my current implementation of fault injection: I am not
creating a large xattr; instead, I am trying to overwrite existing
xattrs such as the LMA, linkEA, and LOV EA (trusted.lma, trusted.link,
and trusted.lov).
It is possible that I am injecting the fault in the wrong place, but I
am reading the MDS disk, traversing to the target file's inode (the
file I want to inject the fault into), and I am able to read the LMA,
linkEA, filename, and LOV EA and match them against the output of
"getfattr". I then write a faulty value at these locations on the
disk, but I have trouble syncing the changes to the disk. Please help
me understand whether I am targeting the wrong place. Additionally, I
am clearing the client-side cache so that I can read these new values,
by running "sudo lctl" and then "set_param
ldlm.namespaces.*.lru_size=clear". However, "getfattr" on the client
still returns the original, correct values.
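As a minimal, self-contained sketch of the write path I am using (an
ordinary file stands in for the MDT block device here, and the offset
and value are made up for illustration):

```python
import os

# Stand-in for the MDT block device on a real MDS; an ordinary file
# is used here so the sketch is self-contained.
path = "mdt_standin.img"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# Hypothetical on-disk offset of the xattr bytes found by walking
# the inode, and the faulty value to inject there.
XATTR_OFFSET = 128
FAULTY_VALUE = b"\xde\xad\xbe\xef"

# O_SYNC makes each write reach stable storage before returning;
# os.fsync() additionally flushes anything still in the page cache.
fd = os.open(path, os.O_WRONLY | os.O_SYNC)
try:
    os.lseek(fd, XATTR_OFFSET, os.SEEK_SET)
    os.write(fd, FAULTY_VALUE)
    os.fsync(fd)
finally:
    os.close(fd)
```

Even with O_SYNC and fsync() forcing the bytes to the device, the
copies cached in MDS and client memory are not invalidated, which
seems consistent with what I observe.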

Your guidance and clarity on these matters would be greatly appreciated.

Thanks and Regards,
Saisha

On Sun, Jan 28, 2024 at 12:13 AM Andreas Dilger <adilger at whamcloud.com> wrote:
>
>   On Jan 26, 2024, at 12:57, Saisha Kamat via lustre-devel <lustre-devel at lists.lustre.org> wrote:
>
>
> I am a Ph.D. student at UNC-Charlotte, focusing on research related to
> the Lustre File System. As part of my project, I am investigating
> scenarios involving the direct modification of xattr metadata on the
> Lustre disk, without unmounting the Lustre servers.
>
>
> It would be helpful to know what the high-level goal of your research is.
> Is this some type of fault injection mechanism, or are you trying to store
> useful data directly into the xattr, or something else?  Note that there
> have already been a few papers published about this.  If you are looking
> for research ideas related to Lustre I could definitely give you a few, please
> contact me if interested.  Doubly so if you actually implement something
> that is useful at the end of your Ph.D. and not a throw-away project.
>
> To achieve this, I have attempted to open the MDS (Metadata Server)
> disk partition as a file descriptor, locate the target file and its
> xattr, and write a faulty value. However, I have encountered an
> unexpected issue where my changes appear to be saved to memory and are
> not being synchronized with the disk.
>
>
> In general, this is also a good way to corrupt the filesystem.  If the xattr
> is stored directly in the inode (as most of them are) then you will also be
> overwriting the live inode that is also in memory.  In many cases, whatever
> was written directly to disk will be overwritten and lost when the inode is
> flushed from memory.
>
> Alternately, if the inode is already in memory, the xattr will be read from
> RAM (either from the client cache or from the MDS cache).
>
> If you create a large xattr it will be written to a separate block, which
> would at least avoid massive filesystem corruption.
>
> After completing the write operation, when I read the same xattr
> again, it reflects the corrupted value. Strangely, when using the
> "getfattr" command, the original, correct value is displayed. This
> discrepancy has raised doubts about whether Lustre permits direct
> modifications to its metadata on the disk.
>
>
> The xattr contents are also cached on the client, and direct writes
> to the storage would not invalidate that cache because they bypass
> all of the proper access controls and locking.
>
> Furthermore, I observed that even after unmounting and remounting the
> Lustre file system, the xattr continues to display the corrupted value
> upon reading, whereas "getfattr" still returns the original, correct
> value.
>
>
> That really depends on how you modified the "xattr" and where "getfattr"
> is actually getting the data from.  I suspect you aren't doing what you
> think you are doing.
>
> Please help me understand whether Lustre allows direct modifications
> to its metadata on the disk and if there are any inherent limitations
> or considerations that I should be aware of.
>
>
> No, of course Lustre and ext4 do not "allow" this.  Just like any filesystem
> doesn't "allow" you to run "dd if=/dev/zero of=/dev/sda1" and erase the
> data from the partition.
>
> Additionally, any recommendations or alternative approaches for
> simulating faulty conditions for testing purposes would be highly
> valuable to my research.
>
>
> That really depends on what your research is trying to achieve.  Lustre
> depends on reliable (RAID) storage underneath the MDT and OST.  It
> is possible to use ldiskfs (ext4) or ZFS as underlying storage, and they
> have different reliability vs. performance properties.  If you are testing
> to directly corrupt on-disk storage then you are really testing those disk
> filesystems, and Lustre does not add additional data redundancy layers
> on top of them for metadata today, though there are *some* types of
> internal metadata redundancy that can help recover from storage errors
> (e.g. LFSCK can rebuild the Lustre file layout after errors on the MDT,
> along with some types of directory breakage from the "link" xattr).
>
> ZFS should be able to withstand such data/metadata block corruption
> up to a certain level without any errors, until it just refuses to work at all.
> ldiskfs would *not* be able to handle outright corruption of the on-disk
> data (which is why you use RAID underneath it), but most corruption
> would be localized and the filesystem would generally continue to work
> (modulo the broken bits) even in the face of massive corruption.  Kind
> of like the difference between digital and analog audio signals.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>

