[lustre-devel] [External] : Re: Inquiry Regarding Checksum Support for Lustre Extended Attributes
Andreas Dilger
adilger at whamcloud.com
Wed Jun 12 16:57:06 PDT 2024
On Jun 12, 2024, at 16:54, Patrick Farrell <patrick.farrell at oracle.com> wrote:
A few clarifying questions, partly for me and hopefully for Saisha as well as a non-Lustre expert. The ldiskfs file system supported for Lustre disk targets is a modified ext4, so it inherits ext4 features for the local volume, but not all of them are supported for ldiskfs formatted volumes. So does Lustre support use of metadata_csum (that's the relevant option, isn't it?) when formatting ldiskfs? Is it on by default?
I _thought_ that xattrs have had built-in checksums since they were first added to ext3, long before metadata_csum existed. However, looking at the metadata_csum patch handling xattrs, I see that it is adding the 'h_checksum' field itself. I guess I was conflating the xattr *checksum* with the xattr *hash* which is used for xattr block sharing, but is never actually verified by the kernel.
Inferring from your comments, I think you're saying metadata_csums are (or can be) used on an ldiskfs volume. But since they are just a local disk feature, they contribute only minimally to resiliency in Lustre, and Lustre makes no use of them. They can help protect against some damage to the local volume, but Lustre is unaware of them.
Unfortunately, the Lustre "dirdata" feature did not previously interoperate well with metadata_csum (which was developed many years later), so currently mkfs.lustre explicitly disables the metadata_csum feature when the filesystem is formatted. There have been incremental fixes related to dirdata and metadata_csum over the years, so it is possible that these would work together today, but it has not been on anyone's radar to test and verify that these features are OK to use together. https://jira.whamcloud.com/browse/LU-13650 is open to track this, maybe someone would be interested in working on that...
Do the metadata checksums extend to all ext4/ldiskfs extended attributes? Are they done separately for individual xattrs or just for the whole inode?
There is an h_checksum field for the xattr header that covers all of the xattrs, but only the e_hash field for the individual entry:
struct ext4_xattr_header {
        __le32  h_magic;        /* magic number for identification */
        __le32  h_refcount;     /* reference count */
        __le32  h_blocks;       /* number of disk blocks used */
        __le32  h_hash;         /* hash value of all attributes */
        __le32  h_checksum;     /* crc32c(uuid+id+xattrblock) */
                                /* id = inum if refcount=1, blknum otherwise */
        __u32   h_reserved[3];  /* zero right now */
};

struct ext4_xattr_entry {
        __u8    e_name_len;     /* length of name */
        __u8    e_name_index;   /* attribute name index */
        __le16  e_value_offs;   /* offset in disk block of value */
        __le32  e_value_block;  /* disk block attribute is stored on (n/i) */
        __le32  e_value_size;   /* size of attribute value */
        __le32  e_hash;         /* hash value of name and value */
        char    e_name[0];      /* attribute name */
};
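To make the crc32c(uuid+id+xattrblock) comment on h_checksum concrete, here is a minimal C sketch of a checksum over an external xattr block. It is an illustration under assumptions, not the kernel's code: the names crc32c_update() and xattr_block_csum(), the 64-bit little-endian encoding of the id, the ~0 seed with no final inversion, and treating h_checksum as zero while summing are all choices made for this sketch. The ext4 source remains the authority for the exact byte layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of h_checksum in struct ext4_xattr_header: it follows the four
 * 32-bit fields h_magic, h_refcount, h_blocks and h_hash. */
#define H_CHECKSUM_OFF 16u
#define XATTR_HDR_SIZE 32u      /* five __le32 fields + h_reserved[3] */

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Raw update step: no initial or final inversion is applied here. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical helper: checksum of uuid + id + xattr block, with the
 * stored h_checksum bytes replaced by zeroes so the result does not
 * depend on whatever value is currently in that field. */
static uint32_t xattr_block_csum(const uint8_t uuid[16], uint64_t id,
                                 const uint8_t *block, size_t blocksize)
{
    static const uint8_t zero[4];
    uint8_t id_le[8];
    uint32_t crc;

    assert(blocksize >= XATTR_HDR_SIZE);

    /* Assumption: the id is fed in as a 64-bit little-endian value. */
    for (int i = 0; i < 8; i++)
        id_le[i] = (uint8_t)(id >> (8 * i));

    crc = crc32c_update(0xFFFFFFFFu, uuid, 16);
    crc = crc32c_update(crc, id_le, sizeof(id_le));
    crc = crc32c_update(crc, block, H_CHECKSUM_OFF);
    crc = crc32c_update(crc, zero, sizeof(zero));
    crc = crc32c_update(crc, block + H_CHECKSUM_OFF + 4,
                        blocksize - H_CHECKSUM_OFF - 4);
    return crc;
}
```

A verifier would recompute this value over the block it just read and compare it with the stored h_checksum. Because the uuid and id are mixed in, a block written to the wrong location or belonging to another filesystem fails the check as well. The per-entry e_hash plays no part in this.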
The kernel wiki describes a possible implementation that credits someone named 'Andreas Dilger' ( :) ) with suggesting checksumming them under the inode checksum rather than separately, but I don't know what was actually implemented.
I think that was for xattrs stored within the inode itself, since in-inode xattrs do not have the "ext4_xattr_header" that is used in external xattr blocks, but apparently you can't trust everything that guy writes... :-) Thanks for keeping me honest.
Cheers, Andreas
Regards,
Patrick
________________________________
From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of Andreas Dilger via lustre-devel <lustre-devel at lists.lustre.org>
Sent: Wednesday, June 12, 2024 5:27 PM
To: Saisha Kamat <skamat1 at charlotte.edu>
Cc: lustre-devel <lustre-devel at lists.lustre.org>
Subject: [External] : Re: [lustre-devel] Inquiry Regarding Checksum Support for Lustre Extended Attributes
On Jun 12, 2024, at 12:35, Saisha Kamat via lustre-devel <lustre-devel at lists.lustre.org> wrote:
Hello,
I hope this email finds you well.
My name is Saisha, and I am currently pursuing my Ph.D. at UNC-Charlotte, with a focus on research related to the Lustre File System. As part of my project, I am exploring the possibility of utilizing checksums to verify Lustre extended attributes.
My understanding is that the ext4 file system supports checksums for extended attributes (xattrs). However, I am interested in whether this functionality extends to Lustre as well.
Yes, the ext4 (and ZFS) xattr values have checksums. No, the xattr checksums are neither managed nor verified by Lustre, and only come into effect when the xattrs are passed on to the backing filesystem. Conceivably, it would be possible to have a checksum (e.g. crc32c) for the xattr values in the MDS_GETXATTR and MDS_SETXATTR RPCs, if this is something you are interested in contributing.
This could probably be done by overloading one of the 32-bit fields in the mdt_body for getxattr, and one in mdt_rec_reint for setxattr, but there is also "opportunistic" xattr prefetching done in the lookup RPC, so that would need to be covered as well.
The checksum would also need to be kept with the xattrs in cache and verified on access, otherwise the xattrs could become corrupted in memory, undetected, after the RPC processing had completed.
Finally, there is no interface to specify or verify the xattr checksum in the syscall interface, so there can be no guarantee that the data supplied to setxattr is correct, or that it remains correct after it is returned by getxattr, but the window there is very small.
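The idea of carrying a checksum with the xattr value and re-verifying it on access can be sketched in C as follows. Everything here is hypothetical: struct xattr_cache_entry, xattr_cache_store(), xattr_cache_read() and the XATTR_VAL_MAX limit are names invented for illustration. None of them are Lustre APIs, and the sketch does not reflect the layout of mdt_body or mdt_rec_reint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XATTR_VAL_MAX 256       /* arbitrary limit for this sketch */

/* Hypothetical cache entry: the value plus the checksum it arrived with. */
struct xattr_cache_entry {
    size_t   len;
    uint32_t csum;              /* crc32c of val[0..len) */
    uint8_t  val[XATTR_VAL_MAX];
};

/* Standard CRC-32C (Castagnoli), bitwise, with initial and final inversion. */
static uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Accept a value only if it matches the checksum sent alongside it
 * (wire_csum stands in for a 32-bit field carried in the RPC).
 * Returns 0 on success, -1 on mismatch or oversize value. */
static int xattr_cache_store(struct xattr_cache_entry *e, const void *val,
                             size_t len, uint32_t wire_csum)
{
    assert(e != NULL && val != NULL);
    if (len > XATTR_VAL_MAX || crc32c(val, len) != wire_csum)
        return -1;
    memcpy(e->val, val, len);
    e->len = len;
    e->csum = wire_csum;        /* keep the checksum with the cached value */
    return 0;
}

/* Re-verify on every access so in-memory corruption after RPC processing
 * is detected. Returns the value length, or -1 if verification fails. */
static long xattr_cache_read(const struct xattr_cache_entry *e, void *out,
                             size_t outlen)
{
    assert(e != NULL && out != NULL);
    if (e->len > XATTR_VAL_MAX || e->len > outlen ||
        crc32c(e->val, e->len) != e->csum)
        return -1;
    memcpy(out, e->val, e->len);
    return (long)e->len;
}
```

Recomputing the CRC on each read costs CPU time proportional to the value size, which is the price of closing the window between RPC completion and the moment the value is handed to the caller. The last hop, into and out of the syscall buffer, stays unprotected in this sketch for the reason given above.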
Cheers, Andreas
I would greatly appreciate it if you could provide some insights or direct me to relevant documentation on this matter. Any information or guidance you can offer would be invaluable to my research.
Thank you very much for your time and assistance.
Thanks and Regards,
Saisha
_______________________________________________
lustre-devel mailing list
lustre-devel at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud