[lustre-discuss] [Samba] Odd "File exists" behavior when copy-pasting many files to an SMB exported Lustre FS

Andreas Dilger adilger at whamcloud.com
Thu Sep 22 19:48:43 PDT 2022


On Sep 22, 2022, at 03:21, Michael Weiser via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Hey Daniel! :)
Hi Jeremy,
Hi Bjoern,

I'm cross-posting to the samba list again as I think this might be of interest there as well and to keep the threads together.

That leaves the question, where that extended attribute user.lustre.lov is coming from. It
appears that Lustre itself exposes a number of extended attributes for every file which
reflect internal housekeeping data:

$ getfattr -m - blarf
# file: blarf
lustre.lov
trusted.link
trusted.lma
trusted.lov

Try adding

 lustre.* skip

to /etc/xattr.conf (cf.
https://doc.lustre.org/lustre_manual.xhtml#lustre_configure_multiple_fs).
Haven't tested yet, but Samba's EA handling seems to be libattr-based,
so the above tweak should be sufficient to keep smbd off of these
fs-specific attributes.

Thanks for the tip! I tried it and unfortunately it didn't work. From my reading of the code, it seems that samba uses libattr only as a fallback if no compatible system implementation of fgetxattr can be found. In both of my cases (RHEL 8.6/samba-4.15.5 and debug system debian testing/samba-4.17.0) it seems to use the system interface directly.

Running with that: From the Lustre documentation it appears that there have been problems with exposing Lustre internal data via extended attributes in the past, prompting the xattr.conf workaround:

[from the docs]:
"If a client(s) will be mounted on several file systems, add the following line to /etc/xattr.conf file to avoid problems when files are moved between the file systems: lustre.* skip"

What exactly were those problems hinted at in the documentation?
Is the visibility of the lustre.lov attribute for unprivileged users actually needed for anything?
Can exposing it to unprivileged users be switched off Lustre-side?

The lustre.lov xattr is used for backup and restore of the file layout, used if files are stored across multiple servers (e.g. if they are huge, or have mixed flash + disk storage, mirrored, etc).  It isn't critical that this is saved/restored, but it is useful.  Tools like tar can save/restore this xattr to preserve this layout across filesystems, or if archived to tape, etc.

Looking at xattr.conf highlights the fact that Lustre isn't as singular as I thought in putting out those lustre.lov attributes. At least AFS and XFS seem to do the same or at least at some point in time have done so. (From the comment "obsolete" on xfsroot.* it appears that may have been changed.)

Jeremy wrote:
Great analysis Michael ! As we're emulating NTFS CreateFile
we can't do the 'create with EA's' atomically.

Lustre really should not be exposing EA's to callers if it doesn't actually support EA's.

An elegant solution might be to add a Samba VFS module
vfs_lustre.c that intercepts fgetxattr/fsetxattr/flistxattr calls and simply
strips out the lustre.lov EA's from being seen.

I had a first implementation of that attached to my first mail. Did that get through? In my case it was enough to mask lustre.lov in flistxattr and fgetxattr so that clients never get to see it and fsetxattr is never attempted.
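
The masking logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the VFS module attached to the earlier mail. The names HIDDEN_XATTRS, filter_listxattr and masked_getxattr are hypothetical, and reporting ENODATA for a hidden name is an assumption modelled on what Linux getxattr returns for a missing attribute.

```python
import errno

# Names hidden from SMB clients. "lustre.lov" is the attribute discussed
# in this thread; the set itself is a hypothetical configuration.
HIDDEN_XATTRS = frozenset({"lustre.lov"})


def filter_listxattr(names, hidden=HIDDEN_XATTRS):
    """Return the xattr names with the hidden ones removed, order preserved."""
    return [name for name in names if name not in hidden]


def masked_getxattr(getter, name, hidden=HIDDEN_XATTRS):
    """Call getter(name) unless the name is hidden; hidden names look absent."""
    if name in hidden:
        # Assumption: behave as if the attribute did not exist at all.
        raise OSError(errno.ENODATA, "No data available", name)
    return getter(name)
```

Because the client never sees lustre.lov in the listing, it has no reason to send it back, so the failing set operation is never attempted.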

What's bugging me about this approach as well as the xattr.conf workaround is that the error behaviour on the client side is so very unintuitive. How will we get people to correlate some "file already exists" error with peculiar extended attributes behaviour of their file system so they become aware they need to configure a workaround? It'd certainly be nice if we could find a way for samba to "just do the right thing"[tm].

Can you log a bug in our bugzilla and upload all this info so we can track it ?

That's underway (requested an account) and I'll upload everything there.

is this really Lustre specific? I assume we see the same effect on Linux with
other filesystems that don't support EAs.

No. Lustre is returning "fictional" EA's that
cannot be set. Linux filesystems that don't have
EA's don't do that.

The attributes using a non-existent namespace (lustre.*) don't seem exactly right[tm] to me either. And it would wreak havoc if samba were actually able to set the canonicalised user.lustre.lov attribute when copying it back, duplicating and likely somewhat non-deterministically overwriting it later.

There *is* a Lustre xattr namespace, it just only gets used by Lustre clients...  That said, the root of the problem (AFAICS) is that the xattr name is changed from lustre.lov to user.lustre.lov as it moves from Lustre->NTFS->Lustre.  If the xattr name was kept consistent (e.g. as with tar, rsync, etc.) then there shouldn't be a problem.
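
A toy model of that rename, as a Python sketch. The two mapping functions are assumptions inferred from the behaviour reported in this thread (a leading "user." is dropped when a name is shown to the client, and "user." is always prepended when the client writes a name back); they are hypothetical helpers, not Samba's actual code.

```python
USER_PREFIX = "user."


def name_shown_to_client(fs_name):
    """Assumed server-side mapping: drop a leading "user." if there is one."""
    if fs_name.startswith(USER_PREFIX):
        return fs_name[len(USER_PREFIX):]
    return fs_name


def name_written_for_client(client_name):
    """Assumed reverse mapping: always store client EAs under user.*"""
    return USER_PREFIX + client_name


def round_trip(fs_name):
    """Name an xattr ends up with after a copy out to the client and back."""
    return name_written_for_client(name_shown_to_client(fs_name))
```

Under these assumptions round_trip("user.comment") gives back "user.comment", while round_trip("lustre.lov") gives "user.lustre.lov": only names outside the user.* namespace change on the way back.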

But more crucially, what seems problematic here is that Lustre supports listing and reading extended attributes for unprivileged users but does not allow setting them and returns ENOTSUP rather than EPERM or something else at that. So samba would need to take into account that not all filesystems support extended attributes as a whole but might support some operations on them but not others. I'm with Bjoern that there likely are or will be other filesystems with peculiar extended attribute behaviour.
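
One way to picture "some operations work, others don't" is to decide per attribute instead of per filesystem. The Python sketch below is a hypothetical illustration of that policy, not Samba code: classify_setxattr_failure and copy_xattrs_best_effort are made-up names, and the read/write callables stand in for whatever xattr interface the caller uses.

```python
import errno


def classify_setxattr_failure(exc):
    """Map an OSError from a failed setxattr to a coarse reason."""
    if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
        return "unsupported"
    if exc.errno in (errno.EPERM, errno.EACCES):
        return "denied"
    return "other"


def copy_xattrs_best_effort(names, read, write):
    """Copy each xattr; skip ones that cannot be set, re-raise other errors.

    Returns the list of names that were skipped.
    """
    skipped = []
    for name in names:
        try:
            write(name, read(name))
        except OSError as exc:
            if classify_setxattr_failure(exc) == "other":
                raise
            skipped.append(name)
    return skipped
```

Whether silently skipping an unsettable attribute is acceptable is a policy question; the sketch only shows that ENOTSUP on one name need not abort the whole operation.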

Again, this is because the NTFS->Lustre xattr copy is changing from lustre.lov to user.lustre.lov, and the "-o user_xattr" feature is not enabled on this filesystem.  If "-o user_xattr" was enabled on the server, then the "user.lustre.lov" xattr would be saved without errors, but it wouldn't serve its intended purpose.  That isn't fatal if the lustre.lov isn't restored, but storing it twice is a waste of space.

What might be the possible fallout from removing the created file in the error code path? Shouldn't it be safe with proper locking in place as it appears to be?  Wouldn't a best-effort cleanup in the error path be better than leaving a known-to-be-incorrect state behind?

This seems like a bug in Samba, or maybe in Windows?  Creating a large number of zero-length files seems problematic/racy, since any interruption during this process (for any reason, not just the xattr error) would leave a lot of cruft behind.  It might be possible to mark those files as 'placeholders' based on the file access permissions or similar, so that a retry would overwrite them, but that is still hackish.

Lustre itself does not yet support O_TMPFILE, which would allow instantiating the files fully-formed into the namespace, though that is something we've been looking at adding.  There is an equivalent functionality to create "nameless files" that predates O_TMPFILE that Lustre-specific tools use, but it doesn't allow the "post-link" operation that O_TMPFILE does.

# /etc/xattr.conf
#
# Format:
# <pattern> <action>
#
# Actions:
#   permissions - copy when trying to preserve permissions.
#   skip - do not copy.

system.nfs4_acl                 permissions
system.nfs4acl                  permissions
system.posix_acl_access         permissions
system.posix_acl_default        permissions
trusted.SGI_ACL_DEFAULT         skip            # xfs specific
trusted.SGI_ACL_FILE            skip            # xfs specific
trusted.SGI_CAP_FILE            skip            # xfs specific
trusted.SGI_DMI_*               skip            # xfs specific
trusted.SGI_MAC_FILE            skip            # xfs specific
xfsroot.*                       skip            # xfs specific; obsolete
user.Beagle.*                   skip            # ignore Beagle index data
security.evm                    skip            # may only be written by kernel
afs.*                           skip            # AFS metadata and ACLs
lustre.*                        skip

Having Samba honor the xattr.conf (even if it is not using libattr) would at least make the behavior consistent between Samba and cp and other tools.
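
As a rough sketch of what honoring that file could look like, the Python below parses the "<pattern> <action>" format shown above and looks up the action for an attribute name. It is an illustration under assumptions, not libattr's implementation: parse_xattr_conf and xattr_action are hypothetical names, first-match-wins ordering is assumed, and "copy" is an assumed default for names that match no rule.

```python
import fnmatch


def parse_xattr_conf(text):
    """Parse "<pattern> <action>" lines, ignoring blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            rules.append((fields[0], fields[1]))
    return rules


def xattr_action(name, rules, default="copy"):
    """Return the action of the first rule whose pattern matches name."""
    for pattern, action in rules:
        if fnmatch.fnmatchcase(name, pattern):
            return action
    return default
```

With the file above, xattr_action("lustre.lov", rules) yields "skip". Note that "user.lustre.lov" would not match the lustre.* pattern, so the check has to happen before any renaming into the user.* namespace.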

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






