<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
On Feb 27, 2023, at 11:57, Grigory Shamov <<a href="mailto:Grigory.Shamov@umanitoba.ca" class="">Grigory.Shamov@umanitoba.ca</a>> wrote:<br class="">
<div>
<blockquote type="cite" class=""><br class="Apple-interchange-newline">
<div class="">
<div class="">Hi All,<br class="">
<br class="">
What happens if a directory on a Lustre filesystem is moved within the same filesystem using the stock CentOS 7 mv command? For example, as root:<br class="">
<br class="">
mv /project/TEMP/user /project/XYZ/user<br class="">
<br class="">
It looks like the contents get copied entirely, which takes a long time for large amounts of data.<br class="">
Is there a way to rename Lustre directories (changing only the name of the top directory, without moving every object inside it)? Thanks!<br class="">
</div>
</div>
</blockquote>
<br class="">
</div>
<div>Renaming a file or subdirectory tree between "regular" directories in Lustre works as you would expect on a local filesystem, even if the directories are on different MDTs. What you are seeing (a full copy of the contents between directories) is really a result
of the design of project quotas, and not directly a Lustre problem. The same would happen on ext4 or XFS with two directories that use different project IDs and have the "PROJINHERIT" flag set, since those filesystems also return "-EXDEV" when asked
to rename a file between directories that do not have the same project ID, and that causes "mv" to fall back to copying the whole directory tree.</div>
<div><br class="">
</div>
<div>Running "mv" on ext4 under strace shows this:</div>
<div><br class="">
</div>
<div> # df -T /mnt/tmp</div>
Filesystem Type 1K-blocks Used Available Use% Mounted on<br class="">
/dev/mapper/vg_test-lvtest ext4 16337788 52 15482492 1% /mnt/tmp
<div class=""> # mkdir /mnt/tmp/{dir1,dir2}</div>
<div class=""> # chattr +P -p 1000 /mnt/tmp/dir1</div>
<div class=""> # chattr +P -p 2000 /mnt/tmp/dir2</div>
<div class=""> # cp /etc/hosts /mnt/tmp/dir1</div>
<div class=""> # lsattr /mnt/tmp/dir1<br class="">
--------------e----P-- /mnt/tmp/dir1/hosts</div>
<div class=""> # ls -li /mnt/tmp/dir1<br class="">
total 8<br class="">
655365 8 -rw-r--r--. 1 root root 7424 Oct 18 22:42 hosts</div>
<div class=""> # strace mv /mnt/tmp/dir1/hosts /mnt/tmp/dir2/hosts</div>
<div class=""> :<br class="">
<div> renameat2(AT_FDCWD, "/mnt/tmp/dir1/hosts", AT_FDCWD, "/mnt/tmp/dir2/hosts", RENAME_NOREPLACE) = -1 EXDEV (Invalid cross-device link)<br class="">
stat("/mnt/tmp/dir2/hosts", 0x7ffff8a6c2b0) = -1 ENOENT (No such file or directory)<br class="">
lstat("/mnt/tmp/dir1/hosts", {st_mode=S_IFREG|0644, st_size=7424, ...}) = 0<br class="">
newfstatat(AT_FDCWD, "/mnt/tmp/dir2/hosts", 0x7ffff8a6bf90, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)<br class="">
unlink("/mnt/tmp/dir2/hosts") = -1 ENOENT (No such file or directory)<br class="">
openat(AT_FDCWD, "/mnt/tmp/dir1/hosts", O_RDONLY|O_NOFOLLOW) = 3<br class="">
openat(AT_FDCWD, "/mnt/tmp/dir2/hosts", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4<br class="">
read(3, "##\n# Host Database\n#\n# Do not re"..., 131072) = 7424<br class="">
write(4, "##\n# Host Database\n#\n# Do not re"..., 7424) = 7424</div>
<div> :</div>
<div> # lsattr -p /mnt/tmp/dir2</div>
<div> 2000 --------------e----P-- /mnt/tmp/dir2/hosts</div>
<div> # ls -li /mnt/tmp/dir2</div>
total 8</div>
<div class=""> 786435 8 -rw-r--r--. 1 root root 7424 Oct 18 22:42 hosts
<div><br class="">
</div>
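<div>As a rough way to predict this EXDEV ahead of time, the condition can be approximated from "lsattr -p -d" output. This is only a sketch: "would_exdev" is a hypothetical helper name, and the rule it encodes (the target directory has the "P" flag and a project ID different from the source) is an approximation of the behavior described above, not the kernel code itself.</div>
<pre>
```shell
# Sketch only; would_exdev is a hypothetical helper, not an existing tool.
# $1: "lsattr -p -d" output line for the source file or directory
# $2: "lsattr -p -d" output line for the target directory
# Each line has the form: project ID, attribute flags, path.
# Returns 0 if a rename is expected to fail with EXDEV, 1 otherwise.
would_exdev() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { src_id = $1 }
        NR == 2 {
            # Assumption: without PROJINHERIT ("P") on the target
            # directory, the rename is not refused.
            if (index($2, "P") == 0) exit 1
            # Same project ID: plain rename. Different: EXDEV expected.
            exit ($1 == src_id)
        }'
}
```
</pre>
<div>For the example above it could be called as: would_exdev "$(lsattr -p -d /mnt/tmp/dir1/hosts)" "$(lsattr -p -d /mnt/tmp/dir2)", which returns success when "mv" is expected to fall back to a copy.</div>
<div><br class="">
</div>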
<div>The reason for this limitation is that there is no way to atomically update the quota between the two project IDs when a whole subdirectory tree is being moved between projects. There might be thousands of subdirectories and millions of files
being moved, and the project ID needs to be updated on every one of them, which is too much to do atomically in a single filesystem transaction. Rather than try to solve this directly in the kernel, the XFS developers decided (and ext4 followed)
that cross-project renames would not be done by the kernel and would instead be handled in userspace by the "mv" utility, the same way that renames across different filesystems are handled.</div>
<div><br class="">
</div>
<div>In Lustre 2.15.0 and later, this cross-project rename constraint has been removed for *regular file* renames between directories with different project IDs. This means the file is moved between directories, and the project ID and associated quota accounting
are updated, in a single transaction without doing a data copy. However, *directory* renames with PROJINHERIT still have this issue.</div>
<div><br class="">
</div>
<div>To work around this behavior, it is possible to use "chattr -p" (or "lfs project -p", they do the same thing) to change the project ID of the source files and directories *before* they are renamed, so that no file data copy is needed and
only the filenames are moved.</div>
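<div><br class="">
</div>
<div>A minimal sketch of that workaround, assuming the e2fsprogs "chattr" (with its -R recursive option) and that the target project ID is already known, for example from "lsattr -p -d" on the destination directory. "retag_and_move", "run" and DRY_RUN are hypothetical names used only for illustration, and the project ID 2000 in the usage note is made up.</div>
<pre>
```shell
# Sketch only: retag a source tree with the destination's project ID,
# then rename it. retag_and_move and run are hypothetical helpers.
# Assumes chattr from e2fsprogs; on Lustre, "lfs project -p" is the
# equivalent per the text above (its recursion options are not shown).

# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

retag_and_move() {
    src=$1         # tree to move
    dst_parent=$2  # directory it should end up in
    projid=$3      # project ID used by dst_parent

    # Change the project ID on everything under src first ...
    run chattr -R -p "$projid" "$src" || return 1
    # ... so the rename no longer crosses a project boundary.
    run mv "$src" "$dst_parent/"
}
```
</pre>
<div>Running it as DRY_RUN=1 retag_and_move /project/TEMP/user /project/XYZ 2000 prints the two commands without executing them. Note that retagging moves the quota accounting for the tree to the target project before the rename happens.</div>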
<div><br class="">
</div>
<div>It might be possible to patch "mv" so that instead of bailing on "rename()" after the first EXDEV return, it creates the target directory and then tries to rename the files within the source directory to the target, before falling back to a file copy. It is
likely that ext4 could also be patched to allow regular file renames without returning EXDEV.</div>
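<div><br class="">
</div>
<div>Until "mv" does something like this itself, the same idea can be approximated from the shell. This is a rough sketch, assuming a filesystem where regular-file renames across project IDs succeed without a copy (Lustre 2.15.0 and later, per the above); elsewhere each per-file "mv" would still fall back to copying. "move_tree_by_files" is a hypothetical name.</div>
<pre>
```shell
# Sketch only: recreate the directory skeleton under the target, then
# rename files one at a time instead of renaming the top directory.
# move_tree_by_files is a hypothetical helper, not an existing tool.
move_tree_by_files() {
    src=${1%/}                    # source directory
    mkdir -p "$2" || return 1     # target directory, created if missing
    dst=$(cd "$2" || exit 1; pwd) || return 1   # absolute target path

    # 1. Recreate every source directory under the target.
    ( cd "$src" || exit 1
      find . -type d -exec sh -c '
          for d in "$@"; do mkdir -p "$0/$d" || exit 1; done' "$dst" {} +
    ) || return 1

    # 2. Rename each non-directory entry into the matching target directory.
    ( cd "$src" || exit 1
      find . ! -type d -exec sh -c '
          for f in "$@"; do mv -- "$f" "$0/$f" || exit 1; done' "$dst" {} +
    ) || return 1

    # 3. Remove the now-empty source directories, deepest first.
    find "$src" -depth -type d -exec rmdir {} +
}
```
</pre>
<div>For example: move_tree_by_files /project/TEMP/user /project/XYZ/user. Directory ownership, modes and timestamps are not carried over in this sketch, so it is only a starting point.</div>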
<div><br class="">
</div>
<div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div>Cheers, Andreas</div>
<div>--</div>
<div>Andreas Dilger</div>
<div>Lustre Principal Architect</div>
<div>Whamcloud</div>
</div>
</div>
</div>
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br class="">
</div>
</body>
</html>