[lustre-devel] Lustre log question(s)

Andreas Dilger adilger at whamcloud.com
Mon Feb 1 20:14:57 PST 2021


It is worthwhile to note that the proposed scenario is racy even for local filesystems, regardless of whether recovery is involved or not.

If (1) and (3) are happening on two clients at the same time:
- if (1) happens first, then (3) will fail with ENOENT ("no such file or directory") because "dir1" is gone.
- if (3) happens first, then (1) may delete "file100", but that is not the filesystem's fault: the user on cli1 asked for everything in "dir1" to be deleted.
- in some cases, "file100" may be created after (1) has passed that part of the directory traversal (it depends on how large the tree under "dir1" is), and then (1) will fail the final rmdir("dir1") with ENOTEMPTY ("directory not empty") because "file100" still exists there.  This is the classic "TOCTOU" race ("Time of Check/Time of Use").
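
As a rough illustration of the first and third cases on a local filesystem, here is a minimal Python sketch. It is not Lustre-specific: the function names are made up for this example, the "race" is forced by running the steps in a fixed order in a single process, and the error names are what Linux reports.

```python
import errno
import os
import shutil
import tempfile


def create_after_remove(root):
    """Case 1: the remove (1) finishes before the create (3) starts."""
    d = os.path.join(root, "dir1")
    os.mkdir(d)
    shutil.rmtree(d)                                    # (1) rm -rf dir1
    try:
        open(os.path.join(d, "file100"), "w").close()   # (3) create file100
    except OSError as e:
        return errno.errorcode[e.errno]
    return "OK"


def create_during_remove(root):
    """Case 3: file100 appears after the traversal, before the final rmdir."""
    d = os.path.join(root, "dir1")
    os.mkdir(d)
    open(os.path.join(d, "old_file"), "w").close()
    for name in os.listdir(d):                          # (1) unlink what was listed
        os.unlink(os.path.join(d, name))
    open(os.path.join(d, "file100"), "w").close()       # (3) sneaks in here
    try:
        os.rmdir(d)                                     # (1) final rmdir("dir1")
    except OSError as e:
        return errno.errorcode[e.errno]
    return "OK"


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(create_after_remove(root))
    print(create_during_remove(root))
    shutil.rmtree(root)
```

In the second function the create succeeds and the error is seen by the remover, which is why the outcome depends purely on timing and not on the filesystem.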

Cheers, Andreas

On Feb 1, 2021, at 11:18, Spitz, Cory James <cory.spitz at hpe.com> wrote:

> I am trying to get a 30,000 ft overview of how lustre replay/recovery works

This old slide deck might be useful to you:
https://wiki.lustre.org/images/0/00/A_Deep_Dive_into_Lustre_Recovery_Mechanisms.pdf

Granted, it may not be 100% correct any longer.

-Cory

On 1/30/21, 11:08 AM, "lustre-devel" <lustre-devel-bounces at lists.lustre.org> wrote:

Thank you for the explanation on LLOG and changelog.  With respect to the following statement :

>> Lustre has its own mechanisms to guarantee that transactions are committed to disk and to handle crashes. Basically, I/O is not acknowledged to Lustre clients until the data is actually on disk. After a server crash, the Lustre clients replay all unacknowledged I/O to ensure none of it is lost.

For example:

Let us say that I have 4 clients (cli1, cli2, cli3 and cli4) and all are writing and reading data.  I have 1 host with 4 disks (2 OSTs, 1 MDT, 1 MGT).
1. cli1 issues a directory remove (rm -rf /mnt/lustre/dir1).
2. cli1 loses its connection to the Lustre targets.
3. cli2 now wants to create the file /mnt/lustre/dir1/file100 and write some data to it.
All of these are happening in parallel.
• Does cli2 get an error that /mnt/lustre/dir1 has been removed, so that it first has to issue additional I/O to create /mnt/lustre/dir1 before reissuing the I/O to write file100?
• If the transaction from cli2 happens before the one from cli1, would this not lead to data loss for cli2 if it later tries to read from or write to file100?
• What is the role of the last_rcvd file in this entire picture?
I am trying to get a 30,000 ft overview of how lustre replay/recovery works.

Thanks again and appreciate your timely response.

On Fri, Jan 29, 2021 at 1:22 AM Degremont, Aurelien <degremoa at amazon.com> wrote:
Hi,

This is not totally correct.

First, LLOG is the underlying technology used to store and handle Lustre Changelogs. But LLOG is also used for other Lustre mechanisms, such as Lustre configuration.
Second, Changelog is similar to an audit feature. Changelogs only record filesystem changes, mostly metadata changes, and definitely not file content changes. They play no role at all in transactions or failure recovery. This is only an admin feature.

Finally, the ZIL indeed cannot be used, and Lustre has its own mechanisms to guarantee that transactions are committed to disk and to handle crashes. Basically, I/O is not acknowledged to Lustre clients until the data is actually on disk. After a server crash, the Lustre clients replay all unacknowledged I/O to ensure none of it is lost.
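
To make the replay idea concrete, here is a toy Python model. It is only a sketch of the general pattern described above (the client keeps each request until the server confirms it is on disk, and resends the rest after a crash). ToyServer, ToyClient and all of their methods are hypothetical names invented for this example, not Lustre code or APIs, and having the client pick the transaction numbers is a simplification to keep the example short.

```python
class ToyServer:
    """Hypothetical server: 'disk' survives a crash, 'pending' does not."""

    def __init__(self):
        self.disk = {}
        self.pending = []          # (transno, key, value) not yet on disk
        self.last_committed = 0    # highest transaction number on disk

    def handle(self, transno, key, value):
        # Ignore requests that are already on disk, so replay is harmless.
        if transno > self.last_committed:
            self.pending.append((transno, key, value))
        return self.last_committed

    def commit(self):
        # Flush everything pending to 'disk'.
        for transno, key, value in self.pending:
            self.disk[key] = value
            self.last_committed = max(self.last_committed, transno)
        self.pending.clear()
        return self.last_committed

    def crash(self):
        # Uncommitted state is lost; 'disk' is kept.
        self.pending.clear()


class ToyClient:
    """Hypothetical client: keeps requests until they are known committed."""

    def __init__(self, server):
        self.server = server
        self.next_transno = 1
        self.unacked = []

    def write(self, key, value):
        transno = self.next_transno
        self.next_transno += 1
        self.unacked.append((transno, key, value))
        self._drop_committed(self.server.handle(transno, key, value))

    def _drop_committed(self, last_committed):
        self.unacked = [r for r in self.unacked if r[0] > last_committed]

    def replay(self):
        # After a server restart: resend everything not known to be on disk.
        for transno, key, value in list(self.unacked):
            self.server.handle(transno, key, value)
        self._drop_committed(self.server.commit())


if __name__ == "__main__":
    server = ToyServer()
    client = ToyClient(server)
    client.write("a", 1)
    server.commit()
    client.write("b", 2)     # still only in server memory
    server.crash()           # "b" is lost on the server side
    client.replay()          # client resends "b"
    print(server.disk)
```

Note that nothing in this model involves a changelog: the only state needed for recovery is the client's list of unconfirmed requests and the server's record of what has reached disk.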

Changelog is not needed in your case.

Aurélien

From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of Sudheendra Sampath <sudheendra.sampath at gmail.com>
Date: Thursday, January 28, 2021 at 21:43
To: "lustre-devel at lists.lustre.org" <lustre-devel at lists.lustre.org>
Subject: [EXTERNAL] [lustre-devel] Lustre log question(s)


Hi,

I am trying to evaluate osd-zfs based MDS and OST deployment on a 2 node setup.

I have the following questions about Lustre log:
1. Are changelog and llog the same thing, i.e. are the two terms synonymous?
2. I understand that ZIL is currently not supported in Lustre version 2.12.2.  My questions are:
   1. My understanding is that transactions (in general) need some logging mechanism to work in 'all or none' scenarios.  Please correct me if my understanding is incorrect.  I understand that changelog has to be enabled so that filesystem changes are recorded and can be replayed after a crash.  How do Lustre transactions work if there is no intent log/changelog?
   2. Does it mean that if changelog is NOT enabled and there is a crash, we risk losing all changes/updates to the filesystem?
Appreciate your timely response and Thank you for your help.

--
Regards

Sudheendra Sampath



--
Andreas Dilger
Principal Lustre Architect
Whamcloud





