[lustre-devel] Lustre log question(s)

Spitz, Cory James cory.spitz at hpe.com
Mon Feb 1 10:18:48 PST 2021


> I am trying to get a 30,000 ft overview of how lustre replay/recovery works

This old slide deck might be useful to you:
https://wiki.lustre.org/images/0/00/A_Deep_Dive_into_Lustre_Recovery_Mechanisms.pdf

Granted, it may not be 100% correct any longer.

-Cory

On 1/30/21, 11:08 AM, "lustre-devel" <lustre-devel-bounces at lists.lustre.org> wrote:

Thank you for the explanation of LLOG and changelog.  With respect to the following statement:

>> Lustre has its own mechanisms to guarantee that transactions are committed to disk and to handle crashes. Basically, I/Os are not acknowledged to Lustre clients before the data is actually on disk. In case of a server crash, the Lustre clients will replay all non-acknowledged I/Os to ensure none of them are lost.

For example:

Let us say that I have 4 clients (cli1, cli2, cli3 and cli4), all writing and reading data.  I have 1 host with 4 disks (2 OSTs, 1 MDT, 1 MGT).
1. cli1 issues a directory remove (rm -rf /mnt/lustre/dir1).
2. cli1 loses its connection to the Lustre targets.
3. cli2 now wants to create a file, /mnt/lustre/dir1/file100, and write some data to it.
All of these are happening in parallel.
· Does cli2 get an error that /mnt/lustre/dir1 has been removed, so that it first has to issue additional I/O to create /mnt/lustre/dir1 before reissuing the I/O to write file100?
· If a transaction from cli2 happens before the one from cli1, then this would lead to a data loss situation for cli2 if it tries to read/write data from/to file100 some time later.
· What is the role of the last_rcvd file in this entire picture?
I am trying to get a 30,000 ft overview of how lustre replay/recovery works.

Thanks again and appreciate your timely response.

On Fri, Jan 29, 2021 at 1:22 AM Degremont, Aurelien <degremoa at amazon.com> wrote:
Hi,

This is not totally correct.

First, LLOG is the underlying technology used to store and handle Lustre Changelogs. But LLOG is also used for other Lustre mechanisms, such as Lustre configuration.
Second, Changelog is similar to an audit feature. It only logs filesystem changes, mostly metadata changes, and definitely not changes to file content. Changelogs play no role at all in transactions or failure recovery. This is purely an admin feature.

In the end, ZIL indeed cannot be used, and Lustre has its own mechanisms to guarantee that transactions are committed to disk and to handle crashes. Basically, I/Os are not acknowledged to Lustre clients before the data is actually on disk. In case of a server crash, the Lustre clients will replay all non-acknowledged I/Os to ensure none of them are lost.
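
The acknowledge-then-replay idea described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Lustre code: the Server and Client classes, the transno counter, and the last_committed value returned with each reply are hypothetical names chosen for this sketch. The client keeps every request until the server reports it as committed, and resends whatever is left after a server crash.

```python
class Server:
    """Toy server (hypothetical): `disk` survives a crash, `memory` does not."""

    def __init__(self):
        self.disk = {}            # committed state, survives crash()
        self.memory = {}          # current state, lost on crash()
        self.pending = {}         # transno -> (key, value), not yet on disk
        self.next_transno = 1
        self.last_committed = 0

    def execute(self, key, value):
        """Apply a change in memory only; reply with its transaction number
        and the highest transaction number known to be on disk."""
        transno = self.next_transno
        self.next_transno += 1
        self._apply(transno, key, value)
        return transno, self.last_committed

    def replay(self, transno, key, value):
        """Re-apply a change that a client resends after a crash."""
        self._apply(transno, key, value)
        self.next_transno = max(self.next_transno, transno + 1)

    def _apply(self, transno, key, value):
        self.memory[key] = value
        self.pending[transno] = (key, value)

    def commit(self):
        """Flush all pending changes to disk, in transaction order."""
        for transno in sorted(self.pending):
            key, value = self.pending[transno]
            self.disk[key] = value
            self.last_committed = max(self.last_committed, transno)
        self.pending.clear()

    def crash(self):
        """Lose everything that was not committed."""
        self.memory = dict(self.disk)
        self.pending.clear()
        self.next_transno = self.last_committed + 1


class Client:
    """Toy client (hypothetical): remembers requests until they are committed."""

    def __init__(self, server):
        self.server = server
        self.replay_list = {}     # transno -> (key, value)

    def write(self, key, value):
        transno, last_committed = self.server.execute(key, value)
        self.replay_list[transno] = (key, value)
        # Requests at or below last_committed are on disk and can be dropped.
        self.replay_list = {t: op for t, op in self.replay_list.items()
                            if t > last_committed}

    def recover(self):
        """After a server restart, resend uncommitted requests in order."""
        for transno in sorted(self.replay_list):
            key, value = self.replay_list[transno]
            self.server.replay(transno, key, value)


server = Server()
client = Client(server)
client.write("a", 1)
client.write("b", 2)
server.commit()        # "a" and "b" reach disk
client.write("c", 3)   # "c" exists only in server memory
server.crash()         # the server forgets "c"
client.recover()       # the client resends "c"
print(server.memory)
```

In this sketch the client only drops a request once a later reply tells it the change is on disk, which is why the write of "c" can be recovered even though the server lost it.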

Changelog is not needed in your case.

Aurélien

From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of Sudheendra Sampath <sudheendra.sampath at gmail.com>
Date: Thursday, January 28, 2021 at 21:43
To: "lustre-devel at lists.lustre.org" <lustre-devel at lists.lustre.org>
Subject: [EXTERNAL] [lustre-devel] Lustre log question(s)


Hi,

I am trying to evaluate an osd-zfs based MDS and OST deployment on a two-node setup.

I have the following questions about Lustre log:
1. Are changelog and llog the same thing, i.e. are the terms synonymous?
2. I understand that ZIL is currently not supported in Lustre version 2.12.2.  My questions are:
    1. My understanding is that transactions (in general) need some logging mechanism to work in 'all or none' scenarios.  Please correct me if my understanding is incorrect.  I understand that changelog has to be enabled so that filesystem changes are recorded to be replayed after a crash.  How do Lustre transactions work if there is no intent log/changelog?
    2. Does it mean that if changelog is NOT enabled and there is a crash, we risk losing all changes/updates to the filesystem?
I appreciate your timely response, and thank you for your help.

--
Regards

Sudheendra Sampath

