[lustre-discuss] Does mtime/ctime have eventual consistency semantics in Lustre?

Fri Sep 7 03:32:57 PDT 2018

We run stress tests of Lustre 2.7 on Centos 6.6 inside a VirtualBox 5.2.
The tests include performing the following operations on the filesystem for
several minutes in random order:
- create directory/file/hard link/symlink
- remove existing path
- chmod path
- move path

Except chmod, all operations above are supposed to change ctime of the
parent directory. Before and after each operation the parent directory is
lstatted. In our tests we verify that ctime of the parent directory has
changed.
Below is the simplified Python pseudocode:

now_bef = time.time()
st_bef = os.lstat(parent_path)
run_fs_operation(path)
st_aft = os.lstat(parent_path)
now_aft = time.time()
assert (st_bef.st_ctime < st_aft.st_ctime) \
    or ((st_bef.st_ctime == st_aft.st_ctime) and (abs(int(now_bef) -
st_bef.st_ctime) <= 1)), \
    "ctime not updated; '%s' ts before: %s; ts after: %s; st before: %s; st
after: %s" % (path, now_bef, now_aft, st_bef, st_aft)

We tolerate 1-second difference between time.time() and ctime due to
https://lkml.org/lkml/2009/11/27/169.

We observed that occasionally mtime/ctime are not updated. For example, a
new directory was created at 1536278713 = 00:05:13, but its parent's ctime
was still 1536278707 = 00:05:07. As we can see the assertion was triggered:

AssertionError: ctime not updated;
'b'/mnt/lustre/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_1/dir_\xc5\x9b_1/dir_\xc5\x9b_1''
    ts before: 1536278713.0100942;
    ts after: 1536278713.0103905;
    st before: os.stat_result(
        st_mode=16877, st_ino=144115205255730395,
        st_dev=1264717536, st_nlink=2, st_uid=0, st_gid=0,
        st_size=4096,
        st_atime=1536278696,
        st_mtime=1536278707,
        st_ctime=1536278707);
    st after: os.stat_result(
        st_mode=16877, st_ino=144115205255730395,
        st_dev=1264717536, st_nlink=2, st_uid=0, st_gid=0,
        st_size=4096,
        st_atime=1536278696,
        st_mtime=1536278707,
        st_ctime=1536278707)

This particular error was caused by creating a new directory, but they
happen after other operations too.
As the assertion triggered from time to time, we have recently added a loop
that waits for parent's ctime to be updated. In this case it happened after
1 second:

ctime updated after 1 seconds;
'b'/mnt/lustre/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_0/dir_\xc5\x9b_1/dir_\xc5\x9b_1/dir_\xc5\x9b_1/dir_\xc5\x9b_1''
    ts before: 1536278713.0100942;
    ts now: 1536278714.0112765;
    st before: os.stat_result(st_mode=16877,
        st_ino=144115205255730395, st_dev=1264717536,
        st_nlink=2, st_uid=0, st_gid=0, st_size=4096,
        st_atime=1536278696,
        st_mtime=1536278707,
        st_ctime=1536278707);
    st now: os.stat_result(st_mode=16877, st_ino=144115205255730395,
        st_dev=1264717536, st_nlink=3, st_uid=0, st_gid=0,
        st_size=4096,
        st_atime=1536278696,
        st_mtime=1536278713,
        st_ctime=1536278713)

We saw cases where after more than a dozen seconds ctime was still wrong,
but got repaired on its own after more than a minute.

Lustre version: lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64

What might be important is that we use aggressive time synchronization
between host and guess machine (VBoxService with --timesync-interval 1000).
Below is the snippet from VBoxService logs:

00:04:40.263572 timesync vgsvcTimeSyncWorker: Host:
2018-09-07T00:05:12.515000000Z    (MinAdjust: 100 ms)
00:04:40.263635 timesync vgsvcTimeSyncWorker: Guest: -
2018-09-07T00:05:11.120961000Z => 1 394 039 000 ns drift
00:04:40.263663 timesync vgsvcTimeSyncAdjust: adjtime by 1 394 039 000 ns
00:04:41.263879 timesync vgsvcTimeSyncWorker: Host:
2018-09-07T00:05:13.515000000Z    (MinAdjust: 100 ms)
00:04:41.263934 timesync vgsvcTimeSyncWorker: Guest: -
2018-09-07T00:05:12.121272000Z => 1 393 728 000 ns drift
00:04:41.263976 timesync vgsvcTimeSyncAdjust: adjtime by 1 393 728 000 ns
00:04:42.264138 timesync vgsvcTimeSyncWorker: Host:
2018-09-07T00:05:14.515000000Z    (MinAdjust: 100 ms)
00:04:42.264209 timesync vgsvcTimeSyncWorker: Guest: -
2018-09-07T00:05:13.121532000Z => 1 393 468 000 ns drift
00:04:42.264249 timesync vgsvcTimeSyncAdjust: adjtime by 1 393 468 000 ns
00:04:43.264406 timesync vgsvcTimeSyncWorker: Host:
2018-09-07T00:05:15.515000000Z    (MinAdjust: 100 ms)
00:04:43.264470 timesync vgsvcTimeSyncWorker: Guest: -
2018-09-07T00:05:14.121801000Z => 1 393 199 000 ns drift
00:04:43.264498 timesync vgsvcTimeSyncAdjust: adjtime by 1 393 199 000 ns
00:04:44.264631 timesync vgsvcTimeSyncWorker: Host:
2018-09-07T00:05:16.514000000Z    (MinAdjust: 100 ms)
00:04:44.264703 timesync vgsvcTimeSyncWorker: Guest: -
2018-09-07T00:05:15.122026000Z => 1 391 974 000 ns drift
00:04:44.264733 timesync vgsvcTimeSyncAdjust: adjtime by 1 391 974 000 ns

This is nothing unusual: similar adjustments are made in every test and in
most cases they don't cause any problems with ctime.

Why are mtime/ctime occasionally not updated? Is there a way to make sure
that metadata is fully up to date?

-- 
Best regards,
Marcin Barczyński
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20180907/3498238f/attachment.html>