[lustre-discuss] lproc stats changed snapshot_time from unix-epoch to uptime/monotonic in 2.15

Ellis Wilson elliswilson at microsoft.com
Thu Aug 25 06:20:53 PDT 2022


Thanks for confirming Andreas, and will do!

-----Original Message-----
From: Andreas Dilger <adilger at whamcloud.com> 
Sent: Wednesday, August 24, 2022 8:47 PM
To: Ellis Wilson <elliswilson at microsoft.com>
Cc: lustre-discuss at lists.lustre.org
Subject: [EXTERNAL] Re: [lustre-discuss] lproc stats changed snapshot_time from unix-epoch to uptime/monotonic in 2.15

Ellis, thanks for reporting this.  This looks like it was a mistake. 

The timestamps should definitely be in wallclock time, but this looks to have been changed unintentionally to reduce overhead, and use a u64 instead of dealing with timespec64 math, while losing the original intent (there are many different ktime_get variants, all alike).

I think many monitoring tools will be unaffected because they use the delta between successive timestamps, but having timestamps that are relative to boot time is problematic since they may repeat or go backward after a reboot, and some tools may use this timestamp when inserting into a tsdb to avoid processing lag. 

Please file a ticket, and ideally if you can submit a patch that converts ktime_get() to ktime_get_real_ns() for the places that are changed in the patch (with a "Fixes:" line to track it against the original patch, which was commit ea2cd3af7b).

Cheers, Andreas

> On Aug 24, 2022, at 14:50, Ellis Wilson via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> 
> Hi all,
> 
> One of my colleagues noticed that in testing 2.15.1 out the stats returned include snapshot_time showing up in a different fashion than before.  Previously, ktime_get_real_ts64 was used to get the current timestamp and that was presented when stats were printed, whereas now uptime is used as returned by ktime_get.  Is there a monotonic requirement to snapshot_time that I'm not thinking about that makes ktime_get more useful?  The previous behavior of getting the current time alongside the stats so you could reason about when they were gotten made more sense to me.  But perhaps Andreas had a different vision for use of snapshot_time given that he was the one who revised it?
> 
> I'm glad to open a bug and propose a patch if this was just a mistake, but figured I'd ask first.
> 
> Best,
> 
> ellis
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.lustre.org%2Flistinfo.cgi%2Flustre-discuss-lustre.org&data=05%7C01%7Celliswilson%40microsoft.com%7C211612b72e78476e056008da8633576a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637969852332016835%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Ycw4J8CUQ1WC9c96G2B0ko1gwPO1A4sj9ThFz3xuxuQ%3D&reserved=0


More information about the lustre-discuss mailing list