<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>


</head>


<body dir="ltr">


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


Obviously Tim would have to speak to this if he can, but that's not how things worked at OCI, and I would think it's the same at all the hyperscalers: there's no such thing as idle time, not really, or at least not like this.  They work very hard to minimize idle capacity across the (many, many) datacenters/nodes, and time is absolutely charged for internal use (perhaps charged differently, but still).  Plenty of people would love "idle" time, so there isn't any.</div>


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<br>


</div>


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


-Patrick</div>


<div id="appendonsend"></div>


<hr style="display:inline-block;width:98%" tabindex="-1">


<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> lustre-devel &lt;lustre-devel-bounces@lists.lustre.org&gt; on behalf of Oleg Drokin &lt;green@whamcloud.com&gt;<br>


<b>Sent:</b> Tuesday, February 4, 2025 12:38 PM<br>


<b>To:</b> Andreas Dilger &lt;adilger@ddn.com&gt;<br>


<b>Cc:</b> lustre-devel@lists.lustre.org &lt;lustre-devel@lists.lustre.org&gt;<br>


<b>Subject:</b> Re: [lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming</font>


<div> </div>


</div>


<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">


<div class="PlainText">On Tue, 2025-02-04 at 17:33 +0000, Andreas Dilger wrote:<br>


> You overlook that Tim works for AWS, so he would not actually pay to<br>


> run these nodes. He could run in machine idle times while no external<br>


> customer is paying for them. <br>


<br>


If this could be arranged that would be great of course, but I don't<br>


want to assume something of this nature unless explicitly stated. And<br>


who knows what sort of internal accounting there might be in place to<br>


keep track (and approve) uses like this too.<br>


<br>


> I suspect with the random nature of the boilpot that it is the total<br>


> number of hours runtime that matter, not whether they are contiguous<br>


> or not.  So running 24x boilpot nodes for 1h during off-peak times<br>


> would likely produce the same result as 24h continuous on one node. <br>


<br>


Well, not exactly true. There need to be contiguous chunks of at least<br>


1x the longest test run, and preferably much more (2x is better as the<br>


minimum?).<br>


If conf-sanity takes 5 hours in this setup (CPU overcommit making<br>


things slow and whatnot) and you always run for only an hour, we never<br>


get to try most of conf-sanity.<br>


<br>


Also, compare 50 sessions of conf-sanity running in parallel 1x vs<br>


10 sessions running conf-sanity in parallel 5x: the latter probably<br>


wins coverage-wise, because over time the other conflicting VMs would<br>


deviate more, so the stress points in the code would fall more and more<br>


differently, I suspect. We can probably test this by running both<br>


setups for long enough in parallel on the same code and seeing how much<br>


of a crash rate difference it makes.<br>


<br>


> <br>


> Cheers, Andreas<br>


> <br>


> > On Feb 3, 2025, at 15:30, Oleg Drokin &lt;green@whamcloud.com&gt; wrote:<br>


> > <br>


> > On Mon, 2025-02-03 at 20:24 +0000, Oleg Drokin wrote:<br>


> > <br>


> > > at $11/hour the m7a.metal-48xl would take $264 to run for just<br>


> > > one<br>


> > > day,<br>


> > > a week is an eye-watering $1848, so running this for every patch<br>


> > > is<br>


> > > not<br>


> > > super economical I'd say.<br>


> > <br>


> > x2gd metal at $5.34 per hour makes more sense as it has more RAM<br>


> > (and<br>


> > 64 CPUs is adequate I'd say) but still quite pricey if you want to<br>


> > run<br>


> > this at any sort of scale.<br>


> > _______________________________________________<br>


> > lustre-devel mailing list<br>


> > lustre-devel@lists.lustre.org<br>


> > <a href="http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org</a><br>


<br>




</div>


</span></font></div>


</body>


</html>