<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:inherit;
panose-1:2 11 6 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.contentpasted0
{mso-style-name:contentpasted0;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0cm;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle23
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Thank you so much, that has puzzled me for some time now :)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Patrick Farrell &lt;pfarrell@ddn.com&gt;<br>
<b>Date: </b>Tuesday, 14 March 2023 at 14:36<br>
<b>To: </b>lustre-discuss@lists.lustre.org &lt;lustre-discuss@lists.lustre.org&gt;, Marc O'Brien &lt;Marc.OBrien@cruk.cam.ac.uk&gt;<br>
<b>Subject: </b>Re: Question regarding user access during recovery and journal replay<o:p></o:p></span></p>
</div>
<div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">Marc,</span></span><span style="color:#242424"><o:p></o:p></span></p>
<p style="background:white"><span style="color:#242424"><o:p> </o:p></span></p>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">[Re-posting to the list...]</span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black"> </span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">No, it’s fine to have interaction during those times. The system is designed to do that work online. Depending on what you’re trying to do and what
you’re accessing, some client operations will experience delays, but that’s it. For example, during failover/recovery for a particular OST or MDT, no new I/O to that target will complete until recovery finishes. The user programs will just wait, though, so it’s safe to leave them running.</span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black"> </span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">So recovery, etc., will show up to users as delays in some requests, but it’s safe to run while users are accessing the system.</span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black"> </span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">Regards,</span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
<div>
<p style="background:white"><span class="contentpasted0"><span style="font-family:"inherit",serif;color:black">Patrick</span></span><span style="color:#242424"><o:p></o:p></span></p>
</div>
</div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="0" width="94%" align="center">
</div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> lustre-discuss &lt;lustre-discuss-bounces@lists.lustre.org&gt; on behalf of Marc O'Brien via lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt;<br>
<b>Sent:</b> Tuesday, March 14, 2023 7:24 AM<br>
<b>To:</b> lustre-discuss@lists.lustre.org &lt;lustre-discuss@lists.lustre.org&gt;<br>
<b>Subject:</b> [lustre-discuss] Question regarding user access during recovery and journal replay</span>
<o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Hi,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">When I was first taught some Lustre file system administration, it was stressed that while a Lustre file system was recovering and the journal replay was occurring on each host, there should
be no user interaction with the file system. Any recovery was done with cluster access denied to HPC users, or when the cluster was deemed to be quiescent. This seemed to make sense, as during journal replay the file system is in a R/W state but the distributed
file system may not have reached a stable state. We now have multiple Lustre file systems (two Ext4-based and one ZFS-based), and evicting users or finding a quiescent time is problematic (luckily there are maintenance windows for the routine stuff).</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">I have searched online and have yet to see in print that there should be no user interaction with Lustre during recovery or journal replay (I may have missed it).</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">So my question is: is the restriction on cluster user interaction during recovery and journal replay actually a thing?</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Thanks in advance for any enlightenment :)</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:black">Marc</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
</div>
</div>
</div>
</body>
</html>