<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
You need to run writeconf on all targets at the same time, while they are all unmounted, and then mount them in a specific order. That is documented in the Lustre Operations Manual. <br>
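A minimal dry-run sketch of that sequence follows. It assumes a combined MGS/MDT (as on nas-0-3) and a single OST. The MDT device path and MDT mount point are hypothetical placeholders, and the OST path is taken from the post below. In practice each command runs on the server that owns the target, so the script only prints the commands in order unless RUN=1 is set. The manual remains the authority on the exact steps.<br>
<pre>
```shell
#!/bin/sh
# Dry-run sketch of the writeconf sequence (not an official procedure).
# ASSUMPTIONS: combined MGS/MDT on the MDS, one OST on one OSS.
# MDT_DEV and MDT_MNT are hypothetical; the OST paths come from the post.
# By default every command is only printed; set RUN=1 to execute it.

MDT_DEV=/dev/mapper/lustre-mdt0     # hypothetical MGS/MDT device
MDT_MNT=/mnt/mdt                    # hypothetical MDT mount point
OST_DEV=/dev/mapper/lustre-oss0     # OST device from the post
OST_MNT=/mnt/oss0                   # OST mount point from the post

run() {
    # Print the command, or execute it when RUN=1.
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 0 (not scripted): unmount all clients and all targets first.

# Step 1: with every target unmounted, set the writeconf flag on ALL of them.
run tunefs.lustre --writeconf "$MDT_DEV"    # on the MDS
run tunefs.lustre --writeconf "$OST_DEV"    # on the OSS, repeat per OST

# Step 2: mount the MGS/MDT first so its log is regenerated, then the OSTs.
run mount -t lustre "$MDT_DEV" "$MDT_MNT"   # on the MDS
run mount -t lustre "$OST_DEV" "$OST_MNT"   # on the OSS, repeat per OST

# Step 3 (not scripted): mount the clients last.
```
</pre>
The manual also cautions that writeconf discards OST pool definitions and tunables set with conf_param, so those need to be reapplied afterwards.<br>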
<br>
<div dir="ltr">Cheers, Andreas</div>
<div dir="ltr"><br>
<blockquote type="cite">On Jan 18, 2023, at 03:49, Edmondson, Edward via lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt; wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<style>@font-face { font-family: "Cambria Math"; }
@font-face { font-family: Calibri; }
p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif; }
.MsoChpDefault { font-size: 10pt; }
@page WordSection1 { size: 612pt 792pt; margin: 72pt; }
div.WordSection1 { page: WordSection1; }</style>
<div class="WordSection1">
<div name="messageBodySection">
<div>
<p class="MsoNormal">Hi all,<br>
<br>
I'm struggling to get my OSS mounts online after a less-than-clean shutdown. I'm on Lustre 2.12.9. Plenty of googling hasn't turned up anything specific to the problem I'm having, unfortunately.<br>
<br>
LNet seems to be up: pings succeed in both directions, and the logs show the nodes are clearly communicating. I've been through the log regeneration process with --writeconf on every target, step by step as described in the manual.<br>
<br>
On the OSS node when I try to mount:<br>
<span style="font-family:'Courier New'">mount.lustre: mount /dev/mapper/lustre-oss0 at /mnt/oss0 failed: No such file or directory<br>
Is the MGS specification correct?<br>
Is the filesystem name correct?<br>
If upgrading, is the copied client log valid? (see upgrade docs)<br>
<br>
</span><o:p></o:p></p>
<p class="MsoNormal">In the OSS logs:<br>
<span style="font-family:'Courier New'">Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(lwp_dev.c:125:lwp_setup()) lustre-MDT0000-lwp-OST0000: client obd setup error: rc = -2<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(lwp_dev.c:273:lwp_init0()) lustre-MDT0000-lwp-OST0000: setup lwp failed. -2<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(obd_config.c:559:class_setup()) setup lustre-MDT0000-lwp-OST0000 failed (-2)<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(obd_mount.c:202:lustre_start_simple()) lustre-MDT0000-lwp-OST0000 setup error -2<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 31015:0:(obd_mount_server.c:671:lustre_lwp_setup()) lustre-MDT0000-lwp-OST0000: setup up failed: rc -2<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 15c-8: MGC10.3.255.200@o2ib: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 30961:0:(obd_mount_server.c:1414:server_start_targets()) lustre-OST0000: failed to start LWP: -2<br>
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 30961:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -2<br>
Jan 18 10:27:56 nas-0-4 kernel: Lustre: Failing over lustre-OST0000<br>
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 30961:0:(ldlm_lockd.c:3203:ldlm_cleanup()) ldlm still has namespaces; clean these up first.<br>
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 30961:0:(ldlm_lockd.c:2862:ldlm_put_ref()) ldlm_cleanup failed: -16<br>
Jan 18 10:27:57 nas-0-4 kernel: Lustre: server umount lustre-OST0000 complete<br>
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 30961:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-2)<br>
</span><br>
On the MGS/MDT node (where the MGS and MDT now mount fine):<br>
<span style="font-family:'Courier New'">Jan 18 10:27:56 nas-0-3 kernel: Lustre: MGS: Connection restored to 24758df3-a11a-f5db-18a5-2e0e35f2099d (at 10.3.255.199@o2ib)<br>
Jan 18 10:27:56 nas-0-3 kernel: Lustre: MGS: Regenerating lustre-OST0000 log by user request: rc = 0<br>
Jan 18 10:27:56 nas-0-3 kernel: Lustre: Found index 0 for lustre-OST0000, updating log<br>
Jan 18 10:27:56 nas-0-3 kernel: Lustre: Client log for lustre-OST0000 was not updated; writeconf the MDT first to regenerate it.<br>
</span><br>
The MDT has definitely been writeconfed, so that last message isn't terribly helpful. The fscks are clean, so there's no problem there.<br>
<br>
Any advice would be hugely appreciated!<o:p></o:p></p>
</div>
</div>
<div name="messageSignatureSection">
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">-- <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Dr Edd Edmondson<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">HPC Systems Manager<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Dept of Physics and Astronomy<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">University College London<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">(he/him) While I am working remotely, email is the best way to contact me. If needed, I am available by phone on 0203 108 1399, by Microsoft Teams, or by other methods by arrangement.<o:p></o:p></p>
</div>
</div>
</div>
</div>
<span>_______________________________________________</span><br>
<span>lustre-discuss mailing list</span><br>
<span>lustre-discuss@lists.lustre.org</span><br>
<span>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</span><br>
</div>
</blockquote>
</body>
</html>