<div dir="ltr">What errors are indicated in the kernel ring buffer on the client (dmesg)?<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 22, 2023 at 10:56 PM Sid Young via lustre-discuss <<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi all,<div><br></div><div>I've been running Lustre 2.12.6 on the servers (clients are 2.12.7) on HP gear for nearly 2 years and had an odd crash requiring a reboot of all nodes. I have Lustre /home and /lustre file systems, and I've been able to remount them on the clients after restarting the MGS/MDT and OSS nodes, but on any client, when I do an ls -la on the /lustre file system, it locks up solid. The /home file system appears to be OK for the directories and sub-directories I tested.</div><div><br></div><div>I am very rusty on Lustre now, but I logged into another node and ran the following:</div><div><br></div><div>[root@n04 ~]# lfs check osts<br>home-OST0000-osc-ffff9f3b26547800 active.<br>home-OST0001-osc-ffff9f3b26547800 active.<br>home-OST0002-osc-ffff9f3b26547800 active.<br>home-OST0003-osc-ffff9f3b26547800 active.<br>lustre-OST0000-osc-ffff9efd1e392800 active.<br>lustre-OST0001-osc-ffff9efd1e392800 active.<br>lustre-OST0002-osc-ffff9efd1e392800 active.<br>lustre-OST0003-osc-ffff9efd1e392800 active.<br>lustre-OST0004-osc-ffff9efd1e392800 active.<br>lustre-OST0005-osc-ffff9efd1e392800 active.<br>[root@n04 ~]# lfs check mds<br>home-MDT0000-mdc-ffff9f3b26547800 active.<br>lustre-MDT0000-mdc-ffff9efd1e392800 active.<br>[root@n04 ~]# lfs check servers<br>home-OST0000-osc-ffff9f3b26547800 active.<br>home-OST0001-osc-ffff9f3b26547800 active.<br>home-OST0002-osc-ffff9f3b26547800 active.<br>home-OST0003-osc-ffff9f3b26547800 active.<br>lustre-OST0000-osc-ffff9efd1e392800 active.<br>lustre-OST0001-osc-ffff9efd1e392800 
active.<br>lustre-OST0002-osc-ffff9efd1e392800 active.<br>lustre-OST0003-osc-ffff9efd1e392800 active.<br>lustre-OST0004-osc-ffff9efd1e392800 active.<br>lustre-OST0005-osc-ffff9efd1e392800 active.<br>home-MDT0000-mdc-ffff9f3b26547800 active.<br>lustre-MDT0000-mdc-ffff9efd1e392800 active.<br>[root@n04 ~]#</div><div><br></div><div>[root@n04 ~]# lfs df -h<br>UUID bytes Used Available Use% Mounted on<br>home-MDT0000_UUID 4.2T 217.5G 4.0T 6% /home[MDT:0]<br>home-OST0000_UUID 47.6T 42.5T 5.1T 90% /home[OST:0]<br>home-OST0001_UUID 47.6T 44.6T 2.9T 94% /home[OST:1]<br>home-OST0002_UUID 47.6T 41.9T 5.7T 88% /home[OST:2]<br>home-OST0003_UUID 47.6T 42.2T 5.4T 89% /home[OST:3]<br><br>filesystem_summary: 190.4T 171.2T 19.1T 90% /home<br><br>UUID bytes Used Available Use% Mounted on<br>lustre-MDT0000_UUID 5.0T 53.8G 4.9T 2% /lustre[MDT:0]<br>lustre-OST0000_UUID 47.6T 42.3T 5.3T 89% /lustre[OST:0]<br>lustre-OST0001_UUID 47.6T 41.8T 5.8T 88% /lustre[OST:1]<br>lustre-OST0002_UUID 47.6T 41.3T 6.3T 87% /lustre[OST:2]<br>lustre-OST0003_UUID 47.6T 42.3T 5.3T 89% /lustre[OST:3]<br>lustre-OST0004_UUID 47.6T 43.7T 3.9T 92% /lustre[OST:4]<br>lustre-OST0005_UUID 47.6T 40.1T 7.4T 85% /lustre[OST:5]<br><br>filesystem_summary: 285.5T 251.5T 34.0T 89% /lustre<br><br>[root@n04 ~]#<br></div><div><br></div><div>Is it worth remounting everything and hoping crash recovery works, or are there specific checks I can make?</div><div><br clear="all"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><br></div><div><br></div><div>Sid Young<br></div></div></div></div></div></div></div></div></div></div></div></div>
_______________________________________________<br>
lustre-discuss mailing list<br>
<a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a><br>
<a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" rel="noreferrer" target="_blank">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
</blockquote></div>
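Since `lfs check` reports all targets active but `ls -la` on /lustre still hangs, the kernel log on the hung client and the per-target recovery status on the servers are the usual next checks. A minimal sketch (the `lctl` parameter paths assume the standard 2.12 names, and the grep patterns are only illustrative; adjust for your setup):

```shell
# On a client that hangs during ls -la: scan the kernel ring buffer for
# Lustre/LNet errors, evictions, and hung-task warnings.
dmesg -T | grep -i -E 'lustre|lnet|ptlrpc|evict|hung'

# On the MDS and OSS nodes (run whichever pattern matches the node's role):
# each target should eventually report "status: COMPLETE" once its clients
# have reconnected and replayed.
lctl get_param mdt.*.recovery_status
lctl get_param obdfilter.*.recovery_status
```

If a target's recovery is stuck waiting on clients that will never reconnect, the Lustre Operations Manual describes how to abort recovery on that target rather than remounting everything blind.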