[lustre-discuss] Lustre crash and now lockup on ls -la /lustre
Sid Young
sid.young at gmail.com
Wed Feb 22 21:53:48 PST 2023
Hi all,
I've been running Lustre 2.12.6 (clients are 2.12.7) on HP gear for
nearly 2 years and had an odd crash requiring a reboot of all nodes. I have
Lustre /home and /lustre file systems, and I've been able to remount them on
the clients after restarting the MGS/MDT and OSS nodes, but on any client
an ls -la on the /lustre file system locks up solid. /home appears to be
OK for the directories and sub-directories I tested.
I am very rusty on Lustre now, but I logged into another node and ran the
following:
[root@n04 ~]# lfs check osts
home-OST0000-osc-ffff9f3b26547800 active.
home-OST0001-osc-ffff9f3b26547800 active.
home-OST0002-osc-ffff9f3b26547800 active.
home-OST0003-osc-ffff9f3b26547800 active.
lustre-OST0000-osc-ffff9efd1e392800 active.
lustre-OST0001-osc-ffff9efd1e392800 active.
lustre-OST0002-osc-ffff9efd1e392800 active.
lustre-OST0003-osc-ffff9efd1e392800 active.
lustre-OST0004-osc-ffff9efd1e392800 active.
lustre-OST0005-osc-ffff9efd1e392800 active.
[root@n04 ~]# lfs check mds
home-MDT0000-mdc-ffff9f3b26547800 active.
lustre-MDT0000-mdc-ffff9efd1e392800 active.
[root@n04 ~]# lfs check servers
home-OST0000-osc-ffff9f3b26547800 active.
home-OST0001-osc-ffff9f3b26547800 active.
home-OST0002-osc-ffff9f3b26547800 active.
home-OST0003-osc-ffff9f3b26547800 active.
lustre-OST0000-osc-ffff9efd1e392800 active.
lustre-OST0001-osc-ffff9efd1e392800 active.
lustre-OST0002-osc-ffff9efd1e392800 active.
lustre-OST0003-osc-ffff9efd1e392800 active.
lustre-OST0004-osc-ffff9efd1e392800 active.
lustre-OST0005-osc-ffff9efd1e392800 active.
home-MDT0000-mdc-ffff9f3b26547800 active.
lustre-MDT0000-mdc-ffff9efd1e392800 active.
[root@n04 ~]#
[root@n04 ~]# lfs df -h
UUID                    bytes     Used  Available  Use%  Mounted on
home-MDT0000_UUID        4.2T   217.5G       4.0T    6%  /home[MDT:0]
home-OST0000_UUID       47.6T    42.5T       5.1T   90%  /home[OST:0]
home-OST0001_UUID       47.6T    44.6T       2.9T   94%  /home[OST:1]
home-OST0002_UUID       47.6T    41.9T       5.7T   88%  /home[OST:2]
home-OST0003_UUID       47.6T    42.2T       5.4T   89%  /home[OST:3]
filesystem_summary:    190.4T   171.2T      19.1T   90%  /home

UUID                    bytes     Used  Available  Use%  Mounted on
lustre-MDT0000_UUID      5.0T    53.8G       4.9T    2%  /lustre[MDT:0]
lustre-OST0000_UUID     47.6T    42.3T       5.3T   89%  /lustre[OST:0]
lustre-OST0001_UUID     47.6T    41.8T       5.8T   88%  /lustre[OST:1]
lustre-OST0002_UUID     47.6T    41.3T       6.3T   87%  /lustre[OST:2]
lustre-OST0003_UUID     47.6T    42.3T       5.3T   89%  /lustre[OST:3]
lustre-OST0004_UUID     47.6T    43.7T       3.9T   92%  /lustre[OST:4]
lustre-OST0005_UUID     47.6T    40.1T       7.4T   85%  /lustre[OST:5]
filesystem_summary:    285.5T   251.5T      34.0T   89%  /lustre
[root@n04 ~]#
Is it worth remounting everything and hoping crash recovery works, or are
there some specific checks I can make?
Sid Young