[lustre-discuss] Lustre crash and now lockup on ls -la /lustre

Sid Young sid.young at gmail.com
Wed Feb 22 21:53:48 PST 2023


Hi all,

I've been running Lustre 2.12.6 (clients are 2.12.7) on HP gear for nearly
2 years and had an odd crash requiring a reboot of all nodes. I have Lustre
/home and /lustre file systems, and I've been able to remount both on the
clients after restarting the MGS/MDT and OSS nodes, but on any client an
"ls -la" on the /lustre file system locks solid. /home appears to be OK for
the directories and sub-directories I tested.

I am very rusty on Lustre now, but I logged into another node and ran the
following:

[root@n04 ~]# lfs check osts
home-OST0000-osc-ffff9f3b26547800 active.
home-OST0001-osc-ffff9f3b26547800 active.
home-OST0002-osc-ffff9f3b26547800 active.
home-OST0003-osc-ffff9f3b26547800 active.
lustre-OST0000-osc-ffff9efd1e392800 active.
lustre-OST0001-osc-ffff9efd1e392800 active.
lustre-OST0002-osc-ffff9efd1e392800 active.
lustre-OST0003-osc-ffff9efd1e392800 active.
lustre-OST0004-osc-ffff9efd1e392800 active.
lustre-OST0005-osc-ffff9efd1e392800 active.
[root@n04 ~]# lfs check mds
home-MDT0000-mdc-ffff9f3b26547800 active.
lustre-MDT0000-mdc-ffff9efd1e392800 active.
[root@n04 ~]# lfs check servers
home-OST0000-osc-ffff9f3b26547800 active.
home-OST0001-osc-ffff9f3b26547800 active.
home-OST0002-osc-ffff9f3b26547800 active.
home-OST0003-osc-ffff9f3b26547800 active.
lustre-OST0000-osc-ffff9efd1e392800 active.
lustre-OST0001-osc-ffff9efd1e392800 active.
lustre-OST0002-osc-ffff9efd1e392800 active.
lustre-OST0003-osc-ffff9efd1e392800 active.
lustre-OST0004-osc-ffff9efd1e392800 active.
lustre-OST0005-osc-ffff9efd1e392800 active.
home-MDT0000-mdc-ffff9f3b26547800 active.
lustre-MDT0000-mdc-ffff9efd1e392800 active.
[root@n04 ~]#
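As far as I recall, "lfs check" reporting "active" only means the devices are
configured, not that the connections are healthy, so I was also going to look
at the client-side import state along these lines (commands from memory, so
parameter names may differ slightly on 2.12):

```shell
# On a client: dump the import for the MDC backing /lustre.
# "state: FULL" in the connection section is healthy;
# "DISCONN" or "RECOVER" would point at the MDT connection.
lctl get_param mdc.lustre-MDT0000-mdc-*.import

# The kernel log usually shows the hung ls thread and which
# RPC it is blocked on.
dmesg | tail -50
```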

[root@n04 ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
home-MDT0000_UUID           4.2T      217.5G        4.0T   6% /home[MDT:0]
home-OST0000_UUID          47.6T       42.5T        5.1T  90% /home[OST:0]
home-OST0001_UUID          47.6T       44.6T        2.9T  94% /home[OST:1]
home-OST0002_UUID          47.6T       41.9T        5.7T  88% /home[OST:2]
home-OST0003_UUID          47.6T       42.2T        5.4T  89% /home[OST:3]

filesystem_summary:       190.4T      171.2T       19.1T  90% /home

UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         5.0T       53.8G        4.9T   2% /lustre[MDT:0]
lustre-OST0000_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:0]
lustre-OST0001_UUID        47.6T       41.8T        5.8T  88% /lustre[OST:1]
lustre-OST0002_UUID        47.6T       41.3T        6.3T  87% /lustre[OST:2]
lustre-OST0003_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:3]
lustre-OST0004_UUID        47.6T       43.7T        3.9T  92% /lustre[OST:4]
lustre-OST0005_UUID        47.6T       40.1T        7.4T  85% /lustre[OST:5]

filesystem_summary:       285.5T      251.5T       34.0T  89% /lustre

[root@n04 ~]#

Is it worth remounting everything and hoping crash recovery works, or are
there some specific checks I can make first?
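One thing I was thinking of trying before a full remount is checking whether
the servers actually completed recovery, roughly like this (from memory, so
the exact parameter paths may vary by version):

```shell
# On the MDS: check whether the MDT finished recovery
lctl get_param mdt.*.recovery_status

# On each OSS: same check for the OSTs
lctl get_param obdfilter.*.recovery_status

# "status: COMPLETE" means recovery finished; "RECOVERING"
# also reports time remaining and how many clients have
# reconnected so far.
```

If one target is stuck in RECOVERING waiting on a client that never comes
back, I assume that would explain the hang, but I'd appreciate confirmation.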



Sid Young