[lustre-discuss] Lustre crash and now lockup on ls -la /lustre

Colin Faber cfaber at gmail.com
Thu Feb 23 20:43:15 PST 2023


What errors are indicated in the kernel ring buffer on the client (dmesg)?
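
For example, something along these lines should surface any Lustre or LNet
errors (a rough sketch -- the hostname is a placeholder and the exact messages
will vary):

[root at client ~]# dmesg -T | egrep 'LustreError|LNetError|blocked for more' | tail -n 50
[root at client ~]# lctl dk /tmp/lustre-debug.log    # dump the Lustre debug log for more detail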

On Wed, Feb 22, 2023 at 10:56 PM Sid Young via lustre-discuss <
lustre-discuss at lists.lustre.org> wrote:

> Hi all,
>
> I've been running Lustre 2.12.6 on the servers (clients are 2.12.7) on HP gear
> for nearly 2 years and had an odd crash requiring a reboot of all nodes. I have
> two Lustre file systems, /home and /lustre, and I've been able to remount them
> on the clients after restarting the MGS/MDT and OSS nodes, but on any client an
> ls -la on the /lustre file system locks up solid. /home appears to be OK for
> the directories and sub-directories I tested.
>
> I am very rusty on Lustre now, but I logged into another node and ran the
> following:
>
> [root at n04 ~]# lfs check osts
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> [root at n04 ~]# lfs check mds
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root at n04 ~]# lfs check servers
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root at n04 ~]#
>
> [root at n04 ~]# lfs df -h
> UUID                       bytes        Used   Available Use% Mounted on
> home-MDT0000_UUID           4.2T      217.5G        4.0T   6% /home[MDT:0]
> home-OST0000_UUID          47.6T       42.5T        5.1T  90% /home[OST:0]
> home-OST0001_UUID          47.6T       44.6T        2.9T  94% /home[OST:1]
> home-OST0002_UUID          47.6T       41.9T        5.7T  88% /home[OST:2]
> home-OST0003_UUID          47.6T       42.2T        5.4T  89% /home[OST:3]
>
> filesystem_summary:       190.4T      171.2T       19.1T  90% /home
>
> UUID                       bytes        Used   Available Use% Mounted on
> lustre-MDT0000_UUID         5.0T       53.8G        4.9T   2% /lustre[MDT:0]
> lustre-OST0000_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:0]
> lustre-OST0001_UUID        47.6T       41.8T        5.8T  88% /lustre[OST:1]
> lustre-OST0002_UUID        47.6T       41.3T        6.3T  87% /lustre[OST:2]
> lustre-OST0003_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:3]
> lustre-OST0004_UUID        47.6T       43.7T        3.9T  92% /lustre[OST:4]
> lustre-OST0005_UUID        47.6T       40.1T        7.4T  85% /lustre[OST:5]
>
> filesystem_summary:       285.5T      251.5T       34.0T  89% /lustre
>
> [root at n04 ~]#
>
> Is it worth remounting everything and hoping crash recovery works, or are
> there specific checks I can make?
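>
> For example, I assume something like the following would show whether recovery
> has completed (parameter names from memory, so they may be slightly off; run
> the first two on the MDS and OSS nodes, the last on a client):
>
> [root at mds ~]# lctl get_param mdt.*.recovery_status
> [root at oss ~]# lctl get_param obdfilter.*.recovery_status
> [root at n04 ~]# lctl get_param mdc.*.state osc.*.state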
>
>
>
> Sid Young
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

