[lustre-discuss] Lustre crash and now lockup on ls -la /lustre
Colin Faber
cfaber at gmail.com
Thu Feb 23 20:43:15 PST 2023
What errors are indicated in the kernel ring buffer on the client (dmesg)?
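Beyond dmesg, it may also be worth checking the client import states and the server-side recovery status before deciding whether to remount everything. A rough sketch of the checks I'd run (parameter names are the usual 2.12-era ones; adjust the filesystem/target names to match your setup):

```shell
# On a client: state of the MDC/OSC imports for the /lustre filesystem
lctl get_param mdc.lustre-*.state osc.lustre-*.state

# On the MDS: did post-crash recovery complete, or is it stuck/aborted?
lctl get_param mdt.lustre-MDT0000.recovery_status

# On each OSS: same question for the OSTs
lctl get_param obdfilter.lustre-OST*.recovery_status
```

If recovery_status shows COMPLETE on all targets but ls -la still hangs, the client-side stack traces in dmesg (and `lctl dk` debug logs) are the next place to look.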
On Wed, Feb 22, 2023 at 10:56 PM Sid Young via lustre-discuss <
lustre-discuss at lists.lustre.org> wrote:
> Hi all,
>
> I've been running Lustre 2.12.6 (clients are 2.12.7) on HP gear for nearly
> 2 years and had an odd crash requiring a reboot of all nodes. I have
> Lustre /home and /lustre file systems, and I've been able to remount them
> on the clients after restarting the MGS/MDT and OSS nodes, but on any
> client, when I do an ls -la on the /lustre file system, it locks solid.
> /home appears to be OK for the directories and sub-directories I tested.
>
> I am very rusty on Lustre now, but I logged into another node and ran the
> following:
>
> [root@n04 ~]# lfs check osts
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> [root@n04 ~]# lfs check mds
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root@n04 ~]# lfs check servers
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root@n04 ~]#
>
> [root@n04 ~]# lfs df -h
> UUID                       bytes        Used   Available Use% Mounted on
> home-MDT0000_UUID           4.2T      217.5G        4.0T   6% /home[MDT:0]
> home-OST0000_UUID          47.6T       42.5T        5.1T  90% /home[OST:0]
> home-OST0001_UUID          47.6T       44.6T        2.9T  94% /home[OST:1]
> home-OST0002_UUID          47.6T       41.9T        5.7T  88% /home[OST:2]
> home-OST0003_UUID          47.6T       42.2T        5.4T  89% /home[OST:3]
>
> filesystem_summary:       190.4T      171.2T       19.1T  90% /home
>
> UUID                       bytes        Used   Available Use% Mounted on
> lustre-MDT0000_UUID         5.0T       53.8G        4.9T   2% /lustre[MDT:0]
> lustre-OST0000_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:0]
> lustre-OST0001_UUID        47.6T       41.8T        5.8T  88% /lustre[OST:1]
> lustre-OST0002_UUID        47.6T       41.3T        6.3T  87% /lustre[OST:2]
> lustre-OST0003_UUID        47.6T       42.3T        5.3T  89% /lustre[OST:3]
> lustre-OST0004_UUID        47.6T       43.7T        3.9T  92% /lustre[OST:4]
> lustre-OST0005_UUID        47.6T       40.1T        7.4T  85% /lustre[OST:5]
>
> filesystem_summary:       285.5T      251.5T       34.0T  89% /lustre
>
> [root@n04 ~]#
>
> Is it worth remounting everything and hoping crash recovery works, or are
> there some specific checks I can make?
>
>
>
> Sid Young
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>