[lustre-discuss] lustre-discuss Digest, Vol 230, Question about PFL

bauerj at iodoctors.com
Mon May 12 04:56:05 PDT 2025


Sergey, 
I would suggest that a file in a directory whose default striping requests pool1, but whose objects sit on OSTs in pool2, is the result of a user explicitly overriding the directory default striping and specifically requesting OSTs in pool2 with lfs setstripe. I do it all the time.
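For example, something along these lines (paths and pool names are made up here, just to illustrate the mechanism):

# the directory default layout requests pool1
lfs setstripe -p pool1 /mnt/lustre/projectA

# a user pre-creates a file in that directory and explicitly asks for pool2,
# which overrides the directory default
lfs setstripe -p pool2 /mnt/lustre/projectA/bigfile

# the file's layout now shows pool2 even though the directory default says pool1
lfs getstripe /mnt/lustre/projectA/bigfile
lfs getstripe -d /mnt/lustre/projectA
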
John
Sent from my iPhone

> On May 12, 2025, at 5:29 AM, lustre-discuss-request at lists.lustre.org wrote:
> 
> 
> Today's Topics:
> 
>   1. an OST is dead (Noskov, Dr. Sergey)
>   2. Question about PFL (Noskov, Dr. Sergey)
>   3. Re: Kernel panic when reading Lustre osc stats on GPU nodes
>      (Anna Fuchs)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 12 May 2025 08:37:26 +0000
> From: "Noskov, Dr. Sergey" <noskov at uni-mainz.de>
> To: "lustre-discuss at lists.lustre.org"
>    <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] an OST is dead
> Message-ID: <ACA21C3A-D4F2-44B4-9A56-8A78C0442991 at uni-mainz.de>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi everyone,
> 
> we have a problem and I would be very grateful for help or advice.
> 
> We are using ZFS for the metadata and object storage targets in our Lustre filesystem. One of the pools, a draid3:12d:42c:2s-0, became unimportable after one of the disks failed.
> 
> On startup, zpool import shows:
> 
> pool: l1fs-OST010b
>     id: 6548008278833985886
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
>   ...
> 
> the commands
> 
> zpool import -f l1fs-OST010b
> zpool import -F l1fs-OST010b
> zpool import -F -X l1fs-OST010b
> 
> yield:
> cannot import 'l1fs-OST010b': I/O error
>                Destroy and re-create the pool from
>                a backup source.
> 
> zdb -e also exits very quickly with an I/O error.
> One of the tools we tried reported that metadata is missing from the zpool.
> 
> Does anyone have any productive advice for us beyond what is described at
> https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E ?
> 
> Now the question: how do we proceed if we have to reformat the OST?
> At the moment our plan looks roughly like this (example commands sketched below):
> Disable the broken OST (or is it possible to reuse the same name for the OST?)
> Create a new pool and integrate it into the Lustre filesystem.
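> 
> Roughly, we imagine something like this (only a sketch; the NID, index and disk layout below are placeholders, please correct us if this is the wrong approach):
> 
> # on the MGS: permanently deactivate the dead OST so clients stop trying to use it
> lctl conf_param l1fs-OST010b.osc.active=0
> 
> # re-create the zpool from new/replaced disks and reformat the OST;
> # as far as we understand, --replace lets us reuse the old index
> zpool create l1fs-OST010b draid3:12d:42c:2s <disks ...>
> mkfs.lustre --ost --backfstype=zfs --fsname=l1fs --index=<old OST index> --replace \
>     --mgsnode=<MGS NID> l1fs-OST010b/ost
> mount -t lustre l1fs-OST010b/ost /mnt/ost010b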
> 
> What do we do with the files that had objects on the broken OST; can we just delete them now? Can the resulting inconsistency between the metadata and object targets be repaired with LFSCK or in some other way?
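> 
> For the cleanup, would something along these lines be the right direction (again only a sketch, paths are placeholders)?
> 
> # list the files that had objects on the dead OST, so they can be restored or deleted
> lfs find /lustre -O l1fs-OST010b_UUID > affected_files.txt
> 
> # then run layout LFSCK from the MDS to clean up dangling references
> lctl lfsck_start -M l1fs-MDT0000 -t layout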
> 
> Thanks in advance
> 
> With best regards
> 
> Sergey Noskov
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 12 May 2025 10:17:38 +0000
> From: "Noskov, Dr. Sergey" <noskov at uni-mainz.de>
> To: "lustre-discuss at lists.lustre.org"
>    <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] Question about PFL
> Message-ID: <0392F111-1B96-424A-9932-03B2E9BDFFDD at contoso.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hello everyone,
> 
> We have two pools used by our PFL layouts in the Lustre filesystem.
> 
> We notice that there are files in a directory assigned to pool1 whose objects, for some reason, belong to pool2.
> 
> Could it be that PFL does not always work as expected?
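> 
> Is comparing the directory default layout with the layout of such a file the right way to check this? Something like (paths are placeholders):
> 
> lfs getstripe -d /lustre/pool1_dir           # directory default layout and the pool(s) it requests
> lfs getstripe /lustre/pool1_dir/strange_file # actual components and pool of the unexpected file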
> 
> With best regards
> 
> Sergey Noskov
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 12 May 2025 12:27:43 +0200
> From: Anna Fuchs <anna.fuchs at uni-hamburg.de>
> To: Oleg Drokin <green at whamcloud.com>,
>    "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Kernel panic when reading Lustre osc
>    stats on GPU nodes
> Message-ID: <1c9a20a4-2f07-3be5-2402-c2d57d397eb8 at uni-hamburg.de>
> Content-Type: text/plain; charset="UTF-8"; format=flowed
> 
> Hi,
> thanks for the suggestions. The GPUs are in frequent use and already
> carry heavy, system-call-intensive CPU load. However, since the issue
> occurs across different machines, we don't suspect a structural hardware
> defect. It seems more likely to be a bug in the NVIDIA stack, possibly
> triggered by concurrent sysfs/debugfs usage.
> 
> At times we even reduced the query interval to one second, but the
> issue still doesn't reproduce reliably, not even over several hours or a few days.
> 
> It seems like no one else has encountered this so far.
> We'll keep investigating; thanks again!
> 
> Best
> Anna
> 
> 
>> On 5/7/25 18:15, Oleg Drokin wrote:
>> Hello!
>> 
>> "An uncorrectable ECC error detected" does sound like there's some
>> hardware problem, while it is strange you only get this on GPU nodes
>> (Extra power load leading to higher chances of memory corruption + more
>> frequent kernel memory scannong increasing the chance to hit such
>> curruption?) I'd expect you'd be seeing other crashes on such GPU nodes
>> .
>> 
>> Can you just generate some other cpu load (that involves system calls)
>> on those nodes perhaps and see if suddenly crashes go up as well, just
>> in some other area?
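>> 
>> Even something crude would do, e.g. a few parallel loops hammering procfs,
>> which exercises a similar syscall path without touching the Lustre debugfs
>> files (just an illustration, any syscall-heavy load generator works):
>> 
>> for i in $(seq 8); do
>>     while true; do cat /proc/self/status > /dev/null; done &
>> done
>> # stop it later with: kill $(jobs -p)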
>> 
>>> On Wed, 2025-05-07 at 17:23 +0200, Anna Fuchs via lustre-discuss wrote:
>>> 
>>> Dear all,
>>> 
>>> We're facing an issue that is hopefully not directly related to
>>> Lustre itself (we're not using community Lustre), but maybe someone
>>> here has seen something similar or knows someone who has.
>>> 
>>> On our GPU partition with A100-SXM4-80GB GPUs (VBIOS version:
>>> 92.00.36.00.02), we're trying to read IOPS statistics (osc_stats) via
>>> the files under /sys/kernel/debug/lustre/osc/ (we're running 160
>>> OSTs, Lustre version 2.14.0_ddn184). Our goal is to sample the data
>>> at 5-second intervals, then aggregate and postprocess it into
>>> readable metrics.
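>>> 
>>> What the collector does boils down to roughly this (a simplified sketch;
>>> the real collectd plugin parses and aggregates the counters, and the exact
>>> file name under each osc directory may differ):
>>> 
>>> while true; do
>>>     for f in /sys/kernel/debug/lustre/osc/*/stats; do
>>>         echo "== $f"; cat "$f"
>>>     done > /tmp/osc_sample.$(date +%s)
>>>     sleep 5
>>> done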
>>> 
>>> We have a collectd daemon running, which had been stable for a long
>>> time. After integrating the IOPS metric, however, we occasionally hit
>>> a kernel panic (see crash dump excerpts below). The issue appears to
>>> originate somewhere in the GPU firmware stack, but we're unsure why
>>> this happens and how it's related to reading Lustre metrics.
>>> 
>>> The problem occurs often, but is hard to reproduce and happens at
>>> random. We're hesitant to run the scripts frequently since a crash
>>> could interrupt critical GPU workloads. That said, limited test runs
>>> over several hours often work fine, especially after a fresh reboot.
>>> The CPU-only nodes run the same scripts without issues all the time.
>>> 
>>> 
>>> Could this be a sign that /sys/kernel/debug is being overwhelmed
>>> somehow? Although that shouldn't normally cause a kernel panic.
>>> 
>>> We'd appreciate any insights, experiences, or pointers, even indirect
>>> ones.
>>> 
>>> Thanks in advance!
>>> 
>>> Anna
>>> 
>>> 
>>> 
>>> 
>>> 2024-12-17 17:11:28 [2453606.802826] NVRM: Xid (PCI:0000:03:00): 120,
>>> pid='<unknown>', name=<unknown>, GSP task timeout @ pc:0x4bd36c4, task: 1
>>> 2024-12-17 17:11:28 [2453606.802835] NVRM:     Reported by libos
>>> task:0 v2.0 [0] @ ts:1734451888
>>> 2024-12-17 17:11:28 [2453606.802837] NVRM:     RISC-V CSR State:
>>> 2024-12-17 17:11:28 [2453606.802840] NVRM:
>>> mstatus:0x000000001e000000  mscratch:0x0000000000000000
>>> mie:0x0000000000000880  mip:0x0000000000000000
>>> 2024-12-17 17:11:28 [2453606.802842] NVRM:
>>> mepc:0x0000000004bd36c4  mbadaddr:0x00000100badca700
>>> mcause:0x8000000000000007
>>> 2024-12-17 17:11:28 [2453606.802844] NVRM:     RISC-V GPR State:
>>> [...]
>>> 2024-12-17 17:11:29 [2453606.803121] NVRM: Xid (PCI:0000:03:00): 140,
>>> pid='<unknown>', name=<unknown>, An uncorrectable ECC error detected
>>> (possible firmware handling failure) DRAM:-1840691462, LTC:0, MMU:0, PCIE:0
>>> [...]
>>> 2024-12-17 17:30:03 [2454721.362906] Kernel panic - not syncing:
>>> Fatal exception
>>> 2024-12-17 17:30:03 [2454721.611822] Kernel Offset: 0x5200000 from
>>> 0xffffffff81000000 (relocation range: 0xffffffff80000000-
>>> 0xffffffffbfffffff)
>>> 2024-12-17 17:30:03 [2454721.770927] ---[ end Kernel panic - not
>>> syncing: Fatal exception ]---
>>> 
>>> 
>>> 
> 
> 
> ------------------------------
> 
> End of lustre-discuss Digest, Vol 230, Issue 4
> **********************************************

