<div dir="ltr">Hi Rick,<div><br></div><div>  I did attempt that, and while subsequent access didn't cause an MDS panic, the client threw errors like "cannot get group lock: Invalid argument (22)".</div><div><br></div><div>  I'm going to attempt the patch and workaround from LU-16194 suggested by Andreas a couple hours ago on the LU-16152 bug report.</div><div><br></div><div>  My guess is that normal people set the PFL components directly as arguments to lfs setstripe, or reference an existing file's PFL with --copy. Both of those methods work fine, but I took the fancy yaml route.</div><div><br></div><div>Thanks,</div><div>Nate</div><div><div><br></div><div>  </div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 25, 2022 at 2:51 PM Mohr, Rick <<a href="mailto:mohrrf@ornl.gov">mohrrf@ornl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nate,<br>

<br>

For the example layout you attached, it looks like the file does not have any data in the components with the corrupted extent_end value.  Have you tried using "lfs setstripe --component-del" to delete just those components, to see whether you can then access the data?<br>

<br>

--Rick<br>

<br>

<br>

On 10/25/22, 4:43 PM, "lustre-discuss on behalf of Nathan Crawford" <<a href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank">lustre-discuss-bounces@lists.lustre.org</a> on behalf of <a href="mailto:nrcrawfo@uci.edu" target="_blank">nrcrawfo@uci.edu</a>> wrote:<br>

<br>

    Hi All,<br>

      I'm looking for possible workarounds to recover data from some mis-migrated files (as seen in LU-16152). Basically, there's a bug in "lfs setstripe --yaml" where extent start/end values of 2 GiB or more in the yaml file overflow to 16 EiB - 2 GiB.<br>
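      As a rough illustration of the arithmetic only: 16 EiB - 2 GiB is exactly what 2 GiB becomes if it is squeezed through a signed 32-bit integer and then widened to an unsigned 64-bit one. That this is the actual mechanism inside lfs is an assumption, and the helper name below is hypothetical.<br>

```python
GIB = 1 << 30
EIB = 1 << 60


def sign_extend_32_to_u64(value):
    """Truncate to 32 bits, treat as signed, then widen to unsigned 64 bits.

    Hypothetical helper: it only mimics the arithmetic, not any lfs code.
    """
    low32 = value & 0xFFFFFFFF
    if low32 & 0x80000000:      # sign bit set: value is negative as int32
        low32 -= 1 << 32
    return low32 & 0xFFFFFFFFFFFFFFFF


# 2 GiB does not fit in a signed 32-bit int and comes back as 2**64 - 2**31.
print(sign_extend_32_to_u64(2 * GIB) == 16 * EIB - 2 * GIB)  # True
# Values below 2 GiB pass through unchanged.
print(sign_extend_32_to_u64(1 * GIB) == 1 * GIB)             # True
```

      Extents below 2 GiB are unaffected under this model, which fits with only the larger components being damaged.<br>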

<br>

      Using lfs_migrate, I re-striped many files in directories with a default striping pattern containing these values.  I'm pretty sure that the data exists (I was trying to purge an older OST, and disk usage on the other OSTs increased as the purged OST decreased), and an lfsck procedure happily returns after a day or so. Unfortunately, attempts to access or re-migrate the files trigger a kernel panic on the MDS with:<br>

<br>

    LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed:<br>

    LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) LBUG<br>

<br>

    Kernel panic - not syncing: LBUG<br>

<br>

<br>

     The servers are Lustre 2.12.8 on OpenZFS 0.8.5 on CentOS 7.9. The output from "lfs getstripe -v badfile" is attached.<br>

<br>

      I can use lfs find to search for files with these bad extent endpoint values, then move them to a quarantine area on the same FS. This will (hopefully) allow the rest of the system to stay up, but the data still needs to be recovered.<br>
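      A minimal sketch of one way to flag affected files from saved "lfs getstripe -v" output. It assumes the extents appear as "lcme_extent.e_start:" and "lcme_extent.e_end:" lines with a decimal value or "EOF"; the function name and the 2^63 threshold are hypothetical choices, not anything taken from Lustre.<br>

```python
import re

U64_MAX = (1 << 64) - 1
# Assumption: no legitimate extent boundary is 8 EiB or larger, so anything
# at or above 2**63 (other than EOF) is treated as an overflowed value.
SUSPECT_THRESHOLD = 1 << 63

_EXTENT_RE = re.compile(r"lcme_extent\.e_(start|end):\s*(\S+)")


def suspicious_extents(getstripe_text):
    """Return (kind, value) pairs for extent boundaries that look overflowed."""
    hits = []
    for kind, raw in _EXTENT_RE.findall(getstripe_text):
        if raw == "EOF":
            continue                      # normal end-of-file marker
        try:
            value = int(raw)
        except ValueError:
            continue                      # unexpected format: skip, do not guess
        if value == U64_MAX:
            continue                      # numeric form of EOF
        if value >= SUSPECT_THRESHOLD:
            hits.append((kind, value))
    return hits


sample = """
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   18446744071562067968
    lcme_extent.e_start: 18446744071562067968
    lcme_extent.e_end:   EOF
"""
for kind, value in suspicious_extents(sample):
    print(kind, value)
```

      The sample text is made up for illustration; real output should be checked against this pattern before relying on the result.<br>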

<br>

    Thanks!<br>

    Nate<br>

<br>

    -- <br>

    Dr. Nathan Crawford              <a href="mailto:nathan.crawford@uci.edu" target="_blank">nathan.crawford@uci.edu</a><br>

    Director of Scientific Computing<br>

    School of Physical Sciences<br>

    164 Rowland Hall                 Office: 152 Rowland Hall<br>

    University of California, Irvine  Phone: 949-824-1380<br>

    Irvine, CA 92697-2025, USA<br>

<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><pre>Dr. Nathan Crawford              <a href="mailto:nathan.crawford@uci.edu" target="_blank">nathan.crawford@uci.edu</a>

Director of Scientific Computing

School of Physical Sciences

164 Rowland Hall                 Office: 152 Rowland Hall

University of California, Irvine  Phone: 949-824-1380

Irvine, CA 92697-2025, USA</pre></div></div></div></div>