[lustre-discuss] Accessing files with bad PFL causing MDS kernel panics

Nathan Crawford nrcrawfo at uci.edu
Tue Oct 25 13:41:08 PDT 2022


Hi All,

  I'm looking for possible workarounds to recover data from some
mis-migrated files (as seen in LU-16152). Basically, there's a bug in "lfs
setstripe --yaml" where extent start/end values >= 2 GiB in the YAML file
overflow to 16 EiB - 2 GiB.
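
For reference, here is a small sketch of the arithmetic behind that number. It assumes the boundary passes through a signed 32-bit integer and is then sign-extended to an unsigned 64-bit offset. That mechanism is my assumption, not something confirmed in this post, and the function name is made up for illustration. Under that assumption, a 2 GiB boundary becomes exactly the value seen in the attached layout:

```python
# Sketch only: the truncate-and-sign-extend path is an assumption about
# how the bad value could arise; it is not taken from the lfs source.
GIB = 1 << 30
EIB = 1 << 60

def sign_extend_32_to_u64(value):
    """Keep the low 32 bits, treat them as signed, widen to unsigned 64-bit."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:            # sign bit set: negative as int32
        low -= 1 << 32
    return low & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit

bad = sign_extend_32_to_u64(2 * GIB)
print(bad)                           # the extent boundary after the overflow
print(bad == 16 * EIB - 2 * GIB)     # compare against 16 EiB - 2 GiB
```

Values below 2 GiB pass through this function unchanged, which would fit the first three components in the attached layout being intact.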

  Using lfs_migrate, I re-striped many files in directories whose default
striping pattern contained these values. I'm pretty sure that the data
still exists (I was trying to purge an older OST, and disk usage on the other
OSTs increased as usage on the purged OST decreased), and an lfsck run happily
completes after a day or so. Unfortunately, attempts to access or re-migrate
the files trigger a kernel panic on the MDS with:

LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned
long)addr & ~(~(((1UL) << 12)-1))) ) failed:
LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) LBUG
Kernel panic - not syncing: LBUG

  The servers run Lustre 2.12.8 on OpenZFS 0.8.5 on CentOS 7.9. The output
from "lfs getstripe -v badfile" is attached.

  I can use lfs find to search for files with these bad extent endpoint
values, then move them to a quarantine area on the same FS. This should allow
the rest of the system to stay up (hopefully), but I still need to recover
the data.
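
As a cross-check on the search, here is a rough sketch that scans saved "lfs getstripe -v" output for the overflowed boundary. It relies only on the field names visible in the attached output. The function name and the sample text are illustrative, and it is an assumption that every affected file shows exactly this value.

```python
import re

# 16 EiB - 2 GiB, the overflowed boundary seen in the attached layout.
BAD_EXTENT = (1 << 64) - (1 << 31)

# Matches lines such as "lcme_extent.e_end:   18446744071562067968".
EXTENT_RE = re.compile(r"lcme_extent\.e_(start|end):\s*(\S+)")

def bad_extents(getstripe_text):
    """Return (field, value) pairs whose value equals BAD_EXTENT."""
    hits = []
    for field, raw in EXTENT_RE.findall(getstripe_text):
        # "EOF" and other non-numeric values are skipped.
        if raw.isdigit() and int(raw) == BAD_EXTENT:
            hits.append((field, int(raw)))
    return hits

sample = """\
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   18446744071562067968
    lcme_extent.e_start: 18446744071562067968
    lcme_extent.e_end:   EOF
"""
print(bad_extents(sample))
```

A file whose layout yields a non-empty list would be a candidate for the quarantine area. I have only confirmed that "lfs getstripe -v" works on one bad file, so running it across all of them is at your own risk.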

Thanks!
Nate

-- 

Dr. Nathan Crawford              nathan.crawford at uci.edu
Director of Scientific Computing
School of Physical Sciences
164 Rowland Hall                 Office: 152 Rowland Hall
University of California, Irvine  Phone: 949-824-1380
Irvine, CA 92697-2025, USA
-------------- next part --------------
composite_header:
  lcm_magic:         0x0BD60BD0
  lcm_size:          552
  lcm_flags:         0
  lcm_layout_gen:    2
  lcm_mirror_count:  1
  lcm_entry_count:   5
components:
  - lcme_id:             131073
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   131072
    lcme_offset:         272
    lcme_size:           32
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x20000b836
      lmm_object_id:     0x19a
      lmm_fid:           [0x20000b836:0x19a:0x0]
      lmm_stripe_count:  0
      lmm_stripe_size:   131072
      lmm_pattern:       mdt
      lmm_layout_gen:    0
      lmm_stripe_offset: 0

  - lcme_id:             131074
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 131072
    lcme_extent.e_end:   16777216
    lcme_offset:         304
    lcme_size:           56
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x20000b836
      lmm_object_id:     0x19a
      lmm_fid:           [0x20000b836:0x19a:0x0]
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 6
      lmm_objects:
      - 0: { l_ost_idx: 6, l_fid: [0x100060000:0x2de85:0x0] }

  - lcme_id:             131075
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 16777216
    lcme_extent.e_end:   1073741824
    lcme_offset:         360
    lcme_size:           80
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x20000b836
      lmm_object_id:     0x19a
      lmm_fid:           [0x20000b836:0x19a:0x0]
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x861394f:0x0] }
      - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x12e4858e:0x0] }

  - lcme_id:             131076
    lcme_mirror_id:      2
    lcme_flags:          0
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   18446744071562067968
    lcme_offset:         440
    lcme_size:           56
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x20000b836
      lmm_object_id:     0x19a
      lmm_fid:           [0x20000b836:0x19a:0x0]
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

  - lcme_id:             131077
    lcme_mirror_id:      2
    lcme_flags:          0
    lcme_extent.e_start: 18446744071562067968
    lcme_extent.e_end:   EOF
    lcme_offset:         496
    lcme_size:           56
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x20000b836
      lmm_object_id:     0x19a
      lmm_fid:           [0x20000b836:0x19a:0x0]
      lmm_stripe_count:  8
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1


