[lustre-discuss] Wrong --index set for OST

Dilger, Andreas andreas.dilger at intel.com
Tue Sep 26 10:32:08 PDT 2017


On Sep 25, 2017, at 03:21, rodger <rodger at csag.uct.ac.za> wrote:
> 
> Dear All,
> 
> I'm still struggling with this. I am running an lfsck -A at present.

I think running lfsck is the wrong thing to do in this case.  It tries to "repair" the filesystem, but because the OST indices are mixed up it will just make the problem worse.

One thing you can look at is to dump the "CONFIGS/mountdata" file to see if the correct OST
index is still stored in that file.  Something like:

# debugfs -c -R "dump CONFIGS/mountdata /tmp/mountdata" /dev/<ostdev>
# strings /tmp/mountdata
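
If you need to repeat this for many OSTs, the name in the "strings" output can be turned into a decimal index with something like the sketch below.  The function name is made up for illustration, and it assumes the target name appears as <fsname>-OSTxxxx with a four-digit hex index; check it by eye against a device whose index you are sure of before trusting it.

```shell
# Hypothetical helper: read "strings /tmp/mountdata" output on stdin and
# print the first <fsname>-OSTxxxx name found plus its index in decimal.
# Assumes the index in the target name is four hex digits.
ost_index_from_strings() {
    name=$(grep -o -E '[A-Za-z0-9_]+-OST[0-9a-fA-F]{4}' | head -n 1)
    # Nothing that looks like an OST target name: report failure.
    [ -n "$name" ] || return 1
    # Strip everything up to and including "-OST", leaving the hex digits.
    hex=${name##*-OST}
    printf '%s %d\n' "$name" "0x$hex"
}
```

It would be used as:

# strings /tmp/mountdata | ost_index_from_strings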

I don't think the "CONFIGS/<fsname>-OSTnnnn" files will contain the correct OST index anymore
after the tunefs.lustre was run.

> The status update is reporting:
> 
> layout_mdts_init: 0
> layout_mdts_scanning-phase1: 1
> layout_mdts_scanning-phase2: 0
> layout_mdts_completed: 0
> layout_mdts_failed: 0
> layout_mdts_stopped: 0
> layout_mdts_paused: 0
> layout_mdts_crashed: 0
> layout_mdts_partial: 0
> layout_mdts_co-failed: 0
> layout_mdts_co-stopped: 0
> layout_mdts_co-paused: 0
> layout_mdts_unknown: 0
> layout_osts_init: 0
> layout_osts_scanning-phase1: 0
> layout_osts_scanning-phase2: 12
> layout_osts_completed: 0
> layout_osts_failed: 30
> layout_osts_stopped: 0
> layout_osts_paused: 0
> layout_osts_crashed: 0
> layout_osts_partial: 0
> layout_osts_co-failed: 0
> layout_osts_co-stopped: 0
> layout_osts_co-paused: 0
> layout_osts_unknown: 0
> layout_repaired: 82358851
> namespace_mdts_init: 0
> namespace_mdts_scanning-phase1: 1
> namespace_mdts_scanning-phase2: 0
> namespace_mdts_completed: 0
> namespace_mdts_failed: 0
> namespace_mdts_stopped: 0
> namespace_mdts_paused: 0
> namespace_mdts_crashed: 0
> namespace_mdts_partial: 0
> namespace_mdts_co-failed: 0
> namespace_mdts_co-stopped: 0
> namespace_mdts_co-paused: 0
> namespace_mdts_unknown: 0
> namespace_osts_init: 0
> namespace_osts_scanning-phase1: 0
> namespace_osts_scanning-phase2: 0
> namespace_osts_completed: 0
> namespace_osts_failed: 0
> namespace_osts_stopped: 0
> namespace_osts_paused: 0
> namespace_osts_crashed: 0
> namespace_osts_partial: 0
> namespace_osts_co-failed: 0
> namespace_osts_co-stopped: 0
> namespace_osts_co-paused: 0
> namespace_osts_unknown: 0
> namespace_repaired: 68265278
> 
> with the layout_repaired and namespace_repaired values ticking up at about 10000 per second.
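
A status dump like this is easier to read with the zero counters filtered out.  A small sketch, assuming the status text is piped in as plain "name: value" lines (without the "> " quoting); the function name is hypothetical:

```shell
# Hypothetical helper: keep only the "name: value" lines whose value is
# non-zero, so that failed/repaired counters stand out.
nonzero_counters() {
    awk -F': *' 'NF == 2 && $2 + 0 != 0 { print $0 }'
}
```

Fed the status above, this would leave only the scanning-phase, failed and repaired lines.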
> 
> Is the layout_osts_failed value of 30 a concern?
> 
> Is there any way to know how far along it is?
> 
> I am also seeing many messages similar to the following in /var/log/messages on the mdt and oss with OST0000:
> 
> Sep 25 10:48:00 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) terra-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
> Sep 25 10:48:00 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 599 previous similar messages
> Sep 25 10:48:30 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) srv-terra-MDT0000: Cannot find sequence 0x8: rc = -2
> Sep 25 10:48:30 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) Skipped 16593 previous similar messages
> Sep 25 10:58:01 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) terra-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
> Sep 25 10:58:01 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 599 previous similar messages
> Sep 25 10:58:57 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) srv-terra-MDT0000: Cannot find sequence 0x8: rc = -2
> Sep 25 10:58:57 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) Skipped 40309 previous similar messages
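
The rc values in these messages are negative errno codes: -22 is EINVAL and -2 is ENOENT.  To see which distinct errors are being logged, ignoring the "Skipped N previous similar messages" lines, something like this sketch can help (the function name is hypothetical and the pattern assumes the message format quoted above):

```shell
# Hypothetical helper: read syslog lines on stdin and count LustreError
# messages per (function, rc) pair.  Lines without "rc = N" are ignored.
summarize_lustre_errors() {
    sed -n 's/.*:\([A-Za-z0-9_]*\)()).*rc = \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p' |
        awk '{ count[$0]++ } END { for (k in count) print count[k], k }' |
        sort
}
```

It would be used as:

# grep LustreError /var/log/messages | summarize_lustre_errors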
> 
> Do these indicate that the process is not working?
> 
> Regards,
> Rodger
> 
> 
> 
> On 23/09/2017 15:07, rodger wrote:
>> Dear All,
>> In the process of upgrading 1.8.x to 2.x I've messed up a number of the index values for OSTs by running tune2fs with the --index value set. To compound matters while trying to get the OSTs to mount I erased the last_rcvd files on the OSTs. I'm looking for a way to confirm what the index should be for each device. Part of the reason for my difficulty is that in the evolution of the filesystem some OSTs were decommissioned and so the full set no longer has a sequential set of index values. In practicing for the upgrade the trial sets that I created did have nice neat sequential indexes and the process I developed broke when I used the real data. :-(
>> The result is that although the lustre filesystem mounts and all directories appear to be listed, files in directories mostly have question marks for attributes and are not available for access. I'm assuming this is because the index for the OST holding the file is wrong.
>> Any pointers to recovery would be much appreciated!
>> Regards,
>> Rodger
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
