[lustre-discuss] "ls" hangs for certain files AND lfsck_namespace gets stuck in scanning-phase1, same position
Sternberg, Michael G.
sternberg at anl.gov
Sat Jun 15 23:18:39 PDT 2019
Fan,
Thank you for the pointers. I attempted to cat(1) an affected file with debugging enabled and inspected the log with "lctl debug_file …". Alas, I didn't have enough experience to know what to look for.
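For the record, the capture sequence was roughly the following (a sketch; the file path is illustrative, and "lctl dk" is shown as the simple dump-and-decode path):

```shell
# Rough sketch of the debug capture (the path is illustrative):
lctl set_param debug=-1            # enable full debugging on the client
lctl clear                         # empty the kernel debug buffer
cat /home/userfoo/.bashrc          # reproduce the hang on an affected file
lctl dk /tmp/lustre-debug.txt      # dump and decode the kernel debug log
```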
But upon mounting both MDTs as ldiskfs, I noticed a bunch of files with suspicious *all-zero permission bits and root:root owner:group*, e.g. in a user's home dir (the one where, as I mentioned, "ls -a" triggered the hang but "ls" alone did not; this had hinted at dot-files being affected).
Solution: I removed those zero-permission files. This allowed LFSCK to complete upon a regular mount, and the file system has been fully operational since.
For illustration, the affected home directory looked like this one:
/mnt/cmdt1/REMOTE_PARENT_DIR/0x240000418:0x7c2:0x0/[0x240000400:0x8:0x0]:0:
total 20
drwx------ 5 userfoo userfoo 4096 Apr 24 14:58 .
drwx------ 4 userfoo userfoo 4096 Dec 19 16:08 ..
-rw------- 1 userfoo userfoo 0 Sep 4 2018 .Xauthority
---------- 1 root root 0 Jan 13 20:03 .bash_history
-rw-r--r-- 1 userfoo userfoo 0 May 15 2018 .bash_logout
---------- 1 root root 0 Jan 13 20:03 .bashrc
---------- 1 root root 0 Jan 13 20:03 .bashrc-anaconda2.bak
drwxrwxr-x 3 userfoo userfoo 4096 May 16 2018 .conda
-rw------- 1 userfoo userfoo 0 May 17 2018 .python_history
I suspected at first that only DNE dirs would be affected, but a wider search turned up matches under ROOT/ as well: fewer than 200 scattered files, plus some 15,000 in a single machine-created (and re-creatable) dir.
find /mnt/*mdt*/{ROOT,REMOTE_PARENT_DIR} -type f -perm 0 > ~/lustre-perm-0-files-only.lst 2>&1 &
I figured I could pretend I had restored the MDS from a backing-fs file-level backup, which allowed me to follow the Manual, sec. 18.4 "Restoring a File-Level Backup". I isolated and removed all affected files and reset LFSCK, like so:
gtar --files-from=lustre-perm-0-files-only.lst --xattrs -zcvf ~/lustre-perm-0-files-only.tar.gz
xargs rm -v < lustre-perm-0-files-only.lst 2>&1 | tee lustre-perm-0-files-only-rm.log
rm -rv /mnt/cmdt?/{lfsck_*,LFSCK}
(I verified the file list contains no spaces or shell metacharacters, so it was safe to feed to xargs.)
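Had the list contained awkward names, a null-delimited find/xargs pipeline would have been safer. A minimal sketch against a scratch directory (the file names are illustrative):

```shell
# Sketch: delete zero-permission regular files even when names contain
# spaces or shell metacharacters, using null-delimited find/xargs.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space"
chmod 0 "$tmp/plain" "$tmp/with space"

# -print0/-0 pass names NUL-terminated, so no word-splitting can occur:
find "$tmp" -type f -perm 0 -print0 | xargs -0 rm -fv

ls -A "$tmp"    # prints nothing: both files were removed
rmdir "$tmp"
```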
With best regards,
Michael
> On Jun 15, 2019, at 0:42 , Yong, Fan <fan.yong at intel.com> wrote:
>
> Hi Michael,
>
> How do you trigger the namespace LFSCK: "-t namespace" or "-t all" (the default)? The latter starts both namespace LFSCK and layout LFSCK. If layout LFSCK is also triggered, then a hung layout LFSCK will also block the namespace LFSCK engine.
>
> Generally, LFSCK depends on backend consistency. Namespace LFSCK verifies the consistency of Lustre's own namespace logic. If there is backend on-disk data corruption, then namespace LFSCK's behavior may be undefined.
>
> On the other hand, a hung "ls" does not always mean MDT trouble; it may also be related to some OST-side trouble, since the Lustre client will glimpse file sizes from the OST(s) when SoM (Size on MDT) is not available. If the hang always happens when accessing regular files but sub-directory objects work, then it is quite possible that some OST (objects) have trouble. If both regular files and sub-directory objects hang similarly, then something is wrong on the MDT side.
>
> To make things clear, you can temporarily enable the full Lustre debug log on some relatively idle client, then "stat" one of the troubled files from that client, and collect and analyze the Lustre kernel debug log there. That will tell you which server has trouble, so you can debug further on that server.
>
>
> --
> Cheers,
> Nasf
>
> -----Original Message-----
> From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Sternberg, Michael G.
> Sent: Saturday, June 15, 2019 8:45 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] "ls" hangs for certain files AND lfsck_namespace gets stuck in scanning-phase1, same position
>
> Hello,
>
> On a Lustre-2.12.2 system, running "ls" on a client hangs for certain files.
>
> This has the increasingly troublesome global ramification that backups cannot complete (unless I managed to build an exclusion list by trial-and-error increments). The "ls" process cannot be killed; there are no log entries on the client or the servers; neither the clients as a whole nor the servers hang.
>
> - More precisely, "ls -f" does *not* hang; "ls -l" and other options calling for metadata do trigger the hang.
>
> - I can do "ls -l", "cat" and "lfs getstripe" on one file from the problematic dir, but not the single other file in that dir.
>
> - In another dir, "ls" works but "ls -a" doesn't, hinting that the issue is related to specific *files*, not so much the directories themselves.
>
>
> I'll describe my diagnostic steps so far in the following. -- What could I do next?
>
>
> * This looked at first similar to LU-8696 {https://jira.whamcloud.com/browse/LU-8696 "ls" hangs on a particular directory on production system}, and I tried the approach suggested there,
>
> mds# lctl set_param fail_loc=0x1505
>
> but that did not change the "ls" behavior, even with the client rebooted.
>
>
> * My file system has 2 MDS (forming 1 HA pair) and 6 OSS (forming 3 HA pairs).
>
> - For diagnostics, I run with HA inactive, with all OSDs instead mounted manually on one node per HA pair.
>
> - "lfs df" outputs are below, for both blocks and inodes.
>
> - All OSDs are connected over multipath. (*)
>
> - 2 OSTs are "large", at 31 TB (28.9 TiB). (**)
>
> (*) and (**): It appears I hit "LU-10510 blk_cloned_rq_check_limits: over max size limit.", which went away after I issued the following for each "large" OSD (> 16 TB):
>
> echo 16384 > /sys/block/$i/queue/max_sectors_kb
>
> I did that manually because "l_tunedisk /dev/foo" from /etc/udev/rules.d/99-lustre-server.rules did not appear to have an effect.
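The manual workaround can be applied per device in a loop; a minimal sketch (the device names are illustrative and must be replaced with the actual block devices of the >16 TB OSDs):

```shell
# Sketch: raise max_sectors_kb on each large OSD block device.
# dm-2 and dm-3 are illustrative; adjust to the real multipath devices.
for dev in dm-2 dm-3; do
    echo 16384 > /sys/block/$dev/queue/max_sectors_kb
done
```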
>
>
> * The FS was formatted under 2.10.7. I hoped from LU-8696/LU-10237 that upgrading the servers, a test client, and thus "lfsck" to 2.12 would help, but it evidently didn't.
>
> * I ran the following diagnostics so far:
>
> (1) e2fsck (from e2fsprogs-1.44.5.wc1-0.el7.x86_64) on all OSDs, which found and fixed some errors of the following (trivial?) kind, about a dozen per OSD:
>
> [QUOTA WARNING] Usage inconsistent for ID 22xxxx:actual (68884811776, 21451) != expected (68884819968, 21451)
> Inode 10119785 extent tree (at level 1) could be shorter. Optimize? yes
> Inode 10121783 extent tree (at level 1) could be narrower. Optimize? yes
>
> (2) lfsck. This, to confound matters, gets stuck as well:
>
> While in scanning-phase1, after 9 and 4 minutes respectively, "last_checkpoint_time" and "current_position" no longer advance for either MDT of the file system. They get stuck:
>
> for MDT0000 at:
> current_position: 27846673, [0x200000419:0x157df:0x0], 0x7e6aec6f046e890d
>
> for MDT0001 at:
> current_position: 20457226, [0x240000418:0xcf75:0x0], 0x9dc22912f913dbf
>
> I append the full "lctl get_param" outputs.
>
>
> With best regards,
> --
> Michael Sternberg, Ph.D.
> Principal Scientific Computing Administrator
> Center for Nanoscale Materials
> Argonne National Laboratory
>
>
> Details:
>
> client# lfs df /home
> UUID 1K-blocks Used Available Use% Mounted on
> carbonfs-MDT0000_UUID 28426640 4888312 20423852 20% /home[MDT:0]
> carbonfs-MDT0001_UUID 28426640 4476696 20835468 18% /home[MDT:1]
> carbonfs-OST0000_UUID 15498310640 5746106888 8970807784 40% /home[OST:0]
> carbonfs-OST0001_UUID 15498310640 6418661584 8298253088 44% /home[OST:1]
> carbonfs-OST0002_UUID 15498310640 6094074632 8622840040 42% /home[OST:2]
> carbonfs-OST0003_UUID 15498310640 6561987336 8154927336 45% /home[OST:3]
> carbonfs-OST0004_UUID 15498310640 6708444352 8008470320 46% /home[OST:4]
> carbonfs-OST0005_UUID 15498310640 6346698352 8370216320 44% /home[OST:5]
> carbonfs-OST0006_UUID 15498310640 6407076140 8309838532 44% /home[OST:6]
> carbonfs-OST0007_UUID 15498310640 6085008472 8631906200 42% /home[OST:7]
> carbonfs-OST0008_UUID 30996858600 11440083316 17993977716 39% /home[OST:8]
> carbonfs-OST0009_UUID 30996858600 12218084364 17215976668 42% /home[OST:9]
>
> filesystem_summary: 185980202320 74026225436 102577214004 42% /home
>
> client# lfs df -i /home
> UUID Inodes IUsed IFree IUse% Mounted on
> carbonfs-MDT0000_UUID 31154760 12356024 18798736 40% /home[MDT:0]
> carbonfs-MDT0001_UUID 31154760 10709228 20445532 35% /home[MDT:1]
> carbonfs-OST0000_UUID 12399920 2502317 9897603 21% /home[OST:0]
> carbonfs-OST0001_UUID 12399920 2437151 9962769 20% /home[OST:1]
> carbonfs-OST0002_UUID 12399920 2479428 9920492 20% /home[OST:2]
> carbonfs-OST0003_UUID 12399920 2424332 9975588 20% /home[OST:3]
> carbonfs-OST0004_UUID 12399920 2405313 9994607 20% /home[OST:4]
> carbonfs-OST0005_UUID 12399920 2461958 9937962 20% /home[OST:5]
> carbonfs-OST0006_UUID 12399920 2422940 9976980 20% /home[OST:6]
> carbonfs-OST0007_UUID 12399920 2471340 9928580 20% /home[OST:7]
> carbonfs-OST0008_UUID 24800048 4637459 20162589 19% /home[OST:8]
> carbonfs-OST0009_UUID 24800048 4574730 20225318 19% /home[OST:9]
>
> filesystem_summary: 62309520 23065252 39244268 38% /home
>
>
>
> mds# lctl get_param -n mdd.carbonfs-MDT0000.lfsck_namespace
> name: lfsck_namespace
> magic: 0xa06249ff
> version: 2
> status: scanning-phase1
> flags:
> param: all_targets,create_ostobj,create_mdtobj
> last_completed_time: N/A
> time_since_last_completed: N/A
> latest_start_time: 1560551417
> time_since_latest_start: 2926 seconds
> last_checkpoint_time: 1560551957
> time_since_last_checkpoint: 2386 seconds
> latest_start_position: 77, N/A, N/A
> last_checkpoint_position: 27483017, N/A, N/A
> first_failure_position: N/A, N/A, N/A
> checked_phase1: 12194065
> checked_phase2: 0
> updated_phase1: 0
> updated_phase2: 0
> failed_phase1: 0
> failed_phase2: 0
> directories: 782291
> dirent_repaired: 0
> linkea_repaired: 0
> nlinks_repaired: 0
> multiple_linked_checked: 1486515
> multiple_linked_repaired: 0
> unknown_inconsistency: 0
> unmatched_pairs_repaired: 0
> dangling_repaired: 0
> multiple_referenced_repaired: 0
> bad_file_type_repaired: 0
> lost_dirent_repaired: 0
> local_lost_found_scanned: 0
> local_lost_found_moved: 0
> local_lost_found_skipped: 0
> local_lost_found_failed: 0
> striped_dirs_scanned: 0
> striped_dirs_repaired: 0
> striped_dirs_failed: 0
> striped_dirs_disabled: 0
> striped_dirs_skipped: 0
> striped_shards_scanned: 0
> striped_shards_repaired: 0
> striped_shards_failed: 0
> striped_shards_skipped: 0
> name_hash_repaired: 0
> linkea_overflow_cleared: 0
> agent_entries_repaired: 0
> success_count: 0
> run_time_phase1: 2925 seconds
> run_time_phase2: 0 seconds
> average_speed_phase1: 4168 items/sec
> average_speed_phase2: N/A
> average_speed_total: 4168 items/sec
> real_time_speed_phase1: 46 items/sec
> real_time_speed_phase2: N/A
> current_position: 27846673, [0x200000419:0x157df:0x0], 0x7e6aec6f046e890d
>
> # lctl get_param -n mdd.carbonfs-MDT0001.lfsck_namespace
> name: lfsck_namespace
> magic: 0xa06249ff
> version: 2
> status: scanning-phase1
> flags:
> param: all_targets,create_ostobj,create_mdtobj
> last_completed_time: N/A
> time_since_last_completed: N/A
> latest_start_time: 1560551417
> time_since_latest_start: 3947 seconds
> last_checkpoint_time: 1560551661
> time_since_last_checkpoint: 3703 seconds
> latest_start_position: 77, N/A, N/A
> last_checkpoint_position: 19924068, [0x240000418:0xbff:0x0], 0xac6ef906919f08
> first_failure_position: N/A, N/A, N/A
> checked_phase1: 7205570
> checked_phase2: 0
> updated_phase1: 0
> updated_phase2: 0
> failed_phase1: 0
> failed_phase2: 0
> directories: 508466
> dirent_repaired: 0
> linkea_repaired: 0
> nlinks_repaired: 0
> multiple_linked_checked: 302727
> multiple_linked_repaired: 0
> unknown_inconsistency: 0
> unmatched_pairs_repaired: 0
> dangling_repaired: 0
> multiple_referenced_repaired: 0
> bad_file_type_repaired: 0
> lost_dirent_repaired: 0
> local_lost_found_scanned: 0
> local_lost_found_moved: 0
> local_lost_found_skipped: 0
> local_lost_found_failed: 0
> striped_dirs_scanned: 0
> striped_dirs_repaired: 0
> striped_dirs_failed: 0
> striped_dirs_disabled: 0
> striped_dirs_skipped: 0
> striped_shards_scanned: 0
> striped_shards_repaired: 0
> striped_shards_failed: 0
> striped_shards_skipped: 0
> name_hash_repaired: 0
> linkea_overflow_cleared: 0
> agent_entries_repaired: 0
> success_count: 0
> run_time_phase1: 3946 seconds
> run_time_phase2: 0 seconds
> average_speed_phase1: 1826 items/sec
> average_speed_phase2: N/A
> average_speed_total: 1826 items/sec
> real_time_speed_phase1: 45 items/sec
> real_time_speed_phase2: N/A
> current_position: 20457226, [0x240000418:0xcf75:0x0], 0x9dc22912f913dbf
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org