[lustre-discuss] "ls" hangs for certain files AND lfsck_namespace gets stuck in scanning-phase1, same position
Sternberg, Michael G.
sternberg at anl.gov
Sat Jun 15 23:18:39 PDT 2019
Fan,
Thank you for the pointers. I attempted to cat(1) an affected file with debugging enabled and inspected the log with "lctl debug_file …". Alas, I didn't have enough experience to know what to look for.
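For the record, the capture sequence was roughly the following (a sketch; the file path is illustrative, and "lctl dk" is shown as the simple dump-and-decode path):

```shell
# Rough sketch of the debug capture (the path is illustrative):
lctl set_param debug=-1            # enable full debugging on the client
lctl clear                         # empty the kernel debug buffer
cat /home/userfoo/.bashrc          # reproduce the hang on an affected file
lctl dk /tmp/lustre-debug.txt      # dump and decode the kernel debug log
```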
But upon mounting both MDTs as ldiskfs, I noticed a bunch of files with suspicious *all-zero permission bits and root:root owner:group*, e.g. in a user's home dir (the one where, as I mentioned, "ls -a" triggered the hang but "ls" alone did not; this had hinted at dot-files being affected).
Solution: I removed those zero-permission files. This allowed LFSCK to complete upon a regular mount, and the file system has been fully operational since.
For illustration, the affected home directory looked like this one:
/mnt/cmdt1/REMOTE_PARENT_DIR/0x240000418:0x7c2:0x0/[0x240000400:0x8:0x0]:0:
total 20
drwx------ 5 userfoo userfoo 4096 Apr 24 14:58 .
drwx------ 4 userfoo userfoo 4096 Dec 19 16:08 ..
-rw------- 1 userfoo userfoo 0 Sep 4 2018 .Xauthority
---------- 1 root root 0 Jan 13 20:03 .bash_history
-rw-r--r-- 1 userfoo userfoo 0 May 15 2018 .bash_logout
---------- 1 root root 0 Jan 13 20:03 .bashrc
---------- 1 root root 0 Jan 13 20:03 .bashrc-anaconda2.bak
drwxrwxr-x 3 userfoo userfoo 4096 May 16 2018 .conda
-rw------- 1 userfoo userfoo 0 May 17 2018 .python_history
I suspected at first that only DNE dirs would be affected, but a wider search turned up matches under ROOT/ as well: fewer than 200 scattered files, plus some 15,000 in a single machine-created (and re-creatable) dir.
find /mnt/*mdt*/{ROOT,REMOTE_PARENT_DIR} -type f -perm 0 > ~/lustre-perm-0-files-only.lst 2>&1 &
I figured I could pretend I had restored the MDS from a backing-fs file-level backup, which allowed me to follow the Manual, sec. 18.4 "Restoring a File-Level Backup". I isolated and removed all affected files and reset LFSCK, like so:
gtar --files-from=lustre-perm-0-files-only.lst --xattrs -zcvf ~/lustre-perm-0-files-only.tar.gz
xargs rm -v < lustre-perm-0-files-only.lst 2>&1 | tee lustre-perm-0-files-only-rm.log
rm -rv /mnt/cmdt?/{lfsck_*,LFSCK}
(I verified the file list contains no spaces or shell metacharacters, so it was safe to feed to xargs.)
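Had the list contained awkward names, a null-delimited find/xargs pipeline would have been safer. A minimal sketch against a scratch directory (the file names are illustrative):

```shell
# Sketch: delete zero-permission regular files even when names contain
# spaces or shell metacharacters, using null-delimited find/xargs.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space"
chmod 0 "$tmp/plain" "$tmp/with space"

# -print0/-0 pass names NUL-terminated, so no word-splitting can occur:
find "$tmp" -type f -perm 0 -print0 | xargs -0 rm -fv

ls -A "$tmp"    # prints nothing: both files were removed
rmdir "$tmp"
```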
With best regards,
Michael
> On Jun 15, 2019, at 0:42 , Yong, Fan <fan.yong at intel.com> wrote:
>
> Hi Michael,
>
> How do you trigger the namespace LFSCK: "-t namespace" or "-t all" (the default)? The latter starts both namespace LFSCK and layout LFSCK. If layout LFSCK is also triggered, then a hung layout LFSCK will also block the namespace LFSCK engine.
>
> Generally, LFSCK depends on backend consistency. Namespace LFSCK verifies the consistency of Lustre's own namespace logic. If there is backend on-disk data corruption, then namespace LFSCK's behavior may be undefined.
>
> On the other hand, a hung "ls" does not always mean MDT trouble; it may also be related to some OST-side trouble, since the Lustre client will glimpse file sizes from the OST(s) when SoM (Size on MDT) is not available. If the hang always happens when accessing regular files but sub-directory objects work, then it is quite possible that some OST (objects) have trouble. If both regular files and sub-directory objects hang similarly, then something is wrong on the MDT side.
>
> To make things clear, you can temporarily enable the full Lustre debug log on some relatively idle client, then "stat" one of the troubled files from that client, and collect and analyze the Lustre kernel debug log there. That will tell you which server has trouble, so you can debug further on that server.
>
>
> --
> Cheers,
> Nasf
>
> -----Original Message-----
> From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Sternberg, Michael G.
> Sent: Saturday, June 15, 2019 8:45 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] "ls" hangs for certain files AND lfsck_namespace gets stuck in scanning-phase1, same position
>
> Hello,
>
> On a Lustre-2.12.2 system, running "ls" on a client hangs for certain files.
>
> This has the increasingly troublesome global ramification that backups cannot complete (unless I managed to build an exclusion list by trial-and-error increments). The "ls" process cannot be killed; there are no log entries on the client or the servers; neither the clients as a whole nor the servers hang.
>
> - More precisely, "ls -f" does *not* hang; "ls -l" and other options calling for metadata do trigger the hang.
>
> - I can do "ls -l", "cat" and "lfs getstripe" on one file from the problematic dir, but not the single other file in that dir.
>
> - In another dir, "ls" works but "ls -a" doesn't, hinting that the issue is related to specific *files*, not so much the directories themselves.
>
>
> I'll describe my diagnostic steps so far in the following. -- What could I do next?
>
>
> * This looked at first similar to LU-8696 {https://jira.whamcloud.com/browse/LU-8696 "ls" hangs on a particular directory on production system}, and I tried the approach suggested there,
>
> mds# lctl set_param fail_loc=0x1505
>
> but that did not change the "ls" behavior, even with the client rebooted.
>
>
> * My file system has 2 MDS (forming 1 HA pair) and 6 OSS (forming 3 HA pairs).
>
> - For diagnostics, I run with HA inactive, with all OSDs instead mounted manually on one node per HA pair.
>
> - "lfs df" outputs are below, for both blocks and inodes.
>
> - All OSDs are connected over multipath. (*)
>
> - 2 OSTs are "large", at 31 TB (28.9 TiB). (**)
>
> (*) and (**): It appears I hit "LU-10510 blk_cloned_rq_check_limits: over max size limit.", which went away after I issued the following for each "large" OSD (> 16 TB):
>
> echo 16384 > /sys/block/$i/queue/max_sectors_kb
>
> I did that manually because "l_tunedisk /dev/foo" from /etc/udev/rules.d/99-lustre-server.rules did not appear to have an effect.
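The manual workaround can be applied per device in a loop; a minimal sketch (the device names are illustrative and must be replaced with the actual block devices of the >16 TB OSDs):

```shell
# Sketch: raise max_sectors_kb on each large OSD block device.
# dm-2 and dm-3 are illustrative; adjust to the real multipath devices.
for dev in dm-2 dm-3; do
    echo 16384 > /sys/block/$dev/queue/max_sectors_kb
done
```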
>
>
> * The FS was formatted under 2.10.7. I hoped from LU-8696/LU-10237 that upgrading the servers, a test client, and thus "lfsck" to 2.12 would help, but it evidently didn't.
>
> * I ran the following diagnostics so far:
>
> (1) e2fsck (from e2fsprogs-1.44.5.wc1-0.el7.x86_64) on all OSDs, which found and fixed some errors of the following (trivial?) kind, about a dozen per OSD:
>
> [QUOTA WARNING] Usage inconsistent for ID 22xxxx:actual (68884811776, 21451) != expected (68884819968, 21451)
> Inode 10119785 extent tree (at level 1) could be shorter. Optimize? yes
> Inode 10121783 extent tree (at level 1) could be narrower. Optimize? yes
>
> (2) lfsck. This, to confound matters, gets stuck as well:
>
> While in scanning-phase1, after 9 and 4 minutes respectively, "last_checkpoint_time" and "current_position" no longer advance for either MDT of the file system. They get stuck:
>
> for MDT0000 at:
> current_position: 27846673, [0x200000419:0x157df:0x0], 0x7e6aec6f046e890d
>
> for MDT0001 at:
> current_position: 20457226, [0x240000418:0xcf75:0x0], 0x9dc22912f913dbf
>
> I append the full "lctl get_param" outputs.
>
>
> With best regards,
> --
> Michael Sternberg, Ph.D.
> Principal Scientific Computing Administrator
> Center for Nanoscale Materials
> Argonne National Laboratory
>
>
> Details:
>
> client# lfs df /home
> UUID 1K-blocks Used Available Use% Mounted on
> carbonfs-MDT0000_UUID 28426640 4888312 20423852 20% /home[MDT:0]
> carbonfs-MDT0001_UUID 28426640 4476696 20835468 18% /home[MDT:1]
> carbonfs-OST0000_UUID 15498310640 5746106888 8970807784 40% /home[OST:0]
> carbonfs-OST0001_UUID 15498310640 6418661584 8298253088 44% /home[OST:1]
> carbonfs-OST0002_UUID 15498310640 6094074632 8622840040 42% /home[OST:2]
> carbonfs-OST0003_UUID 15498310640 6561987336 8154927336 45% /home[OST:3]
> carbonfs-OST0004_UUID 15498310640 6708444352 8008470320 46% /home[OST:4]
> carbonfs-OST0005_UUID 15498310640 6346698352 8370216320 44% /home[OST:5]
> carbonfs-OST0006_UUID 15498310640 6407076140 8309838532 44% /home[OST:6]
> carbonfs-OST0007_UUID 15498310640 6085008472 8631906200 42% /home[OST:7]
> carbonfs-OST0008_UUID 30996858600 11440083316 17993977716 39% /home[OST:8]
> carbonfs-OST0009_UUID 30996858600 12218084364 17215976668 42% /home[OST:9]
>
> filesystem_summary: 185980202320 74026225436 102577214004 42% /home
>
> client# lfs df -i /home
> UUID Inodes IUsed IFree IUse% Mounted on
> carbonfs-MDT0000_UUID 31154760 12356024 18798736 40% /home[MDT:0]
> carbonfs-MDT0001_UUID 31154760 10709228 20445532 35% /home[MDT:1]
> carbonfs-OST0000_UUID 12399920 2502317 9897603 21% /home[OST:0]
> carbonfs-OST0001_UUID 12399920 2437151 9962769 20% /home[OST:1]
> carbonfs-OST0002_UUID 12399920 2479428 9920492 20% /home[OST:2]
> carbonfs-OST0003_UUID 12399920 2424332 9975588 20% /home[OST:3]
> carbonfs-OST0004_UUID 12399920 2405313 9994607 20% /home[OST:4]
> carbonfs-OST0005_UUID 12399920 2461958 9937962 20% /home[OST:5]
> carbonfs-OST0006_UUID 12399920 2422940 9976980 20% /home[OST:6]
> carbonfs-OST0007_UUID 12399920 2471340 9928580 20% /home[OST:7]
> carbonfs-OST0008_UUID 24800048 4637459 20162589 19% /home[OST:8]
> carbonfs-OST0009_UUID 24800048 4574730 20225318 19% /home[OST:9]
>
> filesystem_summary: 62309520 23065252 39244268 38% /home
>
>
>
> mds# lctl get_param -n mdd.carbonfs-MDT0000.lfsck_namespace
> name: lfsck_namespace
> magic: 0xa06249ff
> version: 2
> status: scanning-phase1
> flags:
> param: all_targets,create_ostobj,create_mdtobj
> last_completed_time: N/A
> time_since_last_completed: N/A
> latest_start_time: 1560551417
> time_since_latest_start: 2926 seconds
> last_checkpoint_time: 1560551957
> time_since_last_checkpoint: 2386 seconds
> latest_start_position: 77, N/A, N/A
> last_checkpoint_position: 27483017, N/A, N/A
> first_failure_position: N/A, N/A, N/A
> checked_phase1: 12194065
> checked_phase2: 0
> updated_phase1: 0
> updated_phase2: 0
> failed_phase1: 0
> failed_phase2: 0
> directories: 782291
> dirent_repaired: 0
> linkea_repaired: 0
> nlinks_repaired: 0
> multiple_linked_checked: 1486515
> multiple_linked_repaired: 0
> unknown_inconsistency: 0
> unmatched_pairs_repaired: 0
> dangling_repaired: 0
> multiple_referenced_repaired: 0
> bad_file_type_repaired: 0
> lost_dirent_repaired: 0
> local_lost_found_scanned: 0
> local_lost_found_moved: 0
> local_lost_found_skipped: 0
> local_lost_found_failed: 0
> striped_dirs_scanned: 0
> striped_dirs_repaired: 0
> striped_dirs_failed: 0
> striped_dirs_disabled: 0
> striped_dirs_skipped: 0
> striped_shards_scanned: 0
> striped_shards_repaired: 0
> striped_shards_failed: 0
> striped_shards_skipped: 0
> name_hash_repaired: 0
> linkea_overflow_cleared: 0
> agent_entries_repaired: 0
> success_count: 0
> run_time_phase1: 2925 seconds
> run_time_phase2: 0 seconds
> average_speed_phase1: 4168 items/sec
> average_speed_phase2: N/A
> average_speed_total: 4168 items/sec
> real_time_speed_phase1: 46 items/sec
> real_time_speed_phase2: N/A
> current_position: 27846673, [0x200000419:0x157df:0x0], 0x7e6aec6f046e890d
>
> # lctl get_param -n mdd.carbonfs-MDT0001.lfsck_namespace
> name: lfsck_namespace
> magic: 0xa06249ff
> version: 2
> status: scanning-phase1
> flags:
> param: all_targets,create_ostobj,create_mdtobj
> last_completed_time: N/A
> time_since_last_completed: N/A
> latest_start_time: 1560551417
> time_since_latest_start: 3947 seconds
> last_checkpoint_time: 1560551661
> time_since_last_checkpoint: 3703 seconds
> latest_start_position: 77, N/A, N/A
> last_checkpoint_position: 19924068, [0x240000418:0xbff:0x0], 0xac6ef906919f08
> first_failure_position: N/A, N/A, N/A
> checked_phase1: 7205570
> checked_phase2: 0
> updated_phase1: 0
> updated_phase2: 0
> failed_phase1: 0
> failed_phase2: 0
> directories: 508466
> dirent_repaired: 0
> linkea_repaired: 0
> nlinks_repaired: 0
> multiple_linked_checked: 302727
> multiple_linked_repaired: 0
> unknown_inconsistency: 0
> unmatched_pairs_repaired: 0
> dangling_repaired: 0
> multiple_referenced_repaired: 0
> bad_file_type_repaired: 0
> lost_dirent_repaired: 0
> local_lost_found_scanned: 0
> local_lost_found_moved: 0
> local_lost_found_skipped: 0
> local_lost_found_failed: 0
> striped_dirs_scanned: 0
> striped_dirs_repaired: 0
> striped_dirs_failed: 0
> striped_dirs_disabled: 0
> striped_dirs_skipped: 0
> striped_shards_scanned: 0
> striped_shards_repaired: 0
> striped_shards_failed: 0
> striped_shards_skipped: 0
> name_hash_repaired: 0
> linkea_overflow_cleared: 0
> agent_entries_repaired: 0
> success_count: 0
> run_time_phase1: 3946 seconds
> run_time_phase2: 0 seconds
> average_speed_phase1: 1826 items/sec
> average_speed_phase2: N/A
> average_speed_total: 1826 items/sec
> real_time_speed_phase1: 45 items/sec
> real_time_speed_phase2: N/A
> current_position: 20457226, [0x240000418:0xcf75:0x0], 0x9dc22912f913dbf
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org