[lustre-discuss] MDT runs orph_cleanup_ch forever

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Sun Aug 9 02:59:48 PDT 2020


Dear All,

We tried further debugging of the problem of the "orph_cleanup_ch"
process running forever on the MDT server. We found that even when we
mount only the MGS+MDT partition (MGS and MDT are in the same
partition), without mounting any OST or client at all,
"orph_cleanup_ch" starts immediately and never stops.
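
(For reference, reproducing this needs nothing more than a plain mount
of the combined target; the device path below is only illustrative for
our setup:

	mount -t lustre /dev/mapper/mdt /mnt/mgs_mdt

and "orph_cleanup_ch" shows up immediately after the mount.)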

We also tried to follow the instructions in the Lustre Operations
Manual to back up the MDT (at the ldiskfs level) and restore it to
another newly created MGS+MDT partition. But it did not help at all.
As soon as the restored new MGS+MDT partition was mounted (without
mounting any OST or client), "orph_cleanup_ch" started and never
stopped.
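
(What we did follows the file-level backup procedure in the manual;
roughly, with illustrative device names and paths:

	# backup, with the old MDT mounted as ldiskfs
	mount -t ldiskfs /dev/old_mdt /mnt/mdt
	cd /mnt/mdt
	getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
	tar czf /backup/mdt.tgz --sparse .

	# restore, onto the newly formatted MGS+MDT
	mount -t ldiskfs /dev/new_mdt /mnt/mdt
	cd /mnt/mdt
	tar xzpf /backup/mdt.tgz
	setfattr --restore=/backup/ea.bak

so the extended attributes should have been carried over as well.)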

We even tried to restore all files of the MGS+MDT except the /ROOT
directory (which contains the file-system tree of the user data).
Then, after mounting the MGS+MDT, "orph_cleanup_ch" again started and
never stopped.

We suspect that some data inside the MGS+MDT must have gone wrong,
probably related to the list of OSTs. In the past, this file system
contained:

chome-OST0000   (ldiskfs backend)
chome-OST0001   (ldiskfs backend)
chome-OST0002   (ldiskfs backend)
chome-OST0003   (ldiskfs backend)
chome-OST0004   (zfs backend)

During the past months we migrated the data files on all the
ldiskfs-backed OSTs to the last one, and verified that the migration
was successful.
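
(The migration was done roughly as follows; the exact commands are
reconstructed from memory, and the client mount point /mnt/chome is
illustrative:

	# stop new object creation on the old OST (run on the MDS)
	lctl set_param osp.chome-OST0000*.max_create_count=0
	# migrate all files that still have objects on it
	lfs find --ost chome-OST0000_UUID /mnt/chome | lfs_migrate -y

and similarly for chome-OST0001 .. chome-OST0003.)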
But at the last step of permanently removing an OST:

	lctl conf_param chome-OST0001.osc.active=0

the MGS+MDT server hung and crashed, and we had no choice but to
reboot it. The problem seems to have appeared from that event onward:
"orph_cleanup_ch" gets stuck in a CPU stall and causes serious system
problems, together with the following error message, whose "bad idx: 2"
apparently refers to OST index 2, i.e. one of the removed ldiskfs OSTs
(chome-OST0002):

> LustreError: 3949:0:(lod_lov.c:1066:validate_lod_and_idx()) chome-MDT0000-mdtlov: bad idx: 2 of 32
> [20940.114011] LustreError: 3949:0:(lod_lov.c:1066:validate_lod_and_idx()) Skipped 71754441 previous similar messages

and so far we have no way to fix it.

Since the Lustre file system now has only one OST, chome-OST0004, is
it possible to recreate the correct OST records manually? We would
like to try whether this problem can be fixed in this way.
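
(For example, if we read the manual correctly, the configuration logs
could be regenerated with a writeconf; the device paths below are only
illustrative:

	# with all clients, OSTs and the MDT unmounted
	tunefs.lustre --writeconf /dev/mdt_device    # on the MDS
	tunefs.lustre --writeconf /dev/ost4_device   # on the OSS of chome-OST0004
	# then mount the MGS+MDT first, and the OST afterwards

Would this drop the stale records of the removed OSTs, or is something
more needed?)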

Thanks very much.

T.H.Hsieh


On Sat, Aug 08, 2020 at 05:36:39PM +0800, Tung-Han Hsieh wrote:
> Dear All,
> 
> We found an additional error message in the dmesg of the MDT server:
> 
> LustreError: 3949:0:(lod_lov.c:1066:validate_lod_and_idx()) chome-MDT0000-mdtlov: bad idx: 2 of 32
> [20940.114011] LustreError: 3949:0:(lod_lov.c:1066:validate_lod_and_idx()) Skipped 71754441 previous similar messages
> 
> I am not sure whether it causes the indefinite running of the
> "orph_cleanup_ch" process on the MDT. Is there any way to fix it?
> 
> (By now it has run for more than 6.5 hours, and is still running.)
> 
> Thanks very much.
> 
> T.H.Hsieh
> 
> On Sat, Aug 08, 2020 at 03:44:18PM +0800, Tung-Han Hsieh wrote:
> > Dear All,
> > 
> > We have a running Lustre file system, version 2.10.7. The MDT
> > server runs Linux kernel 3.0.101, and the MDT uses the ldiskfs
> > backend on a patched Linux kernel.
> > 
> > Today our MDT server crashed and needed a cold reboot. In other
> > words, the Lustre MDT was not cleanly unmounted before the reboot.
> > After rebooting and mounting the MDT partition, we found the
> > "orph_cleanup_ch" process running indefinitely. Up to now it has
> > already run for more than 4 hours. It takes 100% of one CPU core
> > and leads to a system lockup, with a lot of the following dmesg
> > messages:
> > 
> > [16240.692491] INFO: rcu_sched_state detected stall on CPU 2 (t=3167100 jiffies)
> > [16240.692524] Pid: 3949, comm: orph_cleanup_ch Not tainted 3.0.101 #1
> > [16240.692551] Call Trace:
> > [16240.692572]  <IRQ>  [<ffffffff8109aad8>] ? __rcu_pending+0x258/0x460
> > [16240.692608]  [<ffffffff8109b769>] ? rcu_check_callbacks+0x69/0x130
> > [16240.692637]  [<ffffffff8104f046>] ? update_process_times+0x46/0x80
> > [16240.692668]  [<ffffffff8106ddc8>] ? tick_sched_timer+0x58/0xa0
> > [16240.692697]  [<ffffffff81061f6c>] ? __run_hrtimer.isra.34+0x3c/0xd0
> > [16240.692726]  [<ffffffff810625af>] ? hrtimer_interrupt+0xdf/0x230
> > [16240.692756]  [<ffffffff81020ff7>] ? smp_apic_timer_interrupt+0x67/0xa0
> > [16240.692791]  [<ffffffff81443f13>] ? apic_timer_interrupt+0x13/0x20
> > [16240.692818]  <EOI>  [<ffffffffa0286801>] ? __ldiskfs_check_dir_entry+0xb1/0x1d0 [ldiskfs]
> > [16240.692873]  [<ffffffffa02871d3>] ? ldiskfs_htree_store_dirent+0x133/0x190 [ldiskfs]
> > [16240.692920]  [<ffffffffa02692a5>] ? htree_dirblock_to_tree+0xc5/0x170 [ldiskfs]
> > [16240.692966]  [<ffffffffa026dd41>] ? ldiskfs_htree_fill_tree+0x171/0x220 [ldiskfs]
> > [16240.693012]  [<ffffffffa0286a77>] ? ldiskfs_readdir+0x157/0x760 [ldiskfs]
> > [16240.693054]  [<ffffffffa0569b3c>] ? top_trans_stop+0x13c/0xaa0 [ptlrpc]
> > [16240.693084]  [<ffffffffa0942c40>] ? osd_it_ea_next+0x190/0x190 [osd_ldiskfs]
> > [16240.693116]  [<ffffffffa02963fe>] ? htree_lock_try+0x3e/0x80 [ldiskfs]
> > [16240.693146]  [<ffffffffa0942842>] ? osd_ldiskfs_it_fill+0xa2/0x220 [osd_ldiskfs]
> > [16240.693191]  [<ffffffffa0942b66>] ? osd_it_ea_next+0xb6/0x190 [osd_ldiskfs]
> > [16240.693222]  [<ffffffffa0b188ac>] ? lod_it_next+0x1c/0x90 [lod]
> > [16240.693251]  [<ffffffffa0b871fa>] ? __mdd_orphan_cleanup+0x33a/0x1770 [mdd]
> > [16240.693281]  [<ffffffff81039b1d>] ? default_wake_function+0xd/0x10
> > [16240.693310]  [<ffffffffa0b86ec0>] ? orph_declare_index_delete+0x6b0/0x6b0 [mdd]
> > [16240.693354]  [<ffffffffa0b86ec0>] ? orph_declare_index_delete+0x6b0/0x6b0 [mdd]
> > [16240.693398]  [<ffffffff8105e039>] ? kthread+0x99/0xa0
> > [16240.693425]  [<ffffffff81444674>] ? kernel_thread_helper+0x4/0x10
> > [16240.693453]  [<ffffffff8105dfa0>] ? kthread_flush_work_fn+0x10/0x10
> > [16240.693480]  [<ffffffff81444670>] ? gs_change+0xb/0xb
> > 
> > 
> > We guess that this process is trying to do a consistency check of
> > the MDT partition, since it was not cleanly unmounted when the
> > system was cold rebooted. Although the whole file system looks
> > normal, i.e. we can mount the clients, we are wondering whether the
> > process will eventually complete its work or not. Otherwise the
> > operating system of the MDT stays locked by this process, which
> > makes everything abnormal (e.g., the "df" command hangs forever,
> > the systemd process is also locked in "D" state, and ssh login and
> > NIS seem abnormal ...).
> > 
> > Any suggestions to fix this problem are very much appreciated.
> > 
> > Thank you very much.
> > 
> > Best Regards,
> > 
> > T.H.Hsieh
> > 

