[lustre-discuss] Lustre-2.10.6 frequently hangs during OST data migration

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Fri Mar 29 20:47:35 PDT 2019


Dear All,

Our system was recently upgraded to lustre-2.10.6. We are migrating data
from some almost-full OSTs to a newly installed file server, but the file
system often freezes for about 30 seconds and then returns to normal
(this can happen several times within 5 minutes).

Our procedure is as follows.

1. On the MDS, we prevented new objects from being created on the OSTs that
   are almost full (a loop sketch of this step is given after the list):
   echo 0 > /proc/fs/lustre/osc/chome-OST0000-osc-MDT0000/max_create_count
   echo 0 > /proc/fs/lustre/osc/chome-OST0001-osc-MDT0000/max_create_count
   echo 0 > /proc/fs/lustre/osc/chome-OST0002-osc-MDT0000/max_create_count
   ....

2. Our system has 40 OSTs, 36 of which are almost full, so they are all
   marked by the above command. Our total OST capacity is 286 TB. We are
   moving part of their data to the remaining 4 new OSTs in the following
   standard way (an lfs_migrate sketch is also given after the list):

   cp -a /path/to/data /path/to/data.tmp
   mv /path/to/data.tmp /path/to/data
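
For reference, step 1 can be scripted as a small loop. This is only a
sketch: it assumes the 36 full OSTs are chome-OST0000 through chome-OST0023
(OST indices are hexadecimal), so the range must be adjusted to the actual
list of full OSTs:

   # Disable new object creation on the almost-full OSTs (run on the MDS).
   for i in $(seq 0 35); do
       printf -v ost 'chome-OST%04x' "$i"   # OST indices are hex
       echo 0 > /proc/fs/lustre/osc/${ost}-osc-MDT0000/max_create_count
   done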
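
As an alternative to the cp/mv in step 2, the lfs_migrate helper script
(which automates the same kind of per-file migration) could drain the full
OSTs one at a time. This is only a sketch and assumes lfs_migrate ships with
our 2.10.6 client tools; /path/to/lustre is a placeholder for the mount
point, and the OST UUIDs can be listed with 'lfs osts':

   # Move every regular file that has objects on chome-OST0000 off that OST.
   lfs find /path/to/lustre --obd chome-OST0000_UUID -type f | lfs_migrate -y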

In the beginning everything looked smooth, but after one week of running
the progress became slower and slower. We then found that the file system
often freezes for a while when the data migration is running, even though
there is almost no load on the whole system.

It is also strange that during the past week we did not see any messages
in 'dmesg' on the MDT, the OSTs, or the client. Only last night did the MDT
print these 'dmesg' messages:

==============================================================================
[410649.811086] LNet: Service thread pid 3516 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[410649.811175] Pid: 3516, comm: mdt00_003
[410649.811201]
[410649.811202] Call Trace:
[410649.811250]  [<ffffffff81033625>] ? check_preempt_curr+0x75/0xa0
[410649.811278]  [<ffffffff8103366b>] ? ttwu_do_wakeup+0x1b/0xa0
[410649.811306]  [<ffffffff810376c1>] ? ttwu_do_activate.constprop.160+0x61/0x70
[410649.811336]  [<ffffffff8103c40a>] ? try_to_wake_up+0x1da/0x280
[410649.811367]  [<ffffffff814106ba>] schedule+0x3a/0x50
[410649.811393]  [<ffffffff81410a95>] schedule_timeout+0x145/0x210
[410649.811421]  [<ffffffff8104d090>] ? process_timeout+0x0/0x10
[410649.811451]  [<ffffffffa0b6dc88>] osp_precreate_reserve+0x328/0x8b0 [osp]
[410649.811484]  [<ffffffffa014f026>] ? do_get_write_access+0x396/0x4d0 [jbd2]
[410649.811515]  [<ffffffff81112a10>] ? __getblk+0x20/0x2e0
[410649.811542]  [<ffffffff8103c4b0>] ? default_wake_function+0x0/0x10
[410649.811571]  [<ffffffffa0b64759>] osp_declare_create+0x1a9/0x680 [osp]
[410649.811603]  [<ffffffffa0ab2a10>] lod_sub_declare_create+0xe0/0x270 [lod]
[410649.811633]  [<ffffffffa0aabdc7>] lod_qos_declare_object_on+0xc7/0x3d0 [lod]
[410649.811664]  [<ffffffffa0aab7fe>] ? lod_statfs_and_check+0xae/0x5b0 [lod]
[410649.811694]  [<ffffffffa0aacfd4>] lod_alloc_qos.constprop.10+0xe64/0x17b0 [lod]
[410649.811741]  [<ffffffffa01741b0>] ? ldiskfs_map_blocks+0x180/0x1e0 [ldiskfs]
[410649.811772]  [<ffffffffa0ab07ea>] lod_qos_prep_create+0x12ea/0x2910 [lod]
[410649.811803]  [<ffffffffa07c8c94>] ? qsd_op_begin+0x114/0x4d0 [lquota]
[410649.811833]  [<ffffffffa0ab23f0>] lod_prepare_create+0x2c0/0x410 [lod]
[410649.811863]  [<ffffffffa0aa7ccd>] lod_declare_striped_create+0x10d/0xa50 [lod]
[410649.811908]  [<ffffffffa0aaa9b9>] lod_declare_create+0x1e9/0x5a0 [lod]
[410649.811938]  [<ffffffffa0b1af26>] mdd_declare_create_object_internal+0x116/0x320 [mdd]
[410649.811983]  [<ffffffffa0b00c9c>] mdd_declare_create_object.isra.19+0x3c/0xbb0 [mdd]
[410649.812028]  [<ffffffffa0b00044>] ? mdd_linkea_prepare+0x294/0x590 [mdd]
[410649.812058]  [<ffffffffa0b0f90e>] mdd_create+0x88e/0x27d0 [mdd]
[410649.812088]  [<ffffffffa08173e0>] ? osd_xattr_get+0x80/0x890 [osd_ldiskfs]
[410649.812120]  [<ffffffffa09fd3ff>] mdt_reint_open+0x225f/0x3890 [mdt]
[410649.812158]  [<ffffffffa0431276>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
[410649.812191]  [<ffffffffa02ea3fa>] ? upcall_cache_get_entry+0x29a/0x890 [obdclass]
[410649.812237]  [<ffffffffa02ef409>] ? lu_ucred+0x19/0x30 [obdclass]
[410649.812267]  [<ffffffffa09df7db>] ? ucred_set_jobid+0x5b/0x70 [mdt]
[410649.812297]  [<ffffffffa09f1810>] mdt_reint_rec+0xa0/0x210 [mdt]
[410649.812326]  [<ffffffffa09de91d>] mdt_reint_internal+0x63d/0xa50 [mdt]
[410649.812356]  [<ffffffffa09df07a>] mdt_intent_reint+0x21a/0x430 [mdt]
[410649.812385]  [<ffffffffa09da7ed>] mdt_intent_policy+0x5bd/0xde0 [mdt]
[410649.812418]  [<ffffffffa03aa257>] ldlm_lock_enqueue+0x3a7/0x9c0 [ptlrpc]
[410649.812453]  [<ffffffffa03d28e3>] ldlm_handle_enqueue0+0x9c3/0x1790 [ptlrpc]
[410649.812490]  [<ffffffffa04211c0>] ? req_capsule_client_get+0x10/0x20 [ptlrpc]
[410649.812541]  [<ffffffffa045d16c>] ? tgt_request_preprocess.isra.17+0x25c/0x1250 [ptlrpc]
[410649.812593]  [<ffffffffa044ee95>] ? tgt_lookup_reply+0x35/0x1c0 [ptlrpc]
[410649.812629]  [<ffffffffa045b74d>] tgt_enqueue+0x5d/0x250 [ptlrpc]
[410649.812664]  [<ffffffffa045ed1d>] tgt_request_handle+0x8ad/0x15a0 [ptlrpc]
[410649.812701]  [<ffffffffa03f84a4>] ? lustre_msg_get_transno+0x84/0x100 [ptlrpc]
[410649.812752]  [<ffffffffa04084e1>] ptlrpc_main+0x1051/0x2a40 [ptlrpc]
[410649.812780]  [<ffffffff8140ff44>] ? __schedule+0x294/0x940
[410649.812815]  [<ffffffffa0407490>] ? ptlrpc_main+0x0/0x2a40 [ptlrpc]
[410649.812844]  [<ffffffff8105c817>] kthread+0x87/0x90
[410649.812870]  [<ffffffff81413c34>] kernel_thread_helper+0x4/0x10
[410649.812898]  [<ffffffff8105c790>] ? kthread+0x0/0x90
[410649.812923]  [<ffffffff81413c30>] ? kernel_thread_helper+0x0/0x10
[410649.812950]
[410649.812970] LustreError: dumping log to /tmp/lustre-log.1553888141.3516
[410649.818730] wanted to write 3985 but wrote 3518
[410749.275270] LNet: Service thread pid 3516 completed after 299.99s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
==============================================================================
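
For what it is worth, the trace shows the mdt00_003 service thread blocked
in osp_precreate_reserve(), i.e. waiting for object precreation on an OST.
These are the checks we can run while a freeze is happening (only a sketch;
the proc paths follow step 1 above and /path/to/lustre is a placeholder for
the client mount point):

   # On the MDS: which OSTs are still allowed to create new objects?
   grep . /proc/fs/lustre/osc/chome-OST*-osc-MDT0000/max_create_count
   # On a client: how full and how balanced are the OSTs now?
   lfs df -h /path/to/lustre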

Our lustre-2.10.6 was compiled against the SLES11 SP3 kernel

	linux-3.0.101-138.gcdbe806

using the ldiskfs backend. The hardware spec of our MDS is:

CPU: Intel Xeon E5640 @ 2.67GHz (single CPU)
RAM: 8 GB
MGS: 1GB (under RAID 1)
MDT: 230GB (under RAID 1)
RAID controller: LSI ServeRAID M1015 SAS/SATA Controller


Are there any suggestions to fix this problem?

Thank you very much.


T.H.Hsieh

