[lustre-discuss] Unresponsiveness of OSS and Directory Listing Hang-up

Jane Liu zhliu at rcf.rhic.bnl.gov
Thu May 18 13:44:25 PDT 2023


Hi,

We have recently upgraded our Lustre servers to run on RHEL 8.7, along 
with Lustre 2.15.2. Despite running smoothly for several weeks, we have 
encountered an issue that is same as the one reported on this webpage: 
https://jira.whamcloud.com/browse/LU-10697. Although the Lustre version 
described there differs from ours, the symptoms are identical.

“uname -a” on our system returns the following output: 
4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 
x86_64 x86_64 x86_64 GNU/Linux. And the content of /etc/redhat-release 
is: Red Hat Enterprise Linux release 8.7 (Ootpa)

Here are the details about the issue.

On 5/16, around 4:47 am, one OSS named oss29 began experiencing 
problems. There was a rapid increase in the number of active requests, 
from 1 to 123, occurring from roughly 4:47 am to 10:20 am. At around 
5/16 10:20 am, I/O on oss29 stopped entirely, and the number of active 
requests remained at 123. Concurrently, the system load experienced a 
significant increase, shooting up from a very low number to a high 
number as 400, again within the timeframe of 4:47 am to 10:20 am. 
Interestingly, despite the extreme system load, the CPU usage remained 
idle.

Furthermore, when executing the lfs df command on the MDS, no OSTs on 
oss29 were visible.

We noticed a lot of error about “This server is not able to keep up with 
request traffic (cpu-bound)” in syslog on oss29:

May 16 05:44:52 oss29 kernel: Lustre: ost_io: This server is not able to 
keep up with request traffic (cpu-bound).
May 16 06:13:49 oss29 kernel: Lustre: ost_io: This server is not able to 
keep up with request traffic (cpu-bound).
May 16 06:23:39 oss29 kernel: Lustre: ost_io: This server is not able to 
keep up with request traffic (cpu-bound).
May 16 06:32:56 oss29 kernel: Lustre: ost_io: This server is not able to 
keep up with request traffic (cpu-bound).
May 16 06:42:46 oss29 kernel: Lustre: ost_io: This server is not able to 
keep up with request traffic (cpu-bound).
…

At the same time, users also reported an issue with a specific 
directory, which became inaccessible. Running ls -l on this directory 
resulted in a hang, while the ls command worked. Users found they could 
read certain files within the directory, but not all of them.

In an attempt to fix the situation, I tried to unmount the OSTs on 
oss29, but this was unsuccessful. We then made the decision to reboot 
oss29 on 5/17 around 3:30 pm. However, upon the system's return, oss29 
immediately reverted to its previous unresponsive state with high load. 
Listing the directory still hung.

I did lfsck on MDT, but it just hung there.

Here are related MDS syslog during the perios:

May 16 05:09:13 sphnxmds01 kernel: Lustre: sphnx01-OST0192-osc-MDT0000: 
Connection to sphnx01-OST0192 (at 10.42.73.42 at tcp) was lost; in progress 
operations using this service will wait for recovery to complete
May 16 05:09:13 sphnxmds01 kernel: LustreError: 
384795:0:(osp_precreate.c:677:osp_precreate_send()) 
sphnx01-OST0192-osc-MDT0000: can't precreate: rc = -11
May 16 05:09:13 sphnxmds01 kernel: Lustre: sphnx01-OST0192-osc-MDT0000: 
Connection restored to 10.42.73.42 at tcp (at 10.42.73.42 at tcp)
May 16 05:09:13 sphnxmds01 kernel: LustreError: 
384795:0:(osp_precreate.c:1340:osp_precreate_thread()) 
sphnx01-OST0192-osc-MDT0000: cannot precreate objects: rc = -11

…
May 16 05:22:17 sphnxmds01 kernel: Lustre: sphnx01-OST0192-osc-MDT0000: 
Connection to sphnx01-OST0192 (at 10.42.73.42 at tcp) was lost; in progress 
operations using this service will wait for recovery to complete
May 16 05:22:17 sphnxmds01 kernel: LustreError: 
384795:0:(osp_precreate.c:967:osp_precreate_cleanup_orphans()) 
sphnx01-OST0192-osc-MDT0000: cannot cleanup orphans: rc = -11
May 16 05:22:17 sphnxmds01 kernel: Lustre: sphnx01-OST0192-osc-MDT0000: 
Connection restored to 10.42.73.42 at tcp (at 10.42.73.42 at tcp)

And this is OSS syslog:

May 16 04:47:21 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:47:26 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:47:31 oss29 kernel: Lustre: sphnx01-OST0192: Export 
0000000012e8b1d0 already connecting from 130.199.48.37 at tcp
May 16 04:47:36 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:47:36 oss29 kernel: Lustre: Skipped 1 previous similar message
May 16 04:47:41 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:47:51 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:47:51 oss29 kernel: Lustre: Skipped 1 previous similar message
May 16 04:48:11 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000009a00357a already connecting from 130.199.206.80 at tcp
May 16 04:48:11 oss29 kernel: Lustre: Skipped 4 previous similar 
messages
May 16 04:48:43 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000000f4ce150 already connecting from 130.199.206.154 at tcp
May 16 04:48:43 oss29 kernel: Lustre: Skipped 13 previous similar 
messages
May 16 04:49:43 oss29 kernel: Lustre: ll_ost_io01_102: service thread 
pid 822490 was inactive for 200.303 seconds. The thread m
ight be hung, or it might only be slow and will resume later. Dumping 
the stack trace for debugging purposes:
May 16 04:49:43 oss29 kernel: Pid: 604064, comm: ll_ost_io00_045 
4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UT
C 2023
May 16 04:49:43 oss29 kernel: Lustre: Skipped 1 previous similar message
May 16 04:49:43 oss29 kernel: Call Trace TBD:
May 16 04:49:43 oss29 kernel: Lustre: ll_ost_io00_013: service thread 
pid 365744 was inactive for 200.187 seconds. Watchdog sta
ck traces are limited to 3 per 300 seconds, skipping this one.
May 16 04:49:43 oss29 kernel: Lustre: Skipped 1 previous similar message
May 16 04:49:43 oss29 kernel: [<0>] md_bitmap_startwrite+0x173/0x1e0
May 16 04:49:43 oss29 kernel: [<0>] add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:49:43 oss29 kernel: [<0>] raid5_make_request+0x218/0xc50 
[raid456]
May 16 04:49:43 oss29 kernel: [<0>] md_handle_request+0x128/0x1b0
May 16 04:49:43 oss29 kernel: [<0>] md_make_request+0x5b/0xb0
May 16 04:49:43 oss29 kernel: [<0>] 
generic_make_request_no_check+0x202/0x330
May 16 04:49:43 oss29 kernel: [<0>] submit_bio+0x3c/0x160
May 16 04:49:43 oss29 kernel: [<0>] osd_do_bio.constprop.51+0x510/0xc40 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] 
osd_ldiskfs_map_inode_pages+0x873/0x8f0 [osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] osd_write_commit+0x5e2/0x990 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw_write+0x77e/0x1ad0 
[ofd]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:49:43 oss29 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 
[ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] 
ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] kthread+0x10a/0x120
May 16 04:49:43 oss29 kernel: [<0>] ret_from_fork+0x1f/0x40
May 16 04:49:43 oss29 kernel: Pid: 333512, comm: ll_ost_io01_002 
4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UT
C 2023
May 16 04:49:43 oss29 kernel: Call Trace TBD:
May 16 04:49:43 oss29 kernel: [<0>] md_bitmap_startwrite+0x173/0x1e0
May 16 04:49:43 oss29 kernel: [<0>] add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:49:43 oss29 kernel: [<0>] raid5_make_request+0x218/0xc50 
[raid456]
May 16 04:49:43 oss29 kernel: [<0>] md_handle_request+0x128/0x1b0
May 16 04:49:43 oss29 kernel: [<0>] md_make_request+0x5b/0xb0
May 16 04:49:43 oss29 kernel: [<0>] 
generic_make_request_no_check+0x202/0x330
May 16 04:49:43 oss29 kernel: [<0>] submit_bio+0x3c/0x160
May 16 04:49:43 oss29 kernel: [<0>] osd_do_bio.constprop.51+0xb63/0xc40 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] 
osd_ldiskfs_map_inode_pages+0x873/0x8f0 [osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] osd_write_commit+0x5e2/0x990 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw_write+0x77e/0x1ad0 
[ofd]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:49:43 oss29 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 
[ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] 
ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] kthread+0x10a/0x120
May 16 04:49:43 oss29 kernel: [<0>] ret_from_fork+0x1f/0x40
May 16 04:49:43 oss29 kernel: Pid: 822490, comm: ll_ost_io01_102 
4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UT
C 2023
May 16 04:49:43 oss29 kernel: Call Trace TBD:
May 16 04:49:43 oss29 kernel: [<0>] md_bitmap_startwrite+0x173/0x1e0
May 16 04:49:43 oss29 kernel: [<0>] add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:49:43 oss29 kernel: [<0>] raid5_make_request+0x218/0xc50 
[raid456]
May 16 04:49:43 oss29 kernel: [<0>] md_handle_request+0x128/0x1b0
May 16 04:49:43 oss29 kernel: [<0>] md_make_request+0x5b/0xb0
May 16 04:49:43 oss29 kernel: [<0>] 
generic_make_request_no_check+0x202/0x330
May 16 04:49:43 oss29 kernel: [<0>] submit_bio+0x3c/0x160
May 16 04:49:43 oss29 kernel: [<0>] osd_do_bio.constprop.51+0x510/0xc40 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] 
osd_ldiskfs_map_inode_pages+0x873/0x8f0 [osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] osd_write_commit+0x5e2/0x990 
[osd_ldiskfs]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw_write+0x77e/0x1ad0 
[ofd]
May 16 04:49:43 oss29 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:49:43 oss29 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 
[ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] 
ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:49:43 oss29 kernel: [<0>] kthread+0x10a/0x120
May 16 04:49:43 oss29 kernel: [<0>] ret_from_fork+0x1f/0x40
May 16 04:49:47 oss29 kernel: Lustre: ll_ost_io00_118: service thread 
pid 809074 was inactive for 204.003 seconds. Watchdog sta
ck traces are limited to 3 per 300 seconds, skipping this one.
May 16 04:49:47 oss29 kernel: Lustre: Skipped 9 previous similar 
messages
May 16 04:49:48 oss29 kernel: Lustre: sphnx01-OST0192: Export 
000000000f4ce150 already connecting from 130.199.206.154 at tcp
May 16 04:49:48 oss29 kernel: Lustre: Skipped 29 previous similar 
messages
May 16 04:49:51 oss29 kernel: Lustre: ll_ost_io01_087: service thread 
pid 822474 was inactive for 200.852 seconds. Watchdog sta
ck traces are limited to 3 per 300 seconds, skipping this one.
May 16 04:49:51 oss29 kernel: Lustre: Skipped 17 previous similar 
messages
May 16 04:50:36 oss29 kernel: Lustre: ll_ost01_000: service thread pid 
333500 was inactive for 201.018 seconds. Watchdog stack
traces are limited to 3 per 300 seconds, skipping this one.
May 16 04:50:36 oss29 kernel: Lustre: Skipped 47 previous similar 
messages
May 16 04:50:36 oss29 kernel: INFO: task ll_ost01_000:333500 blocked for 
more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost01_000    state:D stack:    0 
pid:333500 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: wait_transaction_locked+0x89/0xd0 [jbd2]
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_transaction_credits+0xd4/0x290 [jbd2]
May 16 04:50:36 oss29 kernel: start_this_handle+0x10a/0x520 [jbd2]
May 16 04:50:36 oss29 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2]
May 16 04:50:36 oss29 kernel: ? kmem_cache_alloc+0x13f/0x280
May 16 04:50:36 oss29 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2]
May 16 04:50:36 oss29 kernel: ? osd_trans_start+0x13b/0x500 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 
[ldiskfs]
May 16 04:50:36 oss29 kernel: osd_trans_start+0x13b/0x500 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: tgt_client_data_update+0x468/0x6c0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ? cfs_percpt_lock+0x16/0x100 [libcfs]
May 16 04:50:36 oss29 kernel: tgt_client_new+0x5c2/0x880 [ptlrpc]
May 16 04:50:36 oss29 kernel: ofd_obd_connect+0x385/0x4f0 [ofd]
May 16 04:50:36 oss29 kernel: target_handle_connect+0x611/0x29a0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ? lock_timer_base+0x60/0x90
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: tgt_request_handle+0x569/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? __schedule+0x2d9/0x860
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
May 16 04:50:36 oss29 kernel: INFO: task ll_ost_io00_000:333507 blocked 
for more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost_io00_000 state:D stack:    0 
pid:333507 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: ? blk_flush_plug_list+0xc2/0x100
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: md_bitmap_startwrite+0x173/0x1e0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:50:36 oss29 kernel: raid5_make_request+0x218/0xc50 [raid456]
May 16 04:50:36 oss29 kernel: ? generic_make_request_checks+0x2a5/0x4f0
May 16 04:50:36 oss29 kernel: ? __bio_clone+0x60/0xe0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ? blk_queue_split+0xd4/0x670
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: md_handle_request+0x128/0x1b0
May 16 04:50:36 oss29 kernel: md_make_request+0x5b/0xb0
May 16 04:50:36 oss29 kernel: generic_make_request_no_check+0x202/0x330
May 16 04:50:36 oss29 kernel: submit_bio+0x3c/0x160
May 16 04:50:36 oss29 kernel: osd_do_bio.constprop.51+0x510/0xc40 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
May 16 04:50:36 oss29 kernel: ? osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ofd_commitrw_write+0x77e/0x1ad0 [ofd]
May 16 04:50:36 oss29 kernel: ? cfs_crypto_hash_final+0x63/0x90 [libcfs]
May 16 04:50:36 oss29 kernel: ? kfree+0xd0/0x250
May 16 04:50:36 oss29 kernel: ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:50:36 oss29 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? flush_work+0x42/0x1d0
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: ? _cond_resched+0x15/0x30
May 16 04:50:36 oss29 kernel: ? mutex_lock+0xe/0x30
May 16 04:50:36 oss29 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
May 16 04:50:36 oss29 kernel: INFO: task ll_ost_io00_001:333508 blocked 
for more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost_io00_001 state:D stack:    0 
pid:333508 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: md_bitmap_startwrite+0x173/0x1e0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:50:36 oss29 kernel: raid5_make_request+0x218/0xc50 [raid456]
May 16 04:50:36 oss29 kernel: ? generic_make_request_checks+0x2a5/0x4f0
May 16 04:50:36 oss29 kernel: ? __bio_clone+0x60/0xe0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ? blk_queue_split+0xd4/0x670
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: md_handle_request+0x128/0x1b0
May 16 04:50:36 oss29 kernel: md_make_request+0x5b/0xb0
May 16 04:50:36 oss29 kernel: generic_make_request_no_check+0x202/0x330
May 16 04:50:36 oss29 kernel: submit_bio+0x3c/0x160
May 16 04:50:36 oss29 kernel: osd_do_bio.constprop.51+0x510/0xc40 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ? __ldiskfs_journal_stop+0x36/0xb0 
[ldiskfs]
May 16 04:50:36 oss29 kernel: ? osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ofd_commitrw_write+0x77e/0x1ad0 [ofd]
May 16 04:50:36 oss29 kernel: ? cfs_crypto_hash_final+0x63/0x90 [libcfs]
May 16 04:50:36 oss29 kernel: ? kfree+0xd0/0x250
May 16 04:50:36 oss29 kernel: ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:50:36 oss29 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? flush_work+0x42/0x1d0
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: ? _cond_resched+0x15/0x30
May 16 04:50:36 oss29 kernel: ? mutex_lock+0xe/0x30
May 16 04:50:36 oss29 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
May 16 04:50:36 oss29 kernel: INFO: task ll_ost_io00_002:333509 blocked 
for more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost_io00_002 state:D stack:    0 
pid:333509 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: md_bitmap_startwrite+0x173/0x1e0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:50:36 oss29 kernel: raid5_make_request+0x218/0xc50 [raid456]
May 16 04:50:36 oss29 kernel: ? generic_make_request_checks+0x2a5/0x4f0
May 16 04:50:36 oss29 kernel: ? __bio_clone+0x60/0xe0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ? blk_queue_split+0xd4/0x670
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: md_handle_request+0x128/0x1b0
May 16 04:50:36 oss29 kernel: md_make_request+0x5b/0xb0
May 16 04:50:36 oss29 kernel: generic_make_request_no_check+0x202/0x330
May 16 04:50:36 oss29 kernel: submit_bio+0x3c/0x160
May 16 04:50:36 oss29 kernel: osd_do_bio.constprop.51+0xb63/0xc40 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
May 16 04:50:36 oss29 kernel: ? osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ofd_commitrw_write+0x77e/0x1ad0 [ofd]
May 16 04:50:36 oss29 kernel: ? cfs_crypto_hash_final+0x63/0x90 [libcfs]
May 16 04:50:36 oss29 kernel: ? kfree+0xd0/0x250
May 16 04:50:36 oss29 kernel: ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:50:36 oss29 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? flush_work+0x42/0x1d0
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: ? _cond_resched+0x15/0x30
May 16 04:50:36 oss29 kernel: ? mutex_lock+0xe/0x30
May 16 04:50:36 oss29 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
May 16 04:50:36 oss29 kernel: INFO: task ll_ost_io01_001:333511 blocked 
for more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost_io01_001 state:D stack:    0 
pid:333511 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: wait_transaction_locked+0x89/0xd0 [jbd2]
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_transaction_credits+0xd4/0x290 [jbd2]
May 16 04:50:36 oss29 kernel: ? ktime_get+0x3e/0xa0
May 16 04:50:36 oss29 kernel: start_this_handle+0x10a/0x520 [jbd2]
May 16 04:50:36 oss29 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2]
May 16 04:50:36 oss29 kernel: ? kmem_cache_alloc+0x13f/0x280
May 16 04:50:36 oss29 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2]
May 16 04:50:36 oss29 kernel: ? osd_trans_start+0x13b/0x500 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 
[ldiskfs]
May 16 04:50:36 oss29 kernel: osd_trans_start+0x13b/0x500 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ofd_write_attr_set+0x11d/0x1070 [ofd]
May 16 04:50:36 oss29 kernel: ofd_commitrw_write+0x226/0x1ad0 [ofd]
May 16 04:50:36 oss29 kernel: ? cfs_crypto_hash_final+0x63/0x90 [libcfs]
May 16 04:50:36 oss29 kernel: ? kfree+0xd3/0x250
May 16 04:50:36 oss29 kernel: ? free_unref_page_commit+0x91/0x110
May 16 04:50:36 oss29 kernel: ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:50:36 oss29 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? enqueue_task_fair+0x96/0x5c0
May 16 04:50:36 oss29 kernel: ? flush_work+0x42/0x1d0
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: ? _cond_resched+0x15/0x30
May 16 04:50:36 oss29 kernel: ? mutex_lock+0xe/0x30
May 16 04:50:36 oss29 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
May 16 04:50:36 oss29 kernel: INFO: task ll_ost_io01_002:333512 blocked 
for more than 120 seconds.
May 16 04:50:36 oss29 kernel:      Tainted: G        W IOE    --------- 
-  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
May 16 04:50:36 oss29 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 04:50:36 oss29 kernel: task:ll_ost_io01_002 state:D stack:    0 
pid:333512 ppid:     2 flags:0x80004080
May 16 04:50:36 oss29 kernel: Call Trace:
May 16 04:50:36 oss29 kernel: __schedule+0x2d1/0x860
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: schedule+0x35/0xa0
May 16 04:50:36 oss29 kernel: md_bitmap_startwrite+0x173/0x1e0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: add_stripe_bio+0x4b8/0x7e0 [raid456]
May 16 04:50:36 oss29 kernel: raid5_make_request+0x218/0xc50 [raid456]
May 16 04:50:36 oss29 kernel: ? generic_make_request_checks+0x2a5/0x4f0
May 16 04:50:36 oss29 kernel: ? __bio_clone+0x60/0xe0
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: ? blk_queue_split+0xd4/0x670
May 16 04:50:36 oss29 kernel: ? finish_wait+0x80/0x80
May 16 04:50:36 oss29 kernel: md_handle_request+0x128/0x1b0
May 16 04:50:36 oss29 kernel: md_make_request+0x5b/0xb0
May 16 04:50:36 oss29 kernel: generic_make_request_no_check+0x202/0x330
May 16 04:50:36 oss29 kernel: submit_bio+0x3c/0x160
May 16 04:50:36 oss29 kernel: osd_do_bio.constprop.51+0xb63/0xc40 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ? __ldiskfs_journal_stop+0x36/0xb0 
[ldiskfs]
May 16 04:50:36 oss29 kernel: ? osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_ldiskfs_map_inode_pages+0x873/0x8f0 
[osd_ldiskfs]
May 16 04:50:36 oss29 kernel: osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
May 16 04:50:36 oss29 kernel: ofd_commitrw_write+0x77e/0x1ad0 [ofd]
May 16 04:50:36 oss29 kernel: ? cfs_crypto_hash_final+0x63/0x90 [libcfs]
May 16 04:50:36 oss29 kernel: ? kfree+0xd0/0x250
May 16 04:50:36 oss29 kernel: ofd_commitrw+0x5b4/0xd20 [ofd]
May 16 04:50:36 oss29 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc]
May 16 04:50:36 oss29 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? enqueue_task_fair+0x96/0x5c0
May 16 04:50:36 oss29 kernel: ? flush_work+0x42/0x1d0
May 16 04:50:36 oss29 kernel: ? internal_add_timer+0x42/0x70
May 16 04:50:36 oss29 kernel: ? _cond_resched+0x15/0x30
May 16 04:50:36 oss29 kernel: ? mutex_lock+0xe/0x30
May 16 04:50:36 oss29 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_server_handle_request+0x323/0xbe0 
[ptlrpc]
May 16 04:50:36 oss29 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
May 16 04:50:36 oss29 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
May 16 04:50:36 oss29 kernel: kthread+0x10a/0x120
May 16 04:50:36 oss29 kernel: ? set_kthread_struct+0x50/0x50
May 16 04:50:36 oss29 kernel: ret_from_fork+0x1f/0x40
…

Any idea what have caused the issue?

Thanks,
Jane





More information about the lustre-discuss mailing list