[lustre-discuss] LustreError on MDS/MGS
Ihsan Ur Rahman
ihsanurr67 at gmail.com
Tue Oct 14 04:34:02 PDT 2025
Hello Lustre folks,
All of a sudden we have started seeing the errors below on our MDS/MGS. The
MDS and MGS are on the same host. The Lustre version is 2.12.6 and the
base OS is CentOS 7.
[ 49.180502] LustreError: 137-5: lustre-MDT0001_UUID: not available for
connect from 0@lo (no target). If you are running an HA pair check that the
target is mounted on the other server.
[ 51.322591] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation
mds_connect to node 0@lo failed: rc = -114
[ 51.324763] LustreError: 137-5: lustre-MDT0002_UUID: not available for
connect from 0@lo (no target). If you are running an HA pair check that the
target is mounted on the other server.
[ 51.324767] LustreError: Skipped 1 previous similar message
[ 76.461205] LustreError: 137-5: lustre-MDT0002_UUID: not available for
connect from 0@lo (no target). If you are running an HA pair check that the
target is mounted on the other server.
[ 85.400171] LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation
mds_connect to node 0@lo failed: rc = -114
[ 112.007560] LustreError:
3670:0:(lod_dev.c:434:lod_sub_recovery_thread()) lustre-MDT0002-osp-MDT0000
get update log failed: rc = -22
[ 169.235398] LustreError:
3671:0:(tgt_grant.c:248:tgt_grant_sanity_check()) mdt_obd_disconnect:
tot_granted 35651584 != fo_tot_granted 50331648
[ 182.912888] LustreError:
3861:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) ldlm_cancel from
10.19.4.59@o2ib arrived at 1760429996 with bad export cookie
4008858034446994893
[ 183.618108] LustreError:
3861:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) ldlm_cancel from
10.19.4.44@o2ib arrived at 1760429997 with bad export cookie
4008858034446994879
[ 183.618235] LustreError:
3861:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) Skipped 1 previous similar
message
[ 184.268980] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation
mds_statfs to node 0@lo failed: rc = -107
[ 184.269023] LustreError: Skipped 1 previous similar message
[ 185.924619] LustreError:
3861:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) ldlm_cancel from
10.19.4.54@o2ib arrived at 1760429999 with bad export cookie
4008858034446994872
[ 185.924752] LustreError:
3861:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) Skipped 1 previous similar
message
[ 185.925184] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.54@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 185.925252] LustreError: Skipped 1 previous similar message
[ 189.215567] LustreError:
4038:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) ldlm_cancel from
10.19.4.46@o2ib arrived at 1760430002 with bad export cookie
4008858034446994914
[ 189.216836] LustreError:
4038:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) Skipped 5 previous similar
messages
[ 190.089402] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.141@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 190.089480] LustreError: Skipped 13 previous similar messages
[ 198.160152] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.50@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 198.160222] LustreError: Skipped 14 previous similar messages
[ 206.524931] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation
mds_statfs to node 0@lo failed: rc = -107
[ 219.633696] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.139@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 219.633781] LustreError: Skipped 49 previous similar messages
[ 1640.746488] LustreError: 137-5: lustre-MDT0001_UUID: not available for
connect from 0@lo (no target). If you are running an HA pair check that the
target is mounted on the other server.
[ 1640.747278] LustreError: Skipped 6 previous similar messages
[ 1642.718528] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation
mds_connect to node 0@lo failed: rc = -114
[ 1644.744881] LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation
mds_connect to node 0@lo failed: rc = -114
[ 1671.459736] LustreError:
4689:0:(lod_dev.c:434:lod_sub_recovery_thread()) lustre-MDT0002-osp-MDT0000
get update log failed: rc = -22
[ 6383.485229] LustreError:
3375:0:(client.c:1187:ptlrpc_import_delay_req()) @@@ IMP_CLOSED
req@ffff9abf94b6ad00 x1845944478633792/t0(0)
o41->lustre-MDT0001-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1
fl Rpc:/0/ffffffff rc 0/-1
[ 6384.125240] LustreError:
3357:0:(client.c:1187:ptlrpc_import_delay_req()) @@@ IMP_CLOSED
req@ffff9add02797080 x1845944478635840/t0(0)
o41->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1
fl Rpc:/0/ffffffff rc 0/-1
[ 6385.101316] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation
mds_statfs to node 0@lo failed: rc = -107
[ 6385.101989] LustreError: Skipped 2 previous similar messages
[ 6391.858013] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.100@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6391.859509] LustreError: Skipped 5 previous similar messages
[ 6392.864490] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.50@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6394.329714] LustreError:
5033:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) ldlm_cancel from
10.19.4.98@o2ib arrived at 1760436207 with bad export cookie
4008858034447626986
[ 6394.331428] LustreError:
5033:0:(ldlm_lockd.c:2366:ldlm_cancel_handler()) Skipped 6 previous similar
messages
[ 6394.332814] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.98@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6397.480407] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.141@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6397.482411] LustreError: Skipped 2 previous similar messages
[ 6402.032037] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.49@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6402.034201] LustreError: Skipped 11 previous similar messages
[ 6410.570814] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.133@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6410.573110] LustreError: Skipped 16 previous similar messages
[ 6418.437131] LustreError: 166-1: MGC10.19.4.132@o2ib: Connection to MGS
(at 0@lo) was lost; in progress operations using this service will fail
[ 6427.991503] LustreError: 137-5: lustre-MDT0000_UUID: not available for
connect from 10.19.4.59@o2ib (no target). If you are running an HA pair
check that the target is mounted on the other server.
[ 6427.994071] LustreError: Skipped 24 previous similar messages
[ 6448.585655] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation
mds_disconnect to node 0@lo failed: rc = -107
[ 6448.589164] LustreError:
3381:0:(client.c:1187:ptlrpc_import_delay_req()) @@@ IMP_CLOSED
req@ffff9abfdf9fb180 x1845944478691712/t0(0)
o41->lustre-MDT0001-osp-MDT0002@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1
fl Rpc:/0/ffffffff rc 0/-1
[ 8811.701653] LustreError: 137-5: lustre-MDT0001_UUID: not available for
connect from 0@lo (no target). If you are running an HA pair check that the
target is mounted on the other server.
[ 8811.704484] LustreError: Skipped 92 previous similar messages
[ 8813.672899] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation
mds_connect to node 0@lo failed: rc = -114
[ 8815.749249] LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation
mds_connect to node 0@lo failed: rc = -114
[ 8842.422919] LustreError:
5914:0:(lod_dev.c:434:lod_sub_recovery_thread()) lustre-MDT0002-osp-MDT0000
get update log failed: rc = -22
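The rc values in the messages above are, as is usual for kernel code, negated Linux errno numbers. As a reading aid, here is a minimal Python sketch that decodes the three values seen in this log. The table is hand-written from the standard Linux errno numbering (assumed to match this kernel), and the helper name decode_rc is hypothetical; it is not part of any Lustre tool.

```python
import re

# Hand-written subset of the Linux errno table (assumption: standard
# Linux numbering, as used on the CentOS 7 server in this report).
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    114: ("EALREADY", "Operation already in progress"),
}

# Matches the "rc = N" suffix of LustreError messages.
RC_PATTERN = re.compile(r"rc = (-?\d+)")


def decode_rc(text):
    """Return a list of (rc, errno_name, description) for each 'rc = N' in text."""
    decoded = []
    for match in RC_PATTERN.finditer(text):
        rc = int(match.group(1))
        name, description = LINUX_ERRNO.get(
            abs(rc), ("UNKNOWN", "not in this table")
        )
        decoded.append((rc, name, description))
    return decoded


if __name__ == "__main__":
    sample = (
        "LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation "
        "mds_connect to node 0@lo failed: rc = -114"
    )
    for rc, name, description in decode_rc(sample):
        print(f"rc = {rc}: {name} ({description})")
```

The names only say what the kernel returned, not why; for example, ENOTCONN on mds_statfs is consistent with the peer MDT not being connected at that moment, but the log alone does not identify the cause.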
We tried unmounting and remounting the MGT and then the MDTs. When users
start running jobs, we also see CPU soft lockup errors on the MGS/MDS node.
The error file is attached.
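To get a quick overview of which kernel threads are reported as stuck in the attached log, a small sketch like the following can help. It assumes the "NMI watchdog: BUG: soft lockup" line format shown in the attachment; the function name and the grouping rule (dropping the last "_NNN" suffix of the thread name) are illustrative choices, not an official tool.

```python
import re
from collections import Counter

# Matches e.g. "soft lockup - CPU#57 stuck for 23s! [mdt01_076:4642]"
LOCKUP_PATTERN = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def summarize_soft_lockups(log_text):
    """Count soft lockup reports per thread family.

    The family is the thread name with its final '_NNN' suffix removed,
    e.g. 'mdt01_076' -> 'mdt01' and 'kiblnd_sd_01_01' -> 'kiblnd_sd_01'.
    """
    families = Counter()
    for match in LOCKUP_PATTERN.finditer(log_text):
        thread_name = match.group(3)
        family = thread_name.rsplit("_", 1)[0]
        families[family] += 1
    return families


if __name__ == "__main__":
    sample = (
        "[68012.986164] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [mdt01_076:4642]\n"
        "[68012.990164] NMI watchdog: BUG: soft lockup - CPU#58 stuck for 23s! [kiblnd_sd_01_01:3157]\n"
    )
    for family, count in summarize_soft_lockups(sample).items():
        print(family, count)
```

Running it over the full dmesg output (for example the text saved from `dmesg`) shows at a glance whether the lockups are concentrated in the MDT service threads, the LNet scheduler threads, or both.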
mount | grep -i lustre
/dev/sdb on /mnt/mgsmdt0 type lustre (ro,context=unconfined_u:object_r:user_tmp_t:s0,svname=lustre-MDT0000,mgs,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
/dev/sdc on /mnt/mdt1 type lustre (ro,context=unconfined_u:object_r:user_tmp_t:s0,svname=lustre-MDT0001,mgsnode=10.19.4.132@o2ib,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
/dev/sdd on /mnt/mdt2 type lustre (ro,context=unconfined_u:object_r:user_tmp_t:s0,svname=lustre-MDT0002,mgsnode=10.19.4.132@o2ib,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
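For anyone who wants to compare the mounted targets programmatically, here is a rough Python sketch that pulls the device, mount point, and options out of a `mount` line of type lustre. It assumes each entry is on a single line as `mount` prints it (not wrapped by the mail client) and that no option value contains a comma; the function name is made up for this example.

```python
import re

# Matches "<device> on <mountpoint> type lustre (<options>)"
MOUNT_PATTERN = re.compile(r"^(\S+) on (\S+) type lustre \((.*)\)$")


def parse_lustre_mount(line):
    """Parse one unwrapped 'mount' output line for a Lustre target.

    Returns None when the line is not a Lustre mount entry.
    """
    match = MOUNT_PATTERN.match(line.strip())
    if match is None:
        return None
    device, mountpoint, raw_options = match.groups()
    options = {}
    for item in raw_options.split(","):
        key, _, value = item.partition("=")
        options[key] = value  # bare flags such as 'mgs' map to ''
    return {"device": device, "mountpoint": mountpoint, "options": options}


if __name__ == "__main__":
    line = (
        "/dev/sdc on /mnt/mdt1 type lustre "
        "(ro,context=unconfined_u:object_r:user_tmp_t:s0,svname=lustre-MDT0001,"
        "mgsnode=10.19.4.132@o2ib,osd=osd-ldiskfs,user_xattr,errors=remount-ro)"
    )
    entry = parse_lustre_mount(line)
    print(entry["options"]["svname"], entry["options"]["mgsnode"])
```

With the three entries above, this makes it easy to confirm which device carries the `mgs` flag and that the other two targets point at the same `mgsnode`.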
regards,
Ihsan
-------------- next part --------------
[68012.986164] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [mdt01_076:4642]
[68012.986542] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68012.986584] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68012.986603] CPU: 57 PID: 4642 Comm: mdt01_076 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68012.986604] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68012.986606] task: ffff96d28ea6b180 ti: ffff96d28eac4000 task.ti: ffff96d28eac4000
[68012.986608] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68012.986612] RSP: 0018:ffff96d28eac7d88 EFLAGS: 00000246
[68012.986614] RAX: 0000000000000000 RBX: 000000000000050d RCX: 0000000001c90000
[68012.986616] RDX: ffff96da1d2db8c0 RSI: 0000000000b90001 RDI: ffff96d9f0f38030
[68012.986618] RBP: ffff96d28eac7d88 R08: ffff96da1d85b8c0 R09: 0000000000000000
[68012.986620] R10: 00000000d8a3a201 R11: ffff96c4d8a3a600 R12: ffffffff8c0b3c75
[68012.986622] R13: ffff96d28eac7d38 R14: ffff96d28eac7d10 R15: ffff96d28eac7d28
[68012.986624] FS: 0000000000000000(0000) GS:ffff96da1d840000(0000) knlGS:0000000000000000
[68012.986626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68012.986628] CR2: 00007fced3e3cd70 CR3: 0000000273c10000 CR4: 00000000007607e0
[68012.986630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68012.986632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68012.986634] PKRU: 00000000
[68012.986635] Call Trace:
[68012.986639] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68012.986641] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68012.986701] [<ffffffffc1185ba2>] ptlrpc_server_handle_req_in+0x42/0xd60 [ptlrpc]
[68012.986760] [<ffffffffc118a435>] ptlrpc_main+0xac5/0x1480 [ptlrpc]
[68012.986819] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68012.986822] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68012.986826] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.986829] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68012.986832] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.986834] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68012.990164] NMI watchdog: BUG: soft lockup - CPU#58 stuck for 23s! [kiblnd_sd_01_01:3157]
[68012.990549] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68012.990591] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68012.990611] CPU: 58 PID: 3157 Comm: kiblnd_sd_01_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68012.990612] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68012.990614] task: ffff96d9f8c0a100 ti: ffff96da19254000 task.ti: ffff96da19254000
[68012.990616] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68012.990620] RSP: 0018:ffff96da19257a90 EFLAGS: 00000246
[68012.990622] RAX: 0000000000000000 RBX: ffff96da19257a08 RCX: 0000000001d10000
[68012.990624] RDX: ffff96da1d7db8c0 RSI: 0000000001b90000 RDI: ffff96d9efa59000
[68012.990626] RBP: ffff96da19257a90 R08: ffff96da1d89b8c0 R09: 0000000000000000
[68012.990628] R10: 00000000000002b0 R11: ffff96cc6c7a4d38 R12: ffffffff8c0d8990
[68012.990629] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000300000101
[68012.990632] FS: 0000000000000000(0000) GS:ffff96da1d880000(0000) knlGS:0000000000000000
[68012.990634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68012.990636] CR2: 00007fdff4294210 CR3: 0000000273c10000 CR4: 00000000007607e0
[68012.990638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68012.990640] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68012.990641] PKRU: 00000000
[68012.990642] Call Trace:
[68012.990646] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68012.990649] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68012.990662] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68012.990679] [<ffffffffc0e19b33>] lnet_finalize+0x1e3/0xd60 [lnet]
[68012.990695] [<ffffffffc0e2187a>] ? lnet_copy_iov2iov+0xfa/0x280 [lnet]
[68012.990705] [<ffffffffc09ed93d>] kiblnd_recv+0x1cd/0x7d0 [ko2iblnd]
[68012.990721] [<ffffffffc0e1e65c>] ? lnet_mt_match_md+0x8c/0x1c0 [lnet]
[68012.990738] [<ffffffffc0e25968>] lnet_ni_recv+0xc8/0x330 [lnet]
[68012.990754] [<ffffffffc0e260e5>] lnet_recv_put+0x85/0xc0 [lnet]
[68012.990770] [<ffffffffc0e2c7f6>] lnet_parse_local+0x5b6/0xd50 [lnet]
[68012.990786] [<ffffffffc0e2d92a>] lnet_parse+0x99a/0x11f0 [lnet]
[68012.990789] [<ffffffff8c0d8e02>] ? __wake_up_common+0x82/0x120
[68012.990799] [<ffffffffc09ee2f3>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
[68012.990808] [<ffffffffc09f54c2>] kiblnd_scheduler+0xfb2/0x11d0 [ko2iblnd]
[68012.990812] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68012.990821] [<ffffffffc09f4510>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
[68012.990824] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68012.990828] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.990831] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68012.990834] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.990836] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68012.993164] NMI watchdog: BUG: soft lockup - CPU#59 stuck for 23s! [mdt01_113:14297]
[68012.993554] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68012.993598] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68012.993617] CPU: 59 PID: 14297 Comm: mdt01_113 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68012.993619] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68012.993621] task: ffff96bfef6f4200 ti: ffff96c5d5d6c000 task.ti: ffff96c5d5d6c000
[68012.993623] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68012.993627] RSP: 0018:ffff96c5d5d6fd88 EFLAGS: 00000246
[68012.993628] RAX: 0000000000000000 RBX: 000000000000050d RCX: 0000000001d90000
[68012.993630] RDX: ffff96da1d41b8c0 RSI: 0000000000c10001 RDI: ffff96d9f0f38030
[68012.993632] RBP: ffff96c5d5d6fd88 R08: ffff96da1d8db8c0 R09: 0000000000000000
[68012.993634] R10: 0000000000000001 R11: 7fffffffffffffff R12: ffffffff8c0b3c75
[68012.993636] R13: ffff96c5d5d6fd38 R14: ffff96c5d5d6fd10 R15: 0000000000000000
[68012.993639] FS: 0000000000000000(0000) GS:ffff96da1d8c0000(0000) knlGS:0000000000000000
[68012.993641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68012.993642] CR2: 00007f7e427fb4b0 CR3: 0000000273c10000 CR4: 00000000007607e0
[68012.993645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68012.993646] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68012.993648] PKRU: 00000000
[68012.993649] Call Trace:
[68012.993653] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68012.993656] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68012.993716] [<ffffffffc1185ba2>] ptlrpc_server_handle_req_in+0x42/0xd60 [ptlrpc]
[68012.993776] [<ffffffffc118a435>] ptlrpc_main+0xac5/0x1480 [ptlrpc]
[68012.993836] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68012.993839] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68012.993843] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.993846] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68012.993849] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.993851] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68012.996164] NMI watchdog: BUG: soft lockup - CPU#60 stuck for 23s! [mdt01_013:3922]
[68012.996554] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68012.996597] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68012.996616] CPU: 60 PID: 3922 Comm: mdt01_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68012.996618] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68012.996620] task: ffff96da18092100 ti: ffff96d9b663c000 task.ti: ffff96d9b663c000
[68012.996621] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68012.996625] RSP: 0018:ffff96d9b663fcb8 EFLAGS: 00000246
[68012.996627] RAX: 0000000000000000 RBX: ffff96d58c13e000 RCX: 0000000001e10000
[68012.996629] RDX: ffff96da1d31b8c0 RSI: 0000000000a10001 RDI: ffff96d9f0f38030
[68012.996631] RBP: ffff96d9b663fcb8 R08: ffff96da1d91b8c0 R09: 0000000000000000
[68012.996633] R10: 000000003b540001 R11: ffff96ca3b547080 R12: ffff96c89b7dc800
[68012.996635] R13: 0000000180400035 R14: ffff96abbfc04b00 R15: ffff96d9ef9c0001
[68012.996637] FS: 0000000000000000(0000) GS:ffff96da1d900000(0000) knlGS:0000000000000000
[68012.996639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68012.996641] CR2: 0000562108484000 CR3: 0000000273c10000 CR4: 00000000007607e0
[68012.996643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68012.996645] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68012.996646] PKRU: 00000000
[68012.996648] Call Trace:
[68012.996651] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68012.996654] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68012.996713] [<ffffffffc118447a>] ptlrpc_server_drop_request+0x1ca/0x6f0 [ptlrpc]
[68012.996772] [<ffffffffc1184a32>] ptlrpc_server_finish_active_request+0x92/0x140 [ptlrpc]
[68012.996831] [<ffffffffc1186cc1>] ptlrpc_server_handle_request+0x401/0xab0 [ptlrpc]
[68012.996889] [<ffffffffc11838c5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[68012.996892] [<ffffffff8c0d8f73>] ? __wake_up+0x13/0x20
[68012.996950] [<ffffffffc118a4b4>] ptlrpc_main+0xb44/0x1480 [ptlrpc]
[68012.997009] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68012.997012] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68012.997016] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.997019] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68012.997022] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68012.997024] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68013.000164] NMI watchdog: BUG: soft lockup - CPU#61 stuck for 23s! [mdt01_043:4077]
[68013.000556] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68013.000598] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68013.000618] CPU: 61 PID: 4077 Comm: mdt01_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68013.000620] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68013.000622] task: ffff96d9eeae4200 ti: ffff96d8d1a08000 task.ti: ffff96d8d1a08000
[68013.000623] RIP: 0010:[<ffffffff8c11ec98>] [<ffffffff8c11ec98>] native_queued_spin_lock_slowpath+0x158/0x200
[68013.000628] RSP: 0018:ffff96d8d1a0bcb8 EFLAGS: 00000202
[68013.000630] RAX: 0000000000000001 RBX: ffff96ccf220b000 RCX: 0000000001e90000
[68013.000632] RDX: 0000000000b10001 RSI: 0000000000d10001 RDI: ffff96d9f0f38030
[68013.000634] RBP: ffff96d8d1a0bcb8 R08: ffff96da1d95b8c0 R09: ffff96da1d51b8c0
[68013.000635] R10: 00000000d3afda01 R11: ffff96d5d3af8d80 R12: 0000000000000000
[68013.000637] R13: ffff96d4d8f49000 R14: ffff96c597e74388 R15: ffff96d4d8f40001
[68013.000640] FS: 0000000000000000(0000) GS:ffff96da1d940000(0000) knlGS:0000000000000000
[68013.000642] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68013.000644] CR2: 00007f042c2901cc CR3: 0000000273c10000 CR4: 00000000007607e0
[68013.000646] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68013.000648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68013.000649] PKRU: 00000000
[68013.000650] Call Trace:
[68013.000654] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68013.000657] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68013.000718] [<ffffffffc118447a>] ptlrpc_server_drop_request+0x1ca/0x6f0 [ptlrpc]
[68013.000778] [<ffffffffc1184a32>] ptlrpc_server_finish_active_request+0x92/0x140 [ptlrpc]
[68013.000837] [<ffffffffc1186cc1>] ptlrpc_server_handle_request+0x401/0xab0 [ptlrpc]
[68013.000895] [<ffffffffc11838c5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[68013.000898] [<ffffffff8c0d8f73>] ? __wake_up+0x13/0x20
[68013.000957] [<ffffffffc118a4b4>] ptlrpc_main+0xb44/0x1480 [ptlrpc]
[68013.001015] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68013.001019] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68013.001022] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.001025] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68013.001029] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.001031] Code: 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 8b 17 <0f> b7 c2 83 f8 03 75 f0 be 01 00 00 00 eb 15 66 0f 1f 84 00 00
[68013.003164] NMI watchdog: BUG: soft lockup - CPU#62 stuck for 23s! [mdt01_060:4621]
[68013.003552] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68013.003594] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68013.003613] CPU: 62 PID: 4621 Comm: mdt01_060 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68013.003615] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68013.003617] task: ffff96d299425280 ti: ffff96d294b44000 task.ti: ffff96d294b44000
[68013.003619] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68013.003623] RSP: 0018:ffff96d294b47cb8 EFLAGS: 00000246
[68013.003625] RAX: 0000000000000000 RBX: ffff96bf6967d000 RCX: 0000000001f10000
[68013.003627] RDX: ffff96da1d49b8c0 RSI: 0000000000d10001 RDI: ffff96d9f0f38030
[68013.003629] RBP: ffff96d294b47cb8 R08: ffff96da1d99b8c0 R09: 0000000000000000
[68013.003631] R10: 00000000227d9b01 R11: ffff96c6227dec00 R12: ffff96d8c72f0800
[68013.003633] R13: ffff96d294b47cb0 R14: ffff96abbfc04b00 R15: ffff96d9ef9c0001
[68013.003635] FS: 0000000000000000(0000) GS:ffff96da1d980000(0000) knlGS:0000000000000000
[68013.003637] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68013.003639] CR2: 00007f2865b94000 CR3: 0000000273c10000 CR4: 00000000007607e0
[68013.003641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68013.003643] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68013.003644] PKRU: 00000000
[68013.003645] Call Trace:
[68013.003649] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68013.003652] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68013.003711] [<ffffffffc118447a>] ptlrpc_server_drop_request+0x1ca/0x6f0 [ptlrpc]
[68013.003771] [<ffffffffc1184a32>] ptlrpc_server_finish_active_request+0x92/0x140 [ptlrpc]
[68013.003829] [<ffffffffc1186cc1>] ptlrpc_server_handle_request+0x401/0xab0 [ptlrpc]
[68013.003888] [<ffffffffc11838c5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[68013.003891] [<ffffffff8c0d8f73>] ? __wake_up+0x13/0x20
[68013.003949] [<ffffffffc118a4b4>] ptlrpc_main+0xb44/0x1480 [ptlrpc]
[68013.004008] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68013.004011] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68013.004015] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.004017] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68013.004021] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.004023] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68013.007164] NMI watchdog: BUG: soft lockup - CPU#63 stuck for 23s! [mdt01_072:4637]
[68013.007554] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68013.007598] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68013.007617] CPU: 63 PID: 4637 Comm: mdt01_072 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68013.007619] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68013.007621] task: ffff96d8d3285280 ti: ffff96d28ea00000 task.ti: ffff96d28ea00000
[68013.007623] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68013.007628] RSP: 0018:ffff96d28ea03a98 EFLAGS: 00000246
[68013.007630] RAX: 0000000000000000 RBX: ffffffffc12790a0 RCX: 0000000001f90000
[68013.007632] RDX: ffff96c21ef5b8c0 RSI: 0000000000290001 RDI: ffff96d9efa59000
[68013.007634] RBP: ffff96d28ea03a98 R08: ffff96da1d9db8c0 R09: 0000000000000000
[68013.007635] R10: 0000000000000096 R11: ffff96cece61a100 R12: 0000000000000001
[68013.007637] R13: ffff96d9f20f1200 R14: ffff96cece61a448 R15: ffffffffc1176fbf
[68013.007640] FS: 0000000000000000(0000) GS:ffff96da1d9c0000(0000) knlGS:0000000000000000
[68013.007642] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68013.007644] CR2: 00007fc4af1921cc CR3: 0000000273c10000 CR4: 00000000007607e0
[68013.007646] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68013.007648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68013.007649] PKRU: 00000000
[68013.007650] Call Trace:
[68013.007654] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68013.007657] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68013.007670] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68013.007687] [<ffffffffc0e1c4b1>] LNetMDBind+0xe1/0x5f0 [lnet]
[68013.007748] [<ffffffffc116fc7f>] ptl_send_buf+0xff/0x530 [ptlrpc]
[68013.007760] [<ffffffffc0880c2e>] ? ktime_get_real_seconds+0xe/0x20 [libcfs]
[68013.007820] [<ffffffffc117309b>] ptlrpc_send_reply+0x29b/0x850 [ptlrpc]
[68013.007877] [<ffffffffc1130a3e>] target_send_reply_msg+0x8e/0x180 [ptlrpc]
[68013.007933] [<ffffffffc113b29e>] target_send_reply+0x30e/0x730 [ptlrpc]
[68013.007995] [<ffffffffc117a6c7>] ? lustre_msg_set_last_committed+0x27/0xb0 [ptlrpc]
[68013.008061] [<ffffffffc11e2c17>] tgt_request_handle+0x697/0x1580 [ptlrpc]
[68013.008126] [<ffffffffc11bc105>] ? ptlrpc_nrs_req_get_nolock0+0xd5/0x170 [ptlrpc]
[68013.008137] [<ffffffffc0880c2e>] ? ktime_get_real_seconds+0xe/0x20 [libcfs]
[68013.008201] [<ffffffffc1186b0b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[68013.008262] [<ffffffffc11838c5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[68013.008265] [<ffffffff8c0d8f73>] ? __wake_up+0x13/0x20
[68013.008325] [<ffffffffc118a4b4>] ptlrpc_main+0xb44/0x1480 [ptlrpc]
[68013.008386] [<ffffffffc1189970>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[68013.008389] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68013.008393] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.008396] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68013.008399] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68013.008401] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.557164] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpcd_00_13:3356]
[68016.557739] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.557783] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.557802] CPU: 1 PID: 3356 Comm: ptlrpcd_00_13 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.557804] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.557806] task: ffff96c20e2ba100 ti: ffff96d9ef5cc000 task.ti: ffff96d9ef5cc000
[68016.557808] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.557812] RSP: 0018:ffff96d9ef5cfae0 EFLAGS: 00000246
[68016.557814] RAX: 0000000000000000 RBX: ffff96d9ef5cfa58 RCX: 0000000000090000
[68016.557816] RDX: ffff96c21f41b8c0 RSI: 0000000001410001 RDI: ffff96d9efa59000
[68016.557818] RBP: ffff96d9ef5cfae0 R08: ffff96c21ee5b8c0 R09: 0000000000000000
[68016.557820] R10: ffff96abbfc07500 R11: ffff96ab5ae8f000 R12: ffffffff8c0d8990
[68016.557822] R13: 0000000000000000 R14: ffff96ab00000000 R15: 00000003821fa000
[68016.557824] FS: 0000000000000000(0000) GS:ffff96c21ee40000(0000) knlGS:0000000000000000
[68016.557826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.557828] CR2: 00007f67299d4224 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.557830] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.557832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.557833] PKRU: 00000000
[68016.557835] Call Trace:
[68016.557838] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.557841] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.557854] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.557870] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.557931] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.557993] [<ffffffffc11786ac>] ? lustre_msg_set_status+0xc/0xb0 [ptlrpc]
[68016.558052] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.558110] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.558169] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.558231] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.558292] [<ffffffffc1198f2b>] ptlrpcd+0x2ab/0x570 [ptlrpc]
[68016.558295] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.558355] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.558359] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.558362] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.558365] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.558369] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.558370] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.623164] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [ptlrpcd_00_28:3371]
[68016.623605] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.623648] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.623667] CPU: 8 PID: 3371 Comm: ptlrpcd_00_28 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.623669] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.623671] task: ffff96d9f0696300 ti: ffff96d9f0640000 task.ti: ffff96d9f0640000
[68016.623673] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.623677] RSP: 0018:ffff96d9f0643ae0 EFLAGS: 00000246
[68016.623679] RAX: 0000000000000000 RBX: ffff96d9f0643a58 RCX: 0000000000410000
[68016.623681] RDX: ffff96c21f05b8c0 RSI: 0000000000490001 RDI: ffff96d9efa59000
[68016.623683] RBP: ffff96d9f0643ae0 R08: ffff96c21f01b8c0 R09: 0000000000000000
[68016.623685] R10: ffff96abbfc07500 R11: ffffe0bda7e34c00 R12: ffffffff8c0d8990
[68016.623687] R13: 0000000000000000 R14: ffff96d900000000 R15: 000000037c28fc00
[68016.623689] FS: 0000000000000000(0000) GS:ffff96c21f000000(0000) knlGS:0000000000000000
[68016.623691] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.623693] CR2: 00000000025fbe78 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.623695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.623697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.623698] PKRU: 00000000
[68016.623700] Call Trace:
[68016.623703] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.623706] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.623719] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.623735] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.623796] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.623807] [<ffffffffc0880c2e>] ? ktime_get_real_seconds+0xe/0x20 [libcfs]
[68016.623868] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.623871] [<ffffffff8c0e72ee>] ? account_entity_dequeue+0xae/0xd0
[68016.623929] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.623933] [<ffffffff8c0ec651>] ? put_prev_entity+0x31/0x400
[68016.623990] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.624052] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.624113] [<ffffffffc1198f90>] ptlrpcd+0x310/0x570 [ptlrpc]
[68016.624117] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.624177] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.624181] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.624185] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.624187] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.624191] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.624193] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.631164] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ptlrpcd_00_19:3362]
[68016.631583] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.631626] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.631645] CPU: 9 PID: 3362 Comm: ptlrpcd_00_19 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.631647] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.631649] task: ffff96c218bfb180 ti: ffff96d9ee568000 task.ti: ffff96d9ee568000
[68016.631651] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.631655] RSP: 0018:ffff96d9ee56bae0 EFLAGS: 00000246
[68016.631657] RAX: 0000000000000000 RBX: ffff96d9ee56ba58 RCX: 0000000000490000
[68016.631659] RDX: ffff96c21f25b8c0 RSI: 0000000001090001 RDI: ffff96d9efa59000
[68016.631661] RBP: ffff96d9ee56bae0 R08: ffff96c21f05b8c0 R09: 0000000000000000
[68016.631663] R10: ffff96abbfc07500 R11: ffff96c9df303e00 R12: ffffffff8c0d8990
[68016.631665] R13: 00000001802a002a R14: ffff96d900000000 R15: 000000034825ea00
[68016.631667] FS: 0000000000000000(0000) GS:ffff96c21f040000(0000) knlGS:0000000000000000
[68016.631669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.631671] CR2: 00000000025abd10 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.631673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.631675] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.631676] PKRU: 00000000
[68016.631678] Call Trace:
[68016.631681] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.631684] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.631697] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.631713] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.631774] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.631778] [<ffffffff8c5e8e2f>] ? get_target_pstate_use_performance+0x8f/0xc0
[68016.631838] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.631841] [<ffffffff8c0e72ee>] ? account_entity_dequeue+0xae/0xd0
[68016.631900] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.631903] [<ffffffff8c0ec651>] ? put_prev_entity+0x31/0x400
[68016.631961] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.632023] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.632083] [<ffffffffc1198f90>] ptlrpcd+0x310/0x570 [ptlrpc]
[68016.632087] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.632147] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.632150] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.632154] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.632157] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.632160] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.632162] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.639164] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [ptlrpcd_00_15:3358]
[68016.639591] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.639634] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.639653] CPU: 10 PID: 3358 Comm: ptlrpcd_00_15 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.639655] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.639657] task: ffff96c20e2bc200 ti: ffff96da1c02c000 task.ti: ffff96da1c02c000
[68016.639658] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.639663] RSP: 0018:ffff96da1c02fae0 EFLAGS: 00000246
[68016.639665] RAX: 0000000000000000 RBX: ffff96da1c02fa58 RCX: 0000000000510000
[68016.639666] RDX: ffff96c21f15b8c0 RSI: 0000000000690001 RDI: ffff96d9efa59000
[68016.639668] RBP: ffff96da1c02fae0 R08: ffff96c21f09b8c0 R09: 0000000000000000
[68016.639670] R10: ffff96abbfc07500 R11: ffffe0be2c4c0400 R12: ffffffff8c0d8990
[68016.639672] R13: 0000000000000000 R14: ffff96da00000000 R15: 0000000319bd9200
[68016.639675] FS: 0000000000000000(0000) GS:ffff96c21f080000(0000) knlGS:0000000000000000
[68016.639677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.639678] CR2: 00007f49c1e3bd70 CR3: 00000005ee852000 CR4: 00000000007607e0
[68016.639680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.639682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.639684] PKRU: 00000000
[68016.639685] Call Trace:
[68016.639689] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.639692] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.639704] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.639721] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.639782] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.639843] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.639846] [<ffffffff8c0e72ee>] ? account_entity_dequeue+0xae/0xd0
[68016.639905] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.639908] [<ffffffff8c0ec651>] ? put_prev_entity+0x31/0x400
[68016.639966] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.640029] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.640090] [<ffffffffc1198f90>] ptlrpcd+0x310/0x570 [ptlrpc]
[68016.640093] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.640153] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.640157] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.640161] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.640164] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.640167] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.640169] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.648164] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [ptlrpcd_00_06:3349]
[68016.648591] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.648633] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.648652] CPU: 11 PID: 3349 Comm: ptlrpcd_00_06 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.648654] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.648656] task: ffff96c20754d280 ti: ffff96da1ad0c000 task.ti: ffff96da1ad0c000
[68016.648658] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.648662] RSP: 0018:ffff96da1ad0fae0 EFLAGS: 00000246
[68016.648664] RAX: 0000000000000000 RBX: ffff96abc5a04440 RCX: 0000000000590000
[68016.648666] RDX: ffff96c21f29b8c0 RSI: 0000000001110001 RDI: ffff96d9efa59000
[68016.648668] RBP: ffff96da1ad0fae0 R08: ffff96c21f0db8c0 R09: 0000000000000000
[68016.648669] R10: ffff96abbfc07500 R11: 0000000000000400 R12: ffffffff8c0d8990
[68016.648671] R13: ffff96abc5a04440 R14: ffff96bb29716d20 R15: ffffffff8c2332fd
[68016.648674] FS: 0000000000000000(0000) GS:ffff96c21f0c0000(0000) knlGS:0000000000000000
[68016.648676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.648678] CR2: 00007f671c7a6ff0 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.648680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.648681] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.648683] PKRU: 00000000
[68016.648684] Call Trace:
[68016.648688] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.648690] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.648703] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.648719] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.648780] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.648841] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.648845] [<ffffffff8c0e72ee>] ? account_entity_dequeue+0xae/0xd0
[68016.648903] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.648907] [<ffffffff8c0ec651>] ? put_prev_entity+0x31/0x400
[68016.648964] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.649027] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.649088] [<ffffffffc1198f90>] ptlrpcd+0x310/0x570 [ptlrpc]
[68016.649092] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.649153] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.649156] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.649160] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.649163] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.649166] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.649168] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.665164] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ptlrpcd_00_14:3357]
[68016.665587] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.665631] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.665650] CPU: 13 PID: 3357 Comm: ptlrpcd_00_14 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.665652] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.665654] task: ffff96c20e2b9080 ti: ffff96d9f1cc0000 task.ti: ffff96d9f1cc0000
[68016.665656] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.665660] RSP: 0018:ffff96d9f1cc3ae0 EFLAGS: 00000246
[68016.665662] RAX: 0000000000000000 RBX: ffff96d9f1cc3a58 RCX: 0000000000690000
[68016.665664] RDX: ffff96c21ee5b8c0 RSI: 0000000000090001 RDI: ffff96d9efa59000
[68016.665666] RBP: ffff96d9f1cc3ae0 R08: ffff96c21f15b8c0 R09: 0000000000000000
[68016.665668] R10: ffff96abbfc07500 R11: ffffe0bd846ba200 R12: ffffffff8c0d8990
[68016.665669] R13: 0000000000000000 R14: ffff96d900000000 R15: 000000035ae8cc00
[68016.665672] FS: 0000000000000000(0000) GS:ffff96c21f140000(0000) knlGS:0000000000000000
[68016.665674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.665676] CR2: 00007f671cf73070 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.665678] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.665680] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.665681] PKRU: 00000000
[68016.665682] Call Trace:
[68016.665686] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.665689] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.665701] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.665718] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.665779] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.665782] [<ffffffff8c0eb01c>] ? dequeue_entity+0x11c/0x5d0
[68016.665842] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.665901] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.665959] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.666021] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.666081] [<ffffffffc1198f2b>] ptlrpcd+0x2ab/0x570 [ptlrpc]
[68016.666085] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.666145] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.666148] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.666152] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.666155] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.666158] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.666160] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.906164] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [ptlrpcd_00_04:3347]
[68016.906578] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit
[68016.906621] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core(OE) crct10dif_pclmul crct10dif_common crc32c_intel ahci mlxfw(OE) vfio_mdev(OE) vfio_iommu_type1 i40e drm vfio libahci mdev(OE) devlink megaraid_sas mlx_compat(OE) libata ptp pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[68016.906640] CPU: 34 PID: 3347 Comm: ptlrpcd_00_04 Kdump: loaded Tainted: G OEL ------------ 3.10.0-1160.118.1.el7_lustre.x86_64 #1
[68016.906642] Hardware name: Enginetech EG540MS-G20/EG11DPH-T, BIOS 3.3 02/24/2020
[68016.906644] task: ffff96d9f1186300 ti: ffff96d9eebf4000 task.ti: ffff96d9eebf4000
[68016.906646] RIP: 0010:[<ffffffff8c11ec62>] [<ffffffff8c11ec62>] native_queued_spin_lock_slowpath+0x122/0x200
[68016.906650] RSP: 0018:ffff96d9eebf7ae0 EFLAGS: 00000246
[68016.906652] RAX: 0000000000000000 RBX: ffff96d9eebf7a58 RCX: 0000000001110000
[68016.906654] RDX: ffff96c21f01b8c0 RSI: 0000000000410001 RDI: ffff96d9efa59000
[68016.906656] RBP: ffff96d9eebf7ae0 R08: ffff96c21f29b8c0 R09: 0000000000000000
[68016.906658] R10: ffff96abbfc07500 R11: ffff96d806abf400 R12: ffffffff8c0d8990
[68016.906660] R13: 0000000000000000 R14: ffff96d900000000 R15: 0000000319783c00
[68016.906662] FS: 0000000000000000(0000) GS:ffff96c21f280000(0000) knlGS:0000000000000000
[68016.906664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68016.906666] CR2: 00000000025fbe78 CR3: 0000000273c10000 CR4: 00000000007607e0
[68016.906668] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68016.906670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68016.906671] PKRU: 00000000
[68016.906673] Call Trace:
[68016.906676] [<ffffffff8c7ac21a>] queued_spin_lock_slowpath+0xb/0x13
[68016.906679] [<ffffffff8c7ba734>] _raw_spin_lock+0x24/0x30
[68016.906692] [<ffffffffc0898cf8>] cfs_percpt_lock+0x58/0x110 [libcfs]
[68016.906709] [<ffffffffc0e17474>] LNetMEAttach+0xa4/0x2d0 [lnet]
[68016.906770] [<ffffffffc11723ad>] ptl_send_rpc+0x42d/0xe80 [ptlrpc]
[68016.906830] [<ffffffffc1167120>] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[68016.906833] [<ffffffff8c0e72ee>] ? account_entity_dequeue+0xae/0xd0
[68016.906892] [<ffffffffc116a168>] ptlrpc_check_set.part.23+0x6a8/0x1df0 [ptlrpc]
[68016.906895] [<ffffffff8c0ec651>] ? put_prev_entity+0x31/0x400
[68016.906953] [<ffffffffc116b90b>] ptlrpc_check_set+0x5b/0xf0 [ptlrpc]
[68016.907015] [<ffffffffc1198b9b>] ptlrpcd_check+0x4bb/0x5a0 [ptlrpc]
[68016.907076] [<ffffffffc1198f90>] ptlrpcd+0x310/0x570 [ptlrpc]
[68016.907080] [<ffffffff8c0e1170>] ? wake_up_state+0x20/0x20
[68016.907140] [<ffffffffc1198c80>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[68016.907144] [<ffffffff8c0cb621>] kthread+0xd1/0xe0
[68016.907147] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.907150] [<ffffffff8c7c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[68016.907154] [<ffffffff8c0cb550>] ? insert_kthread_work+0x40/0x40
[68016.907156] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 c0 19 d5 8c 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[68016.913164] NMI watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [ptlrpcd_00_18:3361]
[68016.913581] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure scsi_transport_sas sg joydev i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ast i2c_algo_bit