[lustre-discuss] MDTs will only mount read only

Mike Mosley Mike.Mosley at charlotte.edu
Wed Jun 21 08:32:33 PDT 2023


Greetings,

We have experienced some type of issue that is causing both of our MDS
servers to only be able to mount the mdt device in read only mode.  Here
are some of the error messages we are seeing in the log files below.   We
lost our Lustre expert a while back and we are not sure how to proceed to
troubleshoot this issue.   Can anybody provide us guidance on how to
proceed?

Thanks,

Mike
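
For reference, the mount attempts look roughly like this (the device
path and mount point below are illustrative, not our exact ones):

  # Normal read-write mount of the MDT -- this is what hangs:
  mount -t lustre /dev/mapper/mdt0 /mnt/lustre/mdt0

  # A read-only mount does succeed:
  mount -t lustre -o ro /dev/mapper/mdt0 /mnt/lustre/mdt0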

Jun 20 15:12:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for more than 120 seconds.
Jun 20 15:12:14 hyd-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 15:12:14 hyd-mds1 kernel: mount.lustre    D ffff9f27a3bc5230     0  4123      1 0x00000086
Jun 20 15:12:14 hyd-mds1 kernel: Call Trace:
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb585da9>] schedule+0x29/0x70
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb5838b1>] schedule_timeout+0x221/0x2d0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf6b8e5>] ? tracing_is_on+0x15/0x30
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf6f5bd>] ? tracing_record_cmdline+0x1d/0x120
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf77d9b>] ? probe_sched_wakeup+0x2b/0xa0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaed7d15>] ? ttwu_do_wakeup+0xb5/0xe0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb58615d>] wait_for_completion+0xfd/0x140
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaedb990>] ? wake_up_state+0x20/0x20
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f529a4>] llog_process_or_fork+0x244/0x450 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f52bc4>] llog_process+0x14/0x20 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f85d05>] class_config_parse_llog+0x125/0x350 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a69fc0>] mgc_process_cfg_log+0x790/0xc40 [mgc]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6d4cc>] mgc_process_log+0x3dc/0x8f0 [mgc]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6e15f>] ? config_recover_log_add+0x13f/0x280 [mgc]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8df40>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6eb2b>] mgc_process_config+0x88b/0x13f0 [mgc]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f91b58>] lustre_process_log+0x2d8/0xad0 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0e5a177>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f7c8b9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0fc08f4>] server_start_targets+0x13a4/0x2a20 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f94bb0>] ? lustre_start_mgc+0x260/0x2510 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8df40>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0fc303c>] server_fill_super+0x10cc/0x1890 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f97a08>] lustre_fill_super+0x468/0x960 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f975a0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0510cf>] mount_nodev+0x4f/0xb0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8f9a8>] lustre_mount+0x38/0x60 [obdclass]
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb051c4e>] mount_fs+0x3e/0x1b0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0707a7>] vfs_kern_mount+0x67/0x110
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb072edf>] do_mount+0x1ef/0xd00
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb049d7a>] ? __check_object_size+0x1ca/0x250
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0288ec>] ? kmem_cache_alloc_trace+0x3c/0x200
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb073d33>] SyS_mount+0x83/0xd0
Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb592ed2>] system_call_fastpath+0x25/0x2a
Jun 20 15:13:14 hyd-mds1 kernel: LNet: 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 172.16.100.4@o2ib: 9 seconds
Jun 20 15:13:14 hyd-mds1 kernel: LNet: 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 239 previous similar messages
Jun 20 15:14:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for more than 120 seconds.
Jun 20 15:14:14 hyd-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 15:14:14 hyd-mds1 kernel: mount.lustre    D ffff9f27a3bc5230     0  4123      1 0x00000086
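
The LNet messages above make us think the peer at 172.16.100.4@o2ib
may not be reachable while mount.lustre is trying to fetch the config
logs. The basic connectivity checks we know of are (assuming the
standard Lustre utilities are installed):

  # Show the LNet NIDs/interfaces configured on this node:
  lnetctl net show

  # Ping the peer NID that is timing out:
  lctl ping 172.16.100.4@o2ib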

dumpe2fs seems to show that the file systems are clean; for example, on MDT0000:

dumpe2fs 1.45.6.wc1 (20-Mar-2020)
Filesystem volume name:   hydra-MDT0000
Last mounted on:          /
Filesystem UUID:          3ae09231-7f2a-43b3-a4ee-7f36080b5a66
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2247671504
Block count:              1404931944
Reserved block count:     70246597
Free blocks:              807627552
Free inodes:              2100036536
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         20472
Fragments per group:      20472
Inodes per group:         32752
Inode blocks per group:   8188
Flex block group size:    16
Filesystem created:       Thu Aug  8 14:21:01 2019
Last mount time:          Tue Jun 20 15:19:03 2023
Last write time:          Wed Jun 21 10:43:51 2023
Mount count:              38
Maximum mount count:      -1
Last checked:             Thu Aug  8 14:21:01 2019
Check interval:           0 (<none>)
Lifetime writes:          219 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               1024
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      2e518531-82d9-4652-9acd-9cf9ca09c399
Journal backup:           inode blocks
MMP block number:         1851467
MMP update interval:      5
User quota inode:         3
Group quota inode:        4
Journal features:         journal_incompat_revoke
Journal size:             4096M
Journal length:           1048576
Journal sequence:         0x0a280713
Journal start:            0
MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 6
    mmp_sequence: 0xff4d4d50
    mmp_update_date: Wed Jun 21 10:43:51 2023
    mmp_update_time: 1687358631
    mmp_node_name: hyd-mds1.uncc.edu
    mmp_device_name: dm-0
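
Since the mmp feature is enabled on these targets, we also wondered
whether a stale MMP block left over from a failed mount or failover
attempt could be involved. As far as we can tell from the tune2fs man
page, it can be reset like this, but only when the MDT is unmounted on
every node (device path illustrative):

  tune2fs -f -E clear_mmp /dev/mapper/mdt0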