[lustre-discuss] [EXTERNAL] MDTs will only mount read only

Mike Mosley Mike.Mosley at charlotte.edu
Wed Jun 21 09:51:38 PDT 2023


Hi Rick,

The MGS/MDS are combined.   The output I posted is from the primary.

Thanks,

Mike

On Wed, Jun 21, 2023 at 12:27 PM Mohr, Rick <mohrrf at ornl.gov> wrote:

> Mike,
>
> It looks like the mds server is having a problem contacting the mgs
> server.  I'm guessing the mgs is a separate host?  I would start by looking
> for possible network problems that might explain the LNet timeouts.  You
> can try using "lctl ping" to test the LNet connection between nodes, and
> you can also try regular "ping" between the IP addresses on the IB
> interfaces.
>
> --Rick
>
>
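A minimal sketch of the connectivity checks suggested above, assuming the MGS NID is the 172.16.100.4@o2ib address that appears in the LNet timeout messages further down (substitute your actual MGS NID and IPoIB address):

    # LNet-level ping of the suspected MGS NID
    lctl ping 172.16.100.4@o2ib

    # confirm which NIDs the local node has configured
    lctl list_nids

    # plain ICMP ping between the IPoIB interface addresses
    ping -c 3 172.16.100.4

If "lctl ping" fails while the ICMP ping works, the problem is more likely in the LNet/o2iblnd layer than in basic IB connectivity.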
> On 6/21/23, 11:35 AM, "lustre-discuss on behalf of Mike Mosley via
> lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of
> lustre-discuss at lists.lustre.org> wrote:
>
>
> Greetings,
>
>
> We have experienced some type of issue that is causing both of our MDS
> servers to only be able to mount the mdt device in read only mode. Here are
> some of the error messages we are seeing in the log files below. We lost
> our Lustre expert a while back and we are not sure how to proceed to
> troubleshoot this issue. Can anybody provide us guidance on how to proceed?
>
> Thanks,
>
> Mike
>
> Jun 20 15:12:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for
> more than 120 seconds.
> Jun 20 15:12:14 hyd-mds1 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 20 15:12:14 hyd-mds1 kernel: mount.lustre D ffff9f27a3bc5230 0 4123 1
> 0x00000086
> Jun 20 15:12:14 hyd-mds1 kernel: Call Trace:
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb585da9>] schedule+0x29/0x70
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb5838b1>]
> schedule_timeout+0x221/0x2d0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf6b8e5>] ?
> tracing_is_on+0x15/0x30
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf6f5bd>] ?
> tracing_record_cmdline+0x1d/0x120
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaf77d9b>] ?
> probe_sched_wakeup+0x2b/0xa0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaed7d15>] ?
> ttwu_do_wakeup+0xb5/0xe0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb58615d>]
> wait_for_completion+0xfd/0x140
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbaedb990>] ?
> wake_up_state+0x20/0x20
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f529a4>]
> llog_process_or_fork+0x244/0x450 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f52bc4>]
> llog_process+0x14/0x20 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f85d05>]
> class_config_parse_llog+0x125/0x350 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a69fc0>]
> mgc_process_cfg_log+0x790/0xc40 [mgc]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6d4cc>]
> mgc_process_log+0x3dc/0x8f0 [mgc]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6e15f>] ?
> config_recover_log_add+0x13f/0x280 [mgc]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8df40>] ?
> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0a6eb2b>]
> mgc_process_config+0x88b/0x13f0 [mgc]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f91b58>]
> lustre_process_log+0x2d8/0xad0 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0e5a177>] ?
> libcfs_debug_msg+0x57/0x80 [libcfs]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f7c8b9>] ?
> lprocfs_counter_add+0xf9/0x160 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0fc08f4>]
> server_start_targets+0x13a4/0x2a20 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f94bb0>] ?
> lustre_start_mgc+0x260/0x2510 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8df40>] ?
> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0fc303c>]
> server_fill_super+0x10cc/0x1890 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f97a08>]
> lustre_fill_super+0x468/0x960 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f975a0>] ?
> lustre_common_put_super+0x270/0x270 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0510cf>] mount_nodev+0x4f/0xb0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffc0f8f9a8>]
> lustre_mount+0x38/0x60 [obdclass]
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb051c4e>] mount_fs+0x3e/0x1b0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0707a7>]
> vfs_kern_mount+0x67/0x110
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb072edf>] do_mount+0x1ef/0xd00
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb049d7a>] ?
> __check_object_size+0x1ca/0x250
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb0288ec>] ?
> kmem_cache_alloc_trace+0x3c/0x200
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb073d33>] SyS_mount+0x83/0xd0
> Jun 20 15:12:14 hyd-mds1 kernel: [<ffffffffbb592ed2>]
> system_call_fastpath+0x25/0x2a
> Jun 20 15:13:14 hyd-mds1 kernel: LNet:
> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for
> 172.16.100.4@o2ib: 9 seconds
> Jun 20 15:13:14 hyd-mds1 kernel: LNet:
> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 239 previous
> similar messages
> Jun 20 15:14:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for
> more than 120 seconds.
> Jun 20 15:14:14 hyd-mds1 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 20 15:14:14 hyd-mds1 kernel: mount.lustre D ffff9f27a3bc5230 0 4123 1
> 0x00000086
>
> dumpe2fs seems to show that the file systems are clean i.e.
>
> dumpe2fs 1.45.6.wc1 (20-Mar-2020)
> Filesystem volume name: hydra-MDT0000
> Last mounted on: /
> Filesystem UUID: 3ae09231-7f2a-43b3-a4ee-7f36080b5a66
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink
> quota
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 2247671504
> Block count: 1404931944
> Reserved block count: 70246597
> Free blocks: 807627552
> Free inodes: 2100036536
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 1024
> Blocks per group: 20472
> Fragments per group: 20472
> Inodes per group: 32752
> Inode blocks per group: 8188
> Flex block group size: 16
> Filesystem created: Thu Aug 8 14:21:01 2019
> Last mount time: Tue Jun 20 15:19:03 2023
> Last write time: Wed Jun 21 10:43:51 2023
> Mount count: 38
> Maximum mount count: -1
> Last checked: Thu Aug 8 14:21:01 2019
> Check interval: 0 (<none>)
> Lifetime writes: 219 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 1024
> Required extra isize: 32
> Desired extra isize: 32
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 2e518531-82d9-4652-9acd-9cf9ca09c399
> Journal backup: inode blocks
> MMP block number: 1851467
> MMP update interval: 5
> User quota inode: 3
> Group quota inode: 4
> Journal features: journal_incompat_revoke
> Journal size: 4096M
> Journal length: 1048576
> Journal sequence: 0x0a280713
> Journal start: 0
> MMP_block:
> mmp_magic: 0x4d4d50
> mmp_check_interval: 6
> mmp_sequence: 0xff4d4d50
> mmp_update_date: Wed Jun 21 10:43:51 2023
> mmp_update_time: 1687358631
> mmp_node_name: hyd-mds1.uncc.edu
> mmp_device_name: dm-0
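For reference, a superblock listing like the one above can be reproduced with dumpe2fs -h; the device path below is an assumption based on the mmp_device_name reported (dm-0), so substitute the actual MDT backing device:

    # print only the superblock information for the MDT backing device (read-only operation)
    dumpe2fs -h /dev/dm-0

Note that a clean ldiskfs superblock does not rule out a Lustre-level problem; the hang in the trace above appears to be in mgc_process_log/llog processing, i.e. while the MDS is retrieving its configuration log from the MGS before the target is fully started.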
>