[lustre-discuss] Unable to mount new OST

David Cohen cdavid at physics.technion.ac.il
Mon Jul 5 22:34:40 PDT 2021


Thanks Andreas,
I'm aware that index 51 translates to hex 0x33 (hence local-OST0033_UUID).
I don't believe that's the reason for the failed mount, as it is only an
index that I increment for every new OST, and there are no duplicates.
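
For what it's worth, shell arithmetic shows what a leading zero would do if
the string were parsed as octal, and tunefs.lustre can confirm which index
was actually recorded on the target (device path taken from the mkfs command
below; --dryrun only reads, it changes nothing on disk):

  # 0-prefixed literals are octal in shell arithmetic: 0051 -> 41, not 51
  echo $((0051))      # prints 41
  echo $((16#33))     # prints 51, i.e. 0x33 == 51 decimal

  # read back the parameters recorded on the target without modifying it
  tunefs.lustre --dryrun /dev/mapper/OST0051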

lctl dk shows tens of thousands of lines repeating the same error after
attempting to mount the OST:

00100000:10000000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
00100000:10000000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
00100000:10000000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
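
rc = -30 is -EROFS (read-only filesystem), which matches the remount-ro in
the kernel log below: once ldiskfs flips the device read-only, every attempt
to set the LMA fails with the same error. The value can be checked against
the kernel headers (the usual header location on Linux; adjust if yours
differs):

  grep EROFS /usr/include/asm-generic/errno-base.h
  # define EROFS 30 /* Read-only file system */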

In /var/log/messages I see the following for dm-21, which is the
new OST:

Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
please wait.
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
maximum tree depth=5
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
mount: IO failure
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs with
errors, running e2fsck is recommended
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
ordered data mode. Opts:
user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
entry in directory: rec_len is too small for name_len - offset=4084(4084),
inode=0, rec_len=12, name_len=0
Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
read-only
Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21): kmmpd:187:
kmmpd being stopped since filesystem has been remounted as readonly.
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
fsck: 6
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
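
The MMP "please wait" warning near the top is normal multi-mount-protection
behaviour and harmless by itself; the real trouble starts at the IO failure
recorded in the journal, after which the kernel explicitly recommends a
check. It may be worth running e2fsck on the backing device before retrying
the mount. A minimal sketch, assuming the device path from the mkfs command
and the Lustre-patched e2fsprogs, run only while the OST is unmounted:

  # preen mode: apply only safe, automatic repairs
  e2fsck -fp /dev/mapper/OST0051

  # if preen bails out, run interactively (or -fy to answer yes to all)
  e2fsck -f /dev/mapper/OST0051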

As I mentioned before, the mount never completes, so the only way out is a
forced reboot.
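
(For the record, when the node is wedged like this, a forced reset can be
triggered from the console via magic sysrq, assuming sysrq is enabled in the
kernel. Note that 'b' reboots immediately without syncing or unmounting, so
it is only appropriate when the node is already unrecoverable:

  echo 1 > /proc/sys/kernel/sysrq     # enable all sysrq functions
  echo b > /proc/sysrq-trigger        # immediate reboot, no sync/unmount
)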

Thanks,
David

On Tue, Jul 6, 2021 at 8:07 AM Andreas Dilger <adilger at whamcloud.com> wrote:

>
>
> On Jul 5, 2021, at 09:05, David Cohen <cdavid at physics.technion.ac.il>
> wrote:
>
> Hi,
> I'm using Lustre 2.10.5 and lately tried to add a new OST.
> The OST was formatted with the command below, which other than the index
> is the exact same one used for all the other OSTs in the system.
>
> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost
> --fsname=local --index=0051 --param ost.quota_type=ug
> --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp
> --mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp
> --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051
>
>
> Note that your "--index=0051" value is probably interpreted as an octal
> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
> the OST device name) or "--index=81" (decimal).
>
>
> When trying to mount the OST with:
> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051
>
> The system stays at 100% CPU (one core) forever and the mount never
> completes, not even after a week.
>
> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the
> other targets, but the behaviour remains the same.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud