[lustre-discuss] Unable to mount new OST

Tue Jul 6 21:24:54 PDT 2021

What devices are underneath dm-21, and are there any errors in
/var/log/messages for those devices (assuming /dev/sdX devices underneath)?

Run `ls /sys/block/dm-21/slaves` to see what devices are beneath dm-21
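
If it is easier to script, the same check could look roughly like the
sketch below. It assumes the usual sysfs layout and that kernel messages
land in /var/log/messages; the function names are made up for
illustration, and it only looks one level down (it does not follow
stacked dm devices).

```python
import os
import re


def slave_devices(dm_name, sysfs_root="/sys/block"):
    """Return the names of the devices directly beneath a device-mapper node."""
    slaves_dir = os.path.join(sysfs_root, dm_name, "slaves")
    return sorted(os.listdir(slaves_dir))


def kernel_lines_for(devices, log_path="/var/log/messages"):
    """Yield log lines that mention any of the given device names as whole words."""
    if not devices:
        return
    # \b keeps "sdb" from matching "sdba" or "sdb1".
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(d) for d in devices))
    with open(log_path, errors="replace") as log:
        for line in log:
            if pattern.search(line):
                yield line.rstrip("\n")
```

On oss03 that would be `slave_devices("dm-21")`, with the result passed to
`kernel_lines_for(...)`. Any SCSI or multipath errors for those names
around the time of the mount attempt are what to look for.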

On Tue, Jul 6, 2021 at 20:09 David Cohen <cdavid at physics.technion.ac.il>
wrote:

> Hi,
> The index of the OST is unique in the system and free for the new one, as
> it is increased by "1" for every new OST created, so whatever it converts
> to should not be relevant to its refusal to mount, or am I mistaken?
>
> I'm pasting the log messages again, in case they were lost up the thread,
> adding the output of "fdisk -l", should the OST size be the issue:
>
> lctl dk shows tens of thousands of lines repeating the same error after
> attempting to mount the OST:
>
> 00100000:10000000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 00100000:10000000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 00100000:10000000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>
> In /var/log/messages I see the following corresponding to dm-21, which is
> the new OST:
>
> Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
> please wait.
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
> maximum tree depth=5
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
> mount: IO failure
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs
> with errors, running e2fsck is recommended
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
> ordered data mode. Opts:
> user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
> htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
> entry in directory: rec_len is too small for name_len - offset=4084(4084),
> inode=0, rec_len=12, name_len=0
> Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
> read-only
> Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21):
> kmmpd:187: kmmpd being stopped since filesystem has been remounted as
> readonly.
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
> fsck: 6
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
> 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
> 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
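
The "rc = -30" in the lctl dk lines is a negated errno, and the two
"error at time" values are Unix timestamps. A small sketch for decoding
both, assuming Linux errno numbering; the helper names are illustrative
only.

```python
import errno
import os
from datetime import datetime, timezone


def describe_rc(rc):
    """Map a negative kernel-style return code to its errno name and message."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (%s)" % (name, os.strerror(code))


def epoch_to_utc(seconds):
    """Render a Unix timestamp from the ldiskfs error summary as ISO 8601 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(describe_rc(-30))
print(epoch_to_utc(1625367384))  # "initial error at time": 2021-07-04T02:56:24+00:00
print(epoch_to_utc(1625546362))  # "last error at time":    2021-07-06T04:39:22+00:00
```

On Linux, errno 30 is EROFS (read-only file system), which is consistent
with the "Remounting filesystem read-only" message: once ldiskfs has
remounted dm-21 read-only, the attempts to set the LMA fail. The last-error
timestamp matches the 07:39:22 log entry if oss03 runs at UTC+3 (an
assumption), and the initial error dates from July 4, two days before the
mount attempt in this log.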
>
> fdisk -l /dev/mapper/OST0051
>
> Disk /dev/mapper/OST0051: 142799.1 GB, 142799072657408 bytes, 34863054848
> sectors
> Units = sectors of 1 * 4096 = 4096 bytes
> Sector size (logical/physical): 4096 bytes / 4096 bytes
> I/O size (minimum/optimal): 2097152 bytes / 2097152 bytes
>
>
> Thanks,
> David
>
> On Tue, Jul 6, 2021 at 10:35 PM Spitz, Cory James <cory.spitz at hpe.com>
> wrote:
>
>> What OST index (number) were you trying to add?
>>
>>
>>
>> Andreas is right:
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>>
>>
>> And you said:
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>>
>>
>>
>> Ok, 0051 (in octal by way of the leading zeros*) translates to decimal 41
>> as Andreas pointed out, but that’s 0x29 in hexadecimal, not 0x33.  Assuming
>> you wanted to use decimal 51 then you’d have tried to mkfs.lustre the wrong
>> index.  So, if you wanted to use decimal 51, you’d have to use --index=0x33 or
>> --index=0063.
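
The base arithmetic above can be checked with a short sketch. It assumes
the option is parsed with C strtoul(..., 0)-style base detection, which is
what the thread suggests but which is not verified here against the
mkfs.lustre source. parse_index and ost_name are hypothetical helpers; the
four-hex-digit target name mirrors names such as local-OST0033 in the logs.

```python
def parse_index(text):
    """Pick a base the way C's strtoul(text, NULL, 0) does:
    '0x' prefix -> hex, other leading '0' -> octal, otherwise decimal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(lowered) > 1 and lowered.startswith("0"):
        return int(lowered, 8)
    return int(lowered, 10)


def ost_name(fsname, index):
    """Format a target name with the index as four hex digits."""
    return "%s-OST%04x" % (fsname, index)


for arg in ("0051", "0x51", "81", "51", "0x33", "0063"):
    idx = parse_index(arg)
    print("--index=%-5s -> decimal %3d -> %s" % (arg, idx, ost_name("local", idx)))
```

Under that assumption 0051 gives decimal 41 (local-OST0029), 0x51 and 81
give local-OST0051, and 51, 0x33 and 0063 all give local-OST0033. Note that
the log lines in this thread name the target local-OST0033, i.e. decimal
51, so on this system the value appears to have been read as decimal rather
than octal.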
>>
>>
>>
>> -Cory
>>
>>
>>
>> p.s.
>>
>> (*) BTW, the convention with leading zeros for octal can be googled or
>> read about at https://en.wikipedia.org/wiki/Octal.
>>
>>
>>
>>
>>
>> On 7/6/21, 12:35 AM, "lustre-discuss on behalf of David Cohen" <
>> lustre-discuss-bounces at lists.lustre.org on behalf of
>> cdavid at physics.technion.ac.il> wrote:
>>
>>
>>
>> Thanks Andreas,
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>> I don't believe that's the reason for the failed mount as it is only an
>> index that I increase for every new OST and there are no duplicates.
>>
>>
>>
>> lctl dk shows tens of thousands of lines repeating the same error after
>> attempting to mount the OST:
>>
>>
>>
>> 00100000:10000000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>> 00100000:10000000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>> 00100000:10000000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>>
>>
>> In /var/log/messages I see the following corresponding to dm-21, which is
>> the new OST:
>>
>>
>>
>> Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
>> please wait.
>>
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
>> maximum tree depth=5
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
>> mount: IO failure
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs
>> with errors, running e2fsck is recommended
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
>> ordered data mode. Opts:
>> user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
>> htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
>> entry in directory: rec_len is too small for name_len - offset=4084(4084),
>> inode=0, rec_len=12, name_len=0
>> Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
>> read-only
>> Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> kmmpd:187: kmmpd being stopped since filesystem has been remounted as
>> readonly.
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
>> fsck: 6
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
>> 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
>> 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
>>
>> As I mentioned before, the mount never completes, so the only way out
>> is a forced reboot.
>>
>> Thanks,
>> David
>>
>>
>>
>> On Tue, Jul 6, 2021 at 8:07 AM Andreas Dilger <adilger at whamcloud.com>
>> wrote:
>>
>>
>>
>>
>>
>> On Jul 5, 2021, at 09:05, David Cohen <cdavid at physics.technion.ac.il>
>> wrote:
>>
>>
>>
>> Hi,
>>
>> I'm using Lustre 2.10.5 and lately tried to add a new OST.
>>
>> The OST was formatted with the command below, which other than the index
>> is the exact same one used for all the other OSTs in the system.
>>
>>
>>
>> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost
>> --fsname=local --index=0051 --param ost.quota_type=ug
>> --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp
>> --mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp
>> --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051
>>
>>
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>>
>>
>>
>>
>> When trying to mount the OST with:
>> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051
>>
>>
>>
>> The system stays on 100% CPU (one core) forever and the mount never
>> completes, not even after a week.
>>
>>
>> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the
>> other targets, but the behaviour remains the same.
>>
>>
>>
>> Cheers, Andreas
>>
>> --
>>
>> Andreas Dilger
>>
>> Lustre Principal Architect
>>
>> Whamcloud
>>
>> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage