[lustre-discuss] MDT will not mount

Hans Henrik Happe happe at nbi.dk
Mon Mar 14 04:48:31 PDT 2022


I'm happy to report that the problem seems to be solved by deleting the 
CATALOGS file on the underlying MDT ZFS filesystem. As I gather from the 
manual [1], this should not be a problem, because it will be handled by LFSCK.

If I'm wrong about this, please let me know. Also, I'm happy to provide 
any information from this MDT to help assess whether there is a bug somewhere.

LFSCK is running as we speak.
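For anyone hitting the same problem, roughly the steps I took (dataset and filesystem names are the ones from this thread; the temporary mountpoint is arbitrary, so adapt to your setup):

```shell
# Sketch of the recovery, assuming the MDT backing dataset is
# mds02/astro0 and the fsname is "astro" (both from this thread).

# 1. Make sure the MDT is not mounted as Lustre, then mount the
#    backing ZFS dataset directly to reach the llog files.
umount /mnt/lustre/local/astro-MDT0000
mount -t zfs mds02/astro0 /mnt/tmp-mdt

# 2. Remove the llog CATALOGS file; per the manual it gets
#    regenerated and any inconsistency is handled by LFSCK.
rm /mnt/tmp-mdt/CATALOGS
umount /mnt/tmp-mdt

# 3. Remount the MDT as Lustre and start a full LFSCK.
mount -t lustre mds02/astro0 /mnt/lustre/local/astro-MDT0000
lctl lfsck_start -M astro-MDT0000 -A -t all
```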

Cheers,
Hans Henrik

[1] https://doc.lustre.org/lustre_manual.xhtml#backup_fs_level.restore

On 11.03.2022 12.49, Hans Henrik Happe via lustre-discuss wrote:
> I tried tunefs.lustre --erase-params --writeconf on the targets. I guess 
> this is not ideal because the clients were not unmounted, but I made 
> sure they were not trying to connect.
>
> This makes it possible to mount the MDT, but when the first OST mount 
> starts, the MDT logs a lot of errors. After starting the second OST the 
> MDS crashes (syslog attached).
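(For reference, the writeconf procedure in the manual expects every client and server to be unmounted first, and the targets to be remounted MDT-first. A sketch, with the MDT dataset name taken from this thread and the OST names as placeholders:)

```shell
# Regenerating configuration logs, per the manual's procedure.
# All clients and all targets must be unmounted before starting.
umount /mnt/lustre/local/astro-MDT0000      # on the MDS
umount /mnt/lustre/local/astro-OST0000      # on each OSS, every OST

tunefs.lustre --writeconf mds02/astro0      # MDT (with MGS) first
tunefs.lustre --writeconf <ostpool>/<ostfs> # then every OST

# Remount in order: MGS/MDT first, then the OSTs, then clients.
mount -t lustre mds02/astro0 /mnt/lustre/local/astro-MDT0000
mount -t lustre <ostpool>/<ostfs> /mnt/lustre/local/astro-OST0000
```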
>
> Cheers,
> Hans Henrik
>
> On 10.03.2022 15.48, Hans Henrik Happe via lustre-discuss wrote:
>> Sorry for all the mail load, but I hope this info can help figure 
>> out what's wrong and determine if this was caused by a bug.
>>
>> I read the CONFIGS on the MDT with llog_reader. See attachments.
>>
>> Cheers,
>> Hans Henrik
>>
>> On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
>>> After upgrading to Lustre 2.12.8 I found that the first mount after 
>>> a reboot behaves differently:
>>>
>>> Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
>>> mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 
>>> failed: No space left on device
>>>
>>> And a different syslog output (attached syslog-0).
>>>
>>> Doing the mount again has this error:
>>>
>>> Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
>>> mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 
>>> failed: File exists
>>>
>>> And a syslog like the one first posted. Attached the new output in 
>>> syslog-1.
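(A few commands that are useful for capturing state between the failed mount attempts, before retrying; nothing here is specific to this setup:)

```shell
# Capture kernel-side Lustre errors and leftover obd devices after a
# failed MDT mount, plus backing ZFS pool health and free space.
dmesg | tail -n 50   # recent Lustre/LNet kernel messages
lctl dl              # obd devices still registered with obdclass
zpool status         # backing pool health
zfs list             # dataset space usage
```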
>>>
>>> Finally, stopping Lustre (only the MGS in this case) and the lnet 
>>> service does not free all resources, making lustre_rmmod fail:
>>>
>>> # lustre_rmmod
>>> rmmod: ERROR: Module osp is in use
>>>
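(When lustre_rmmod reports a module in use, these show what is still holding it; standard commands, nothing assumed beyond the module names:)

```shell
# Inspect which Lustre modules still have references and which obd
# devices were not cleaned up during shutdown.
lsmod | grep -E 'osp|lod|mdt|mgs|lustre'   # refcounts on Lustre modules
lctl dl                                     # devices still registered
```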
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>> On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
>>>> Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
>>>> (3.10.0-1160.6.1.el7.x86_64).
>>>>
>>>> On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:
>>>>> Hi,
>>>>>
>>>>> A reboot of the MDS stalled and got forced reset. After that the 
>>>>> MDS would not start. The syslog is attached.
>>>>>
>>>>> I'm not sure what the "class_register_device()) 
>>>>> astro-OST0002-osc-MDT0000" part is supposed to do, but 
>>>>> astro-OST0002 is not mounted at this time. I guess this comes from 
>>>>> the MGS.
>>>>>
>>>>> Cheers,
>>>>> Hans Henrik
>>>>>
>>>
>>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

