[lustre-discuss] MDT will not mount

Hans Henrik Happe happe at nbi.dk
Fri Mar 11 03:49:55 PST 2022


I tried running tunefs.lustre --erase-params --writeconf on the targets. I 
guess that is not ideal, because the clients were not unmounted, but I made 
sure they are not trying to connect.

This makes it possible to mount the MDT, but when the first OST mount 
starts, the MDT logs a lot of errors. After starting the second OST, the 
MDS crashes (syslog attached).

Cheers,
Hans Henrik

On 10.03.2022 15.48, Hans Henrik Happe via lustre-discuss wrote:
> Sorry for all the mail load, but I hope this info can help figuring 
> out what's wrong and determine if this was caused by a bug. I think
>
> I read the CONFIGS on the MDT with llog_reader. See attachments.
>
> Cheers,
> Hans Henrik
>
> On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
>> After upgrading to Lustre 2.12.8 I found that the first mount after a 
>> reboot behaves differently:
>>
>> Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
>> mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 
>> failed: No space left on device
>>
>> And a different syslog output (attached syslog-0).
>>
>> Doing the mount again has this error:
>>
>> Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
>> mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 
>> failed: File exists
>>
>> And a syslog like the one first posted. Attached the new output in 
>> syslog-1.
>>
>> Finally, stopping Lustre (only the MGS in this case) and the lnet service 
>> does not free resources, making lustre_rmmod fail:
>>
>> # lustre_rmmod
>> rmmod: ERROR: Module osp is in use
>>
>>
>> Cheers,
>> Hans Henrik
>>
>> On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
>>> Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
>>> (3.10.0-1160.6.1.el7.x86_64).
>>>
>>> On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:
>>>> Hi,
>>>>
>>>> A reboot of the MDS stalled and the machine was forcibly reset. After 
>>>> that, the MDS would not start. The syslog is attached.
>>>>
>>>> I'm not sure what the "class_register_device()) 
>>>> astro-OST0002-osc-MDT0000" part is supposed to do but astro-OST0002 
>>>> is not mounted at this time. I guess this comes from the MGS.
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>>
>>
>
-------------- next part --------------
Mar 11 12:42:04 mds02 kernel: Lustre: MGS: Logs for fs astro were removed by user request.  All servers must be restarted in order to regenerate the logs: rc = 0
Mar 11 12:42:04 mds02 kernel: Lustre: astro-MDT0000: nosquash_nids set to 172.20.1.10@tcp1
Mar 11 12:42:04 mds02 kernel: Lustre: astro-MDT0000: Imperative Recovery not enabled, recovery window 300-900
Mar 11 12:42:29 mds02 kernel: Lustre: astro-MDT0000: Connection restored to 0d2c198e-514c-3ae5-fc31-48e0424f131d (at 0@lo)
Mar 11 12:42:46 mds02 systemd: Started Session c4 of user root.
Mar 11 12:42:51 mds02 kernel: Lustre: MGS: Connection restored to b11aa8af-1dd3-d728-0e81-6f595456b689 (at 10.21.10.114@o2ib)
Mar 11 12:42:51 mds02 kernel: Lustre: MGS: Regenerating astro-OST0000 log by user request: rc = 0
Mar 11 12:42:58 mds02 kernel: Lustre: 10971:0:(llog_cat.c:93:llog_cat_new_log()) astro-OST0000-osc-MDT0000: there are no more free slots in catalog [0x186:0x1:0x0]:0
Mar 11 12:42:58 mds02 kernel: LustreError: 10971:0:(osp_sync.c:1524:osp_sync_init()) astro-OST0000-osc-MDT0000: can't initialize llog: rc = -28
Mar 11 12:42:58 mds02 kernel: LustreError: 10971:0:(obd_config.c:559:class_setup()) setup astro-OST0000-osc-MDT0000 failed (-28)
Mar 11 12:42:58 mds02 kernel: LustreError: 10971:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: cfg command failed: rc = -28
Mar 11 12:42:58 mds02 kernel: Lustre:    cmd=cf003 0:astro-OST0000-osc-MDT0000  1:astro-OST0000_UUID  2:10.21.10.114@o2ib
Mar 11 12:42:58 mds02 kernel: LustreError: 9282:0:(mgc_request.c:599:do_requeue()) failed processing log: -28
Mar 11 12:44:16 mds02 kernel: Lustre: MGS: Connection restored to 9842fe3a-0ff5-afc6-292f-cff60a4897ba (at 10.21.10.115@o2ib)
Mar 11 12:44:16 mds02 kernel: Lustre: Skipped 1 previous similar message
Mar 11 12:44:16 mds02 kernel: Lustre: MGS: Regenerating astro-OST0001 log by user request: rc = 0
Mar 11 12:44:25 mds02 kernel: LustreError: 11466:0:(obd_config.c:764:class_add_conn()) try to add conn on immature client dev

Message from syslogd@mds02 at Mar 11 12:44:25 ...
 kernel:LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site == lod->lod_dt_dev.dd_lu_dev.ld_site ) failed: 
Mar 11 12:44:25 mds02 kernel: LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site == lod->lod_dt_dev.dd_lu_dev.ld_site ) failed: 

Message from syslogd@mds02 at Mar 11 12:44:25 ...
 kernel:LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) LBUG
Mar 11 12:44:25 mds02 kernel: LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) LBUG
Mar 11 12:44:25 mds02 kernel: Pid: 11466, comm: llog_process_th 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021
Mar 11 12:44:25 mds02 kernel: Call Trace:
Mar 11 12:44:25 mds02 kernel: [<ffffffffc095a7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc095a87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc0ec0f1a>] lod_add_device+0x195a/0x19a0 [lod]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc0ebb895>] lod_process_config+0x13b5/0x1510 [lod]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc13eeaf2>] class_process_config+0x2142/0x2830 [obdclass]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc13f0db9>] class_config_llog_handler+0x819/0x1520 [obdclass]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc13b37d4>] llog_process_thread+0x8e4/0x19c0 [obdclass]
Mar 11 12:44:25 mds02 kernel: [<ffffffffc13b52c4>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
Mar 11 12:44:25 mds02 kernel: [<ffffffff820c5e61>] kthread+0xd1/0xe0
Mar 11 12:44:25 mds02 kernel: [<ffffffff82795ddd>] ret_from_fork_nospec_begin+0x7/0x21
Mar 11 12:44:25 mds02 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
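
The rc values in the log above are negated Linux errno codes. rc = -28 is
ENOSPC, the same code behind the "No space left on device" mount error
quoted earlier, and "File exists" corresponds to EEXIST (17). As a reading
aid, here is a minimal Python sketch that pulls the rc out of a log line
and decodes it. The decode_rc helper and its regular expression are
hypothetical and written against the lines in this log only. It handles
just the "rc = N" form, not variants such as "failed (-28)".

```python
import errno
import os
import re

# Hypothetical pattern: matches only the "rc = N" form seen in this log.
RC_RE = re.compile(r"rc = (-?\d+)")


def decode_rc(line):
    """Return (rc, errno_name, description) for a log line, or None if no rc."""
    match = RC_RE.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    if rc == 0:
        return (0, None, "success")
    code = abs(rc)  # kernel code returns errno values negated
    name = errno.errorcode.get(code, "unknown")
    return (rc, name, os.strerror(code))


if __name__ == "__main__":
    # Sample line copied from the syslog above (timestamp prefix dropped).
    sample = (
        "LustreError: 10971:0:(osp_sync.c:1524:osp_sync_init()) "
        "astro-OST0000-osc-MDT0000: can't initialize llog: rc = -28"
    )
    print(decode_rc(sample))
```

Feeding it the osp_sync_init() line should report ENOSPC. Here that most
likely refers to the llog catalog having no free slots, as the preceding
llog_cat_new_log() message says, rather than to the disk being full.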


