<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Hi,<br>
<br>
We had a crash with the following in the MDS log:<br>
<br>
Sep 22 13:45:07 sci-mds01 kernel: LustreError:
258240:0:(osd_handler.c:354:osd_trans_create()) 03781251-MDT0000:
someone try to start transaction under readonly mode, should be
disabled.<br>
Sep 22 13:45:07 sci-mds01 kernel: CPU: 31 PID: 94594 Comm:
mdt_rdpg05_005 Kdump: loaded Tainted: P OE ------------
3.10.0-1160.6.1.el7.x86_64 #1<br>
Sep 22 13:45:07 sci-mds01 kernel: Hardware name: Dell Inc. PowerEdge
R640/0HG0J8, BIOS 2.10.2 02/24/2021<br>
Sep 22 13:45:07 sci-mds01 kernel: Call Trace:<br>
Sep 22 13:45:07 sci-mds01 kernel: [<ffffffff89f81400>]
dump_stack+0x19/0x1b<br>
Sep 22 13:45:07 sci-mds01 kernel: [<ffffffffc143e64a>]
osd_trans_create+0x3ca/0x410 [osd_zfs]<br>
Sep 22 13:45:07 sci-mds01 kernel: CPU: 10 PID: 258241 Comm:
mdt_rdpg05_001 Kdump: loaded Tainted: P OE ------------
3.10.0-1160.6.1.el7.x86_64 #1<br>
Sep 22 13:45:07 sci-mds01 kernel: [<ffffffffc12d885a>]
top_trans_create+0x8a/0x200 [ptlrpc]<br>
Sep 22 13:45:07 sci-mds01 kernel: Hardware name: Dell Inc. PowerEdge
R640/0HG0J8, BIOS 2.10.2 02/24/2021<br>
Sep 22 13:45:07 sci-mds01 kernel: [<ffffffffc16284dc>]
lod_trans_create+0x3c/0x50 [lod]<br>
....<br>
<br>
This looks similar to:
<a class="moz-txt-link-freetext" href="http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2018-August/015854.html">http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2018-August/015854.html</a><br>
<br>
On restart, the MGS starts fine, but the single MDT
(science-MDT0000) does not:<br>
<br>
Sep 23 16:10:17 sci-mds00 kernel: Lustre: MGS: Connection restored
to 0dd6cfa0-bdf7-c8ac-7bb9-182f7874e165 (at 0@lo)<br>
Sep 23 16:10:17 sci-mds00 kernel: Lustre: Skipped 1 previous similar
message<br>
Sep 23 16:10:19 sci-mds00 kernel: Lustre:
52424:0:(llog_cat.c:93:llog_cat_new_log())
science-OST1100-osc-MDT0000: there are no more free slots in catalog
[0x2:0x1:0x0]:0<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(osp_sync.c:1524:osp_sync_init())
science-OST1100-osc-MDT0000: can't initialize llog: rc = -28<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(obd_config.c:559:class_setup()) setup
science-OST1100-osc-MDT0000 failed (-28)<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(obd_config.c:1835:class_config_llog_handler())
MGC10.120.10.90@tcp: cfg command failed: rc = -28<br>
Sep 23 16:10:19 sci-mds00 kernel: Lustre: cmd=cf003
0:science-OST1100-osc-MDT0000 1:science-OST1100_UUID
2:10.120.10.110@tcp <br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError: 15c-8:
MGC10.120.10.90@tcp: The configuration from log 'science-MDT0000'
failed (-28). This may be the result of communication errors between
this node and the MGS, a bad configuration, or other errors. Set.<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount_server.c:1397:server_start_targets()) failed to
start server science-MDT0000: -28<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount_server.c:1992:server_fill_super()) Unable to
start targets: -28<br>
Sep 23 16:10:19 sci-mds00 kernel: Lustre: Failing over
science-MDT0000<br>
Sep 23 16:10:19 sci-mds00 kernel: Lustre: server umount
science-MDT0000 complete<br>
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount
(-28)<br>
<br>
<br>
We have tried a --writeconf, but that only moves the problem: we now
get this error when mounting an OST:<br>
<br>
Sep 23 12:04:16 sci-mds00 kernel: Lustre: MGS: Logs for fs science
were removed by user request. All servers must be restarted in
order to regenerate the logs: rc = 0<br>
Sep 23 12:04:16 sci-mds00 kernel: Lustre: science-MDT0000:
Imperative Recovery not enabled, recovery window 300-900<br>
Sep 23 12:04:38 sci-mds00 kernel: Lustre: MGS: Connection restored
to 68b4cd3a-6c73-19c5-2925-935e42bdaf2b (at 10.120.10.111@tcp)<br>
Sep 23 12:04:38 sci-mds00 kernel: Lustre: Skipped 2 previous similar
messages<br>
Sep 23 12:04:38 sci-mds00 kernel: Lustre: MGS: Regenerating
science-OST1100 log by user request: rc = 0<br>
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
5547:0:(genops.c:556:class_register_device())
science-OST1100-osc-MDT0000: already exists, won't add<br>
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
5547:0:(obd_config.c:1835:class_config_llog_handler())
MGC10.120.10.90@tcp: cfg command failed: rc = -17<br>
Sep 23 12:04:45 sci-mds00 kernel: Lustre: cmd=cf001
0:science-OST1100-osc-MDT0000 1:osp 2:science-MDT0000-mdtlov_UUID
<br>
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
1345:0:(mgc_request.c:599:do_requeue()) failed processing log: -17<br>
<br>
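For reference, the rc values in the logs above look like negated Linux
errno numbers. A minimal sketch for decoding them, assuming the standard
Linux errno numbering (describe_rc is just an illustrative helper, not
part of Lustre):<br>
<pre>
```python
import errno
import os


def describe_rc(rc):
    """Return (symbolic name, message) for a negated errno-style return code."""
    code = abs(rc)
    # errno.errorcode maps the numeric value to its symbolic name on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two return codes that appear in the logs above.
for rc in (-28, -17):
    name, message = describe_rc(rc)
    print(f"rc = {rc}: {name} ({message})")
```
</pre>
If that reading is right, -28 is ENOSPC, which fits the "no more free
slots in catalog" message, and -17 is EEXIST, which fits "already
exists, won't add".<br>
<br>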
Any ideas on how to solve this?<br>
<br>
Cheers,<br>
Hans Henrik<br>
</body>
</html>