[lustre-discuss] Lustre/ZFS snapshots mount error
Robert Redl
robert.redl at lmu.de
Tue Sep 11 03:53:56 PDT 2018
Thanks for the fast reply! If I understood correctly, it is currently
not possible to use the changelog feature together with the snapshot
feature, right?
Is there already an LU ticket about that?
Cheers,
Robert
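
In case it is useful to others on the list, the workaround we are considering looks roughly like the sketch below. This is only a sketch of the idea discussed in this thread; the `robinhood` systemd unit name, the filesystem name, and the snapshot name are assumptions from our own setup, not anything prescribed by Lustre:

```shell
#!/bin/sh
# Sketch: pause changelog consumption while a snapshot is mounted.
# Assumes robinhood runs as a systemd service on a node that can reach
# the MDS, and that snapshot names follow our daily naming scheme.
set -e

FSNAME=eadc                 # Lustre filesystem name (site-specific)
SNAP=eadc_AutoSS-Mon        # snapshot to inspect (site-specific)

# 1) stop the changelog consumer so nothing drains records mid-mount
systemctl stop robinhood

# 2) mount the read-only snapshot on the MDS
lctl snapshot_mount -F "$FSNAME" -n "$SNAP"

# ... run the tape backup / inspection against the mounted snapshot ...

# 3) unmount the snapshot and resume changelog consumption
lctl snapshot_umount -F "$FSNAME" -n "$SNAP"
systemctl start robinhood
```

Whether stopping the consumer is actually sufficient is exactly the open question above, since the llog modification appears to be triggered by the mount itself.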
On 09/10/2018 02:57 PM, Yong, Fan wrote:
>
> It is suspected that some llogs were still pending when the snapshot
> was taken. Then, when such a snapshot is mounted, certain conditions
> trigger llog cleanup/modification automatically. So it is not related
> to your actions when mounting the snapshot. Since we cannot control
> the system state when taking the snapshot, we have to skip
> llog-related cleanup/modification on the snapshot when mounting it.
> Such “skip” logic is just what we need.
>
>
>
> Cheers,
>
> Nasf
>
> *From:*lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org]
> *On Behalf Of * Robert Redl
> *Sent:* Saturday, September 8, 2018 9:04 PM
> *To:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
>
>
> Dear All,
>
> we have a similar setup with Lustre on ZFS, and we make regular use of
> snapshots for backups (our tape backups use the snapshots as their
> source). We would like to use robinhood in the future, and the
> question is now how best to do it.
>
> Would it be a workaround to disable the robinhood daemon temporarily
> during the mount process?
> Does the problem only occur when changelogs are consumed during the
> process of mounting a snapshot? Or is it also a problem when
> changelogs are consumed while the snapshot remains mounted (which for
> us is typically several hours)?
> Is there already an LU-ticket about this issue?
>
> Thanks!
> Robert
>
> --
> Dr. Robert Redl
> Scientific Programmer, "Waves to Weather" (SFB/TRR165)
> Meteorologisches Institut
> Ludwig-Maximilians-Universität München
> Theresienstr. 37, 80333 München, Germany
>
> Am 03.09.2018 um 08:16 schrieb Yong, Fan:
>
> I would say that it is not the order of your operations that caused
> the trouble. Instead, it is related to the snapshot mount logic. As
> mentioned in my former reply, we need a patch in the llog logic to
> avoid modifying llogs under snapshot mode.
>
>
>
>
>
> --
>
> Cheers,
>
> Nasf
>
> *From:*Kirk, Benjamin (JSC-EG311) [mailto:benjamin.kirk at nasa.gov]
> *Sent:* Tuesday, August 28, 2018 7:53 PM
> *To:* lustre-discuss at lists.lustre.org
> *Cc:* Andreas Dilger <adilger at whamcloud.com>; Yong, Fan <fan.yong at intel.com>
> *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
>
>
> The MDS situation is very basic: active/passive mds0/mds1 for both
> fsA & fsB. fsA has the combined mgs/mdt in a single ZFS
> filesystem, and fsB has its own mdt in a separate ZFS filesystem.
> mds0 is primary for all.
>
>
>
> fsA & fsB DO both have changelogs enabled to feed robinhood databases.
>
>
>
> What’s the recommended procedure here we should follow before
> mounting the snapshots?
>
>
>
> 1) disable changelogs on the active mdt’s (this will compromise
> robinhood, requiring a rescan…), or
>
> 2) temporarily halt changelog consumption / cleanup (e.g. stop
> robinhood in our case) and then mount the snapshot?
>
>
>
> Thanks for the help!
>
>
>
> --
>
> Benjamin S. Kirk, Ph.D.
>
> NASA Lyndon B. Johnson Space Center
>
> Acting Chief, Aeroscience & Flight Mechanics Division
>
>
>
> On Aug 27, 2018, at 7:33 PM, Yong, Fan <fan.yong at intel.com> wrote:
>
>
>
> According to the stack trace, someone was trying to clean up
> old empty llogs while mounting the snapshot. We do NOT allow any
> modification while mounting a snapshot; otherwise it will trigger
> a ZFS backend BUG(). That is why we added the LASSERT() when
> starting the transaction. One possible solution is to add a
> check in the llog logic to avoid modifying llogs under snapshot
> mode.
>
>
> --
> Cheers,
> Nasf
>
> -----Original Message-----
> From: lustre-discuss
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of
> Andreas Dilger
> Sent: Tuesday, August 28, 2018 5:57 AM
> To: Kirk, Benjamin (JSC-EG311) <benjamin.kirk at nasa.gov>
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
> It's probably best to file an LU ticket for this issue.
>
> It looks like something in the log processing at mount time is
> trying to modify the configuration files. I'm not sure whether
> that should be allowed or not.
>
> Does fsB have the same MGS as fsA? Does it have the same MDS
> node as fsA?
> If it has a different MDS, you might consider giving it its
> own MGS as well.
> That doesn't have to be a separate MGS node, just a separate
> filesystem (ZFS fileset in the same zpool) on the MDS node.
>
> Cheers, Andreas
>
>
>
> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311)
> <benjamin.kirk at nasa.gov> wrote:
>
> Hi all,
>
> We have two filesystems, fsA & fsB (eadc below), both of which
> get snapshots taken daily, rotated over a week. It's a
> beautiful feature we've been using in production ever since it
> was introduced with 2.10.
>
> -) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
> -) Both fsA & fsB have changelogs active.
> -) fsA has combined mgt/mdt on a single ZFS filesystem.
> -) fsB has a single mdt on a single ZFS filesystem.
> -) for fsA, I have no issues mounting any of the snapshots
> via lctl.
> -) for fsB, I can mount the three most recent snapshots, then
> encounter errors:
>
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
> mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
> mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
> mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
> [root at hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
> mount.lustre: mount metadata/meta-eadc at eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT0000 failed: Read-only file system
> Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
>
> The relevant bits from dmesg are:
>
> [1353434.417762] Lustre: 3d40bbc-MDT0000: set dev_rdonly on this device
> [1353434.417765] Lustre: Skipped 3 previous similar messages
> [1353434.649480] Lustre: 3d40bbc-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
> [1353434.649484] Lustre: Skipped 3 previous similar messages
> [1353434.866228] Lustre: 3d40bbc-MDD0000: changelog on
> [1353434.866233] Lustre: Skipped 1 previous similar message
> [1353435.427744] Lustre: 3d40bbc-MDT0000: Connection restored to ... at tcp (at ... at tcp)
> [1353435.427747] Lustre: Skipped 23 previous similar messages
> [1353445.255899] Lustre: Failing over 3d40bbc-MDT0000
> [1353445.255903] Lustre: Skipped 3 previous similar messages
> [1353445.256150] LustreError: 11-0: 3d40bbc-OST0000-osc-MDT0000: operation ost_disconnect to node ... at tcp failed: rc = -107
> [1353445.257896] LustreError: Skipped 23 previous similar messages
> [1353445.353874] Lustre: server umount 3d40bbc-MDT0000 complete
> [1353445.353877] Lustre: Skipped 3 previous similar messages
> [1353475.302224] Lustre: 4e646fe-MDD0000: changelog on
> [1353475.302228] Lustre: Skipped 1 previous similar message
> [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT0000-osd: someone try to start transaction under readonly mode, should be disabled.
> [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
> [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
> [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
> [1353498.968832] Call Trace:
> [1353498.968841] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
> [1353498.968851] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> [1353498.968876] [<ffffffffc1116044>] llog_destroy+0x1f4/0x3f0 [obdclass]
> [1353498.968887] [<ffffffffc111f0f6>] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
> [1353498.968897] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
> [1353498.968910] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
> [1353498.968922] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
> [1353498.968932] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
> [1353498.968943] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
> [1353498.968949] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
> [1353498.968954] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
> [1353498.968966] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
> [1353498.968983] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
> [1353498.968999] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> [1353498.969012] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
> [1353498.969025] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
> [1353498.969038] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
> [1353498.969041] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
> [1353498.969053] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
> [1353498.969055] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
> [1353498.969060] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
> [1353498.969062] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
> [1353498.969066] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
> [1353498.969069] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
> [1353498.969074] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
> [1353498.969079] LustreError: 25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD0000: fail to destroy empty log: rc = -30
> [1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
> [1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
> [1353498.970787] Call Trace:
> [1353498.970790] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
> [1353498.970795] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> [1353498.970807] [<ffffffffc1117921>] llog_cancel_rec+0xc1/0x880 [obdclass]
> [1353498.970817] [<ffffffffc111e13b>] llog_cat_cleanup+0xdb/0x380 [obdclass]
> [1353498.970827] [<ffffffffc111f14d>] llog_cat_reverse_process_cb+0x29d/0x3f0 [obdclass]
> [1353498.970838] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
> [1353498.970848] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
> [1353498.970858] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
> [1353498.970868] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
> [1353498.970878] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
> [1353498.970883] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
> [1353498.970887] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
> [1353498.970894] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
> [1353498.970908] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
> [1353498.970924] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> [1353498.970938] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
> [1353498.970950] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
> [1353498.970962] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
> [1353498.970964] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
> [1353498.970976] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
> [1353498.970978] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
> [1353498.970980] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
> [1353498.970982] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
> [1353498.970984] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
> [1353498.970986] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
> [1353498.970989] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
> [1353498.970996] LustreError: 25582:0:(mdd_device.c:354:mdd_changelog_llog_init()) 36ca26b-MDD0000: changelog init failed: rc = -30
> [1353498.972790] LustreError: 25582:0:(mdd_device.c:427:mdd_changelog_init()) 36ca26b-MDD0000: changelog setup during init failed: rc = -30
> [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD0000: failed to initialize changelog: rc = -30
> [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
> [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-30)
>
>
> I’m hoping those traces mean something to someone - any ideas?
>
> Thanks!
>
> --
> Benjamin S. Kirk
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
> ---
> Andreas Dilger
> CTO Whamcloud
>
>
>
>
>
>
>
>
>
>
>
>