[lustre-discuss] Kernel Panic on Snapshot Mount

Jones, Peter A peter.a.jones at intel.com
Thu Apr 26 03:11:54 PDT 2018


Robert

It’s LUG week this week so people may not be keeping up with the mailing lists as closely as usual. I suggest that you open a JIRA ticket about this issue so someone can investigate.

Peter

On 2018-04-25, 11:28 PM, "lustre-discuss on behalf of Robert Redl" <lustre-discuss-bounces at lists.lustre.org on behalf of robert.redl at lmu.de> wrote:


Good morning!

As there was no reaction, let me rephrase my question: is anyone already using 2.11.0 with ZFS and snapshots successfully?

My problem was temporarily solved by downgrading the servers to 2.10.3. Snapshots are now working again as expected and can be mounted without any problems.

Best regards,
Robert

On 04/19/2018 04:26 PM, Robert Redl wrote:

Dear All,

Today I updated from Lustre 2.10.3 to 2.11.0 (on CentOS 7.4). The
update is now finished on all servers and everything seems to work fine.
However, when I try to mount a snapshot (we use the ZFS backend), all
servers crash immediately:

Apr 19 16:02:45 server1 kernel: Lustre: 58ffd1e-MDT0000: set dev_rdonly on this device
Apr 19 16:02:45 server1 kernel: LustreError: 14660:0:(lu_object.c:1178:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Apr 19 16:02:45 server1 kernel: LustreError: 14660:0:(lu_object.c:1178:lu_device_fini()) LBUG
Apr 19 16:02:45 server1 kernel: Pid: 14660, comm: mount.lustre
Apr 19 16:02:45 server1 kernel:

Call Trace:
Apr 19 16:02:45 server1 kernel:  [<ffffffffc06557ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc065583c>] lbug_with_loc+0x4c/0xb0 [libcfs]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b5502b>] lu_device_fini+0xbb/0xc0 [obdclass]

Message from syslogd at met-ha-filer05a at Apr 19 16:02:45 ...
 kernel:LustreError: 14660:0:(lu_object.c:1178:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b59fae>] dt_device_fini+0xe/0x10 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0da2ea8>] osd_device_alloc+0x278/0x3b0 [osd_zfs]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b43f7a>] obd_setup+0x11a/0x2b0 [obdclass]

Message from syslogd at met-ha-filer05a at Apr 19 16:02:45 ...
 kernel:LustreError: 14660:0:(lu_object.c:1178:lu_device_fini()) LBUG
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b443b8>] class_setup+0x2a8/0x840 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b4882c>] class_process_config+0x1b5c/0x2810 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffff81333563>] ? number.isra.2+0x323/0x360
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b4c738>] do_lcfg+0x258/0x500 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b50f88>] lustre_start_simple+0x88/0x210 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b7dfba>] server_fill_super+0xf3a/0x1860 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0660e27>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b54228>] lustre_fill_super+0x328/0x950 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b53f00>] ? lustre_fill_super+0x0/0x950 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffff8120948f>] mount_nodev+0x4f/0xb0
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b4c148>] lustre_mount+0x38/0x60 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
Apr 19 16:02:45 server1 kernel:  [<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
Apr 19 16:02:45 server1 kernel:  [<ffffffff81229263>] do_mount+0x233/0xaf0
Apr 19 16:02:45 server1 kernel:  [<ffffffff8118bb0e>] ? __get_free_pages+0xe/0x40
Apr 19 16:02:45 server1 kernel:  [<ffffffff81229ea6>] SyS_mount+0x96/0xf0
Apr 19 16:02:45 server1 kernel:  [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
Apr 19 16:02:45 server1 kernel:
Apr 19 16:02:45 server1 kernel: Kernel panic - not syncing: LBUG
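
For anyone who wants to pull the call chain out of a trace like the one above (for example, to paste it into a ticket), here is a minimal sketch in Python 3. It is not part of the original report. The names FRAME and call_chain are illustrative, and the pattern assumes the frame format shown in this log. Entries the kernel prefixes with "?" are addresses found on the stack that may not belong to the actual call chain, so the sketch flags them.

```python
import re

# Matches one kernel stack-trace entry such as
#   [<ffffffffc0b5502b>] lu_device_fini+0xbb/0xc0 [obdclass]
# \s+ also matches newlines, so entries wrapped by a mail client still parse.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def call_chain(log_text):
    """Return (symbol, module, reliable) tuples, innermost frame first."""
    frames = []
    for unreliable, symbol, module in FRAME.findall(log_text):
        # Frames without a [module] tag belong to the core kernel image.
        frames.append((symbol, module or "kernel", not unreliable))
    return frames


# A short excerpt of the trace above, in its mail-wrapped form.
sample = """\
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0b5502b>]
lu_device_fini+0xbb/0xc0 [obdclass]
Apr 19 16:02:45 server1 kernel:  [<ffffffffc0da2ea8>]
osd_device_alloc+0x278/0x3b0 [osd_zfs]
Apr 19 16:02:45 server1 kernel:  [<ffffffff81333563>] ?
number.isra.2+0x323/0x360
"""

for symbol, module, reliable in call_chain(sample):
    note = "" if reliable else "(unreliable)"
    print(f"{symbol:20s} {module:10s} {note}")
```

Read this way, the trace shows that the assertion fires in lu_device_fini, reached via dt_device_fini from osd_device_alloc in the osd_zfs module, during the mount.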



I'm posting this here as I don't have an account for the actual bug tracker.
Has anyone experienced a similar issue?

Best regards
Robert



--

Dr. Robert Redl geb. Schuster
Scientific Programmer, "Waves to Weather" (SFB/TRR165)
Meteorologisches Institut
Ludwig-Maximilians-Universität München
Theresienstr. 37, 80333 München, Germany
Tel.: +49 89 2180 4569