[lustre-discuss] Lustre 2.12.6 client crashes

Christopher Mountford cjm14 at leicester.ac.uk
Thu Jan 20 05:03:50 PST 2022


Date: Thu, 20 Jan 2022 12:07:40 +0000
From: Christopher Mountford <cjm14 at le.ac.uk>
To: lustre-discuss at lists.lustre.org
Subject: Client crashes
User-Agent: NeoMutt/20170306 (1.8.0)

Hi All,

We've started getting fairly regular client panics on our Lustre 2.12.7 filesystem. Looking at the stack trace, I think we are hitting this bug: https://jira.whamcloud.com/browse/LU-12752

I note that a fix is in 2.15.0. Is this likely to be patched in a 2.12 release?

We're still trying to isolate the job that is causing the crash, but once we have, we should be able to reproduce this reliably.

Kind Regards,
Christopher.

Log entries:

Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_cache.c:2519:osc_teardown_async_page()) extent ffff937e2756e4d0@{[0 -> 255/255], [2|0|-|cache|wi|ffff92fdd1dd8b40], [1703936|1|+|-|ffff932384f1e880|256|          (null)]} trunc at 42.
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_cache.c:2519:osc_teardown_async_page()) ### extent: ffff937e2756e4d0 ns: alice3-OST001f-osc-ffff938e6a743000 lock: ffff932384f1e880/0x6024b6d908313ce7 lrc: 2/0,0 mode: PW/PW res: [0x7c0000400:0x5c888a:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 65536->172031) flags: 0x800020000000000 nid: local remote: 0x345e4fe1c451a182 expref: -99 pid: 955 timeout: 0 lvb_type: 1
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) page at ffff933651225e00[2 ffff93228480b2f0 4 1           (null)]
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) vvp-page at ffff933651225e50(0:0) vm at ffffeaeada357d80 6fffff00000879 3:0 ffff933651225e00 42 lru
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) lov-page at ffff933651225e90, comp index: 10000, gen: 6
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) osc-page at ffff933651225ec8 42: 1< 0x845fed 2 0 + - > 2< 172032 0 4096 0x0 0x420 |           (null) ffff938e52a7d738 ffff92fdd1dd8b40 > 3< 0 0 0 > 4< 0 0 8 1703936 - | - - + - > 5< - - + - | 0 - | 1 - ->
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) end page at ffff933651225e00
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) Trying to teardown failed: -16
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:193:osc_page_delete()) ASSERTION( 0 ) failed:
Jan 20 10:23:40 lmem006 kernel: LustreError: 4661:0:(osc_page.c:193:osc_page_delete()) LBUG
Jan 20 10:23:40 lmem006 kernel: Pid: 4661, comm: diamond 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
Jan 20 10:23:40 lmem006 kernel: Call Trace:
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc0f087cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc0f0887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc145fe7f>] osc_page_delete+0x48f/0x500 [osc]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc107b2d0>] cl_page_delete0+0x80/0x220 [obdclass]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc107b4a3>] cl_page_delete+0x33/0x110 [obdclass]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc156f27f>] ll_invalidatepage+0x7f/0x170 [lustre]
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcefed>] do_invalidatepage_range+0x7d/0x90
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf097>] truncate_inode_page+0x77/0x80
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf2ca>] truncate_inode_pages_range+0x1ea/0x750
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf89f>] truncate_inode_pages_final+0x4f/0x60
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc1554c1f>] ll_delete_inode+0x4f/0x230 [lustre]
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c6c934>] evict+0xb4/0x180
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c6cd6c>] iput+0xfc/0x190
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c676f8>] __dentry_kill+0x158/0x1d0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c67d95>] dput+0xb5/0x1a0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c5092d>] __fput+0x18d/0x230
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c50abe>] ____fput+0xe/0x10
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93ac299b>] task_work_run+0xbb/0xe0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93a2cc65>] do_notify_resume+0xa5/0xc0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff941962ef>] int_signal+0x12/0x17
Jan 20 10:23:40 lmem006 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Jan 20 10:23:40 lmem006 kernel: Kernel panic - not syncing: LBUG
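
When isolating which job triggers the panic, it can help to pull the process name and PID out of the logs from every affected client. The following is a minimal sketch, not a tested tool: it assumes syslog-formatted lines like the ones above, that the "Pid: ..., comm: ..." line follows the LBUG line from the same host, and that the command name contains no spaces. The function name lbug_processes is made up for illustration.

```python
import re

# Common syslog prefix: "Jan 20 10:23:40 lmem006 kernel: "
PREFIX = r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+"

# A LustreError line that ends in "LBUG" (the assertion failure itself).
LBUG = re.compile(PREFIX + r"LustreError:.*\bLBUG\s*$")

# The line printed after the LBUG, naming the crashing task.
# Assumption: comm has no embedded whitespace.
PID_COMM = re.compile(PREFIX + r"Pid:\s+(?P<pid>\d+),\s+comm:\s+(?P<comm>\S+)")


def lbug_processes(lines):
    """Yield (host, pid, comm) for each task reported after an LBUG."""
    pending = set()  # hosts that logged an LBUG and still await a Pid line
    for raw in lines:
        line = raw.rstrip("\n").lstrip("+")  # drop terminal wrap markers
        m = LBUG.match(line)
        if m:
            pending.add(m.group("host"))
            continue
        m = PID_COMM.match(line)
        if m and m.group("host") in pending:
            pending.discard(m.group("host"))
            yield m.group("host"), int(m.group("pid")), m.group("comm")


if __name__ == "__main__":
    import sys
    # Usage: python lbug_scan.py < /path/to/collected/messages
    for host, pid, comm in lbug_processes(sys.stdin):
        print(f"{host}: pid={pid} comm={comm}")
```

Run against the log above, this would report lmem006 with pid 4661 and comm diamond. Mapping that PID back to a scheduler job is site-specific and not covered here.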
