[lustre-discuss] ASSERTION( obj->oo_with_projid ) failed

Mon Oct 4 09:47:27 PDT 2021

We were able to get our LFS back up using the fix in LU-13189 and have been stable since.  But I'd still appreciate some help backing out of this.  

* Is the "lfs setquota -p 1" the likely cause of our crash?
* If so:
	* Why would it take 1 week to show up?
	* What is the best way to reverse any ill effects the "lfs setquota -p 1" command may have caused?
	* Should there be some protection in the lustre source for this?

-----Original Message-----
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss" <lustre-discuss at lists.lustre.org>
Reply-To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicker-1 at nasa.gov>
Date: Thursday, September 30, 2021 at 11:41 AM
To: Colin Faber via lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: [EXTERNAL] [lustre-discuss] ASSERTION( obj->oo_with_projid ) failed

    Hello everyone,

    We've run into a pretty nasty LBUG that took our LFS down.  We're not exactly sure of the cause and could use some help.  It's pretty much identical to this:

	https://jira.whamcloud.com/browse/LU-13189

    One of our OSSs started crashing repeatedly last night.  We are configured with HA and tried failing over to its pair, only to have that OSS crash in the same way.  We are in the process of doing the same thing mentioned in the above LU to get back up and running, but we'd like to fix this without the #undef ZFS_PROJINHERIT if possible.  A couple of months ago we updated our servers to 2.14 – stock, no modifications – and we'd like to get back to stock 2.14 again if possible.  Up until last night, our experience with 2.14 was great – very stable compared to what we were running previously (very old 2.10) and better performing.  Our specific stack trace from the crash dump is below if that helps.  Our servers are running 3.10.0-1160.31.1.el7.x86_64.  The MDT and OSTs both use ZFS (version 2.0).

    There are two things that could have contributed to the crash.  

    First, about 1 week ago, we tried to use project quotas for the first time.  Without reading the lustre manual, I just tried to set a project quota as such:

    	lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 .

    But it was pretty obvious that didn't work.

    	# lfs quota -p 1 /nobackup/
    	Unexpected quotactl error: Operation not supported
    	Disk quotas for prj 1 (pid 1):
    	     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    	     /nobackup/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
    	Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
    	#

    Then, after reading section 25.2 in the lustre manual (https://doc.lustre.org/lustre_manual.xhtml#enabling_disk_quotas), I saw that zfs version >=0.8 with kernel version < 4.5 requires a patched kernel.  So I just moved on, figuring project quotas would not work since we are using the stock kernel.  But now it appears this might be the cause of our problem.  As of right now, I see this in the zfs properties for our metadata filesystem.

    	[root@hpfs-fsl-mds0 ~]# zpool get all mds0-0-new  | grep proj
    	mds0-0-new  feature@project_quota          active                        local
    	[root@hpfs-fsl-mds0 ~]#
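
For reference, ZFS pool feature flags report one of three states: disabled, enabled, or active. The property shown above is `feature@project_quota`. A minimal sketch of pulling that state out of `zpool get` output follows, using the quoted line as sample text. The helper name `feature_state` is hypothetical, and the sketch only parses text; it does not run `zpool` itself.

```python
from typing import Optional

# Sample text copied from the zpool output quoted in this message.
SAMPLE = """\
mds0-0-new  feature@project_quota          active                        local
"""

def feature_state(zpool_get_output: str, feature: str) -> Optional[str]:
    """Return the VALUE column for feature@<feature>, or None if absent.

    Hypothetical helper: assumes the usual NAME PROPERTY VALUE SOURCE columns.
    """
    wanted = "feature@" + feature
    for line in zpool_get_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == wanted:
            return fields[2]
    return None

state = feature_state(SAMPLE, "project_quota")
print(state)
```

An `active` state means the pool's on-disk format already makes use of the feature, as opposed to `enabled`, where the feature is available but not yet in use.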

    Several questions come to mind.  

    * Is this the likely cause of our crash?
    * Why would it take 1 week to show up?
    * What is the best way to reverse any ill effects the "lfs setquota -p 1" command may have caused?

    The second possible contributor is related to some maintenance we just finished on the metadata server yesterday morning.  After the update to 2.14 (and zfs update from 0.7 to 2.0), we got this message from "zpool status" on our mdt pool:

      pool: mds0-0
     state: ONLINE
    status: One or more devices are configured to use a non-native block size.
    	Expect reduced performance.
    action: Replace affected devices with devices that support the
    	configured block size, or migrate data to a properly configured
    	pool.
      scan: scrub repaired 0B in 1 days 17:49:23 with 0 errors on Fri Jul  9 21:03:24 2021
    config:

    	NAME        STATE     READ WRITE CKSUM
    	mds0-0      ONLINE       0     0     0
    	  mirror-0  ONLINE       0     0     0
    	    mpathm  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpathn  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	  mirror-1  ONLINE       0     0     0
    	    mpatho  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpathp  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	  mirror-2  ONLINE       0     0     0
    	    mpathq  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpathr  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	  mirror-3  ONLINE       0     0     0
    	    mpaths  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpatht  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	  mirror-4  ONLINE       0     0     0
    	    mpathu  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpathv  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	  mirror-5  ONLINE       0     0     0
    	    mpathw  ONLINE       0     0     0  block size: 512B configured, 4096B native
    	    mpathx  ONLINE       0     0     0  block size: 512B configured, 4096B native

    This is related to the SSDs we are using for the MDT.  The physical block size is 4k (ashift=12) but the logical block size is 0.5k (ashift=9).  Apparently, the old version of zfs (under which the original pool was built) picked ashift=9, but after the update zfs 2.0 was telling us we should be using the larger block size to match the physical block size of these drives.  Despite this mismatch, our mdtest results (via io500) were greatly improved with the lustre 2.14 update.  But it's still something we wanted to fix, which was the purpose of our maintenance outage yesterday.  So we backed up the mds0-0/meta-fsl file system to a separate pool, destroyed the old pool, rebuilt it (now with zfs choosing ashift=12 for the block size) and copied the data back to the newly created pool.  However, this process failed.  Our old metadata file system (512B block size) was using about 490 GB of our 2.2 TB pool.  Due to the increase in block size, the data take up more space in the file system - potentially 8x more if each entry is less than 512 B to begin with.  We filled up the new ashift=12 pool.  So we had to revert back to an ashift=9 pool.  We are going to have to buy more or bigger SSDs (or use raidz instead of raid10) if we want to go to a bigger ashift.
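
The 8x figure above can be checked with a rough calculation. The sketch below assumes a deliberately simplified model in which every allocation is rounded up to a whole 2**ashift sector; it ignores compression, ZFS metadata overhead and mirroring, and treats 2.2 TB as 2200 GB. The function names are illustrative only.

```python
def allocated_bytes(logical_size: int, ashift: int) -> int:
    """Round one allocation up to a multiple of the 2**ashift sector size."""
    sector = 1 << ashift
    return -(-logical_size // sector) * sector  # ceiling division

def amplification(logical_size: int, old_ashift: int = 9, new_ashift: int = 12) -> float:
    """Ratio of space used at new_ashift to space used at old_ashift."""
    return allocated_bytes(logical_size, new_ashift) / allocated_bytes(logical_size, old_ashift)

# Small entries suffer most; entries of 4096 bytes see no growth.
for size in (300, 512, 1024, 2048, 4096):
    print(f"{size:5d} B entry -> {amplification(size):.1f}x")

used_gb = 490    # space used on the old ashift=9 pool (from the message)
pool_gb = 2200   # 2.2 TB pool, taken as 2200 GB (assumption)
worst_case_gb = used_gb * amplification(512)
print(f"worst case: {worst_case_gb:.0f} GB, exceeds pool: {worst_case_gb > pool_gb}")
```

Under this model the pool fills once the average growth passes roughly 2200 / 490, or about 4.5x, so the data would not need to hit the full 8x worst case to run out of space.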

    So this could be related too.  Theoretically, nothing should have changed as far as lustre was concerned.  But it's hard to ignore that we put the file system back in service yesterday morning and about 10 hours later we ran into this problem.

    If anyone has ideas, please let us know.  We're happy to post details here or to an LU.  

    Thanks,
    Darby Vicker

    [  138.597710] LustreError: 2476:0:(tgt_grant.c:803:tgt_grant_check()) hpfs-fsl-OST0005: cli cd0fda1d-691d-bb4f-1548-c45f8c2e578d is replaying OST_WRITE while one rnb hasn't OBD_BRW_FROM_GRANT set (0x8)
    [  138.699120] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
    [  138.699155] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) LBUG
    [  138.699176] Pid: 2476, comm: tgt_recover_5 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021
    [  138.699177] Call Trace:
    [  138.699184]  [<ffffffffc104167c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
    [  138.699194]  [<ffffffffc104199c>] lbug_with_loc+0x4c/0xa0 [libcfs]
    [  138.699199]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
    [  138.699207]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
    [  138.699213]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
    [  138.699218]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
    [  138.699223]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
    [  138.699273]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
    [  138.699299]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
    [  138.699317]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
    [  138.699336]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
    [  138.699354]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
    [  138.699357]  [<ffffffffbad95df7>] ret_from_fork_nospec_end+0x0/0x39
    [  138.699360]  [<ffffffffffffffff>] 0xffffffffffffffff
    [  138.699380] Kernel panic - not syncing: LBUG
    [  138.699395] CPU: 1 PID: 2476 Comm: tgt_recover_5 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.31.1.el7.x86_64 #1
    [  138.699429] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
    [  138.699449] Call Trace:
    [  138.699460]  [<ffffffffbad835a9>] dump_stack+0x19/0x1b
    [  138.699477]  [<ffffffffbad7d2b1>] panic+0xe8/0x21f
    [  138.699496]  [<ffffffffc10419eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
    [  138.699519]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
    [  138.699543]  [<ffffffffc18ca5cd>] ? ofd_attr_handle_id+0x12d/0x410 [ofd]
    [  138.699566]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
    [  138.699588]  [<ffffffffba7de42d>] ? kzfree+0x2d/0x70
    [  138.699607]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
    [  138.699628]  [<ffffffffba7c7675>] ? __free_pages+0x25/0x30
    [  138.699649]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
    [  138.699693]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
    [  138.699738]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
    [  138.699761]  [<ffffffffba6aee98>] ? add_timer+0x18/0x20
    [  138.699779]  [<ffffffffba6bc13b>] ? __queue_delayed_work+0x8b/0x1a0
    [  138.699822]  [<ffffffffc15bc270>] ? tgt_hpreq_handler+0x2c0/0x2c0 [ptlrpc]
    [  138.699861]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
    [  138.699899]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
    [  138.699940]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
    [  138.699963]  [<ffffffffbad88e60>] ? __schedule+0x320/0x680
    [  138.699998]  [<ffffffffc15146a0>] ? replay_request_or_update.isra.25+0x930/0x930 [ptlrpc]
    [  138.700023]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
    [  138.700039]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
    [  138.700059]  [<ffffffffbad95df7>] ret_from_fork_nospec_begin+0x21/0x21
    [  138.700079]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
