[lustre-discuss] Error in lfsck: "NOT IMPLEMETED YET"

João Carlos Mendes Luís jonny at corp.globo.com
Tue Jul 23 12:08:31 PDT 2019


Please help!  I got another panic today while migrating directories:


[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed:
[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) LBUG
[Tue Jul 23 16:04:18 2019] Pid: 52142, comm: mdt00_002 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[Tue Jul 23 16:04:18 2019] Call Trace:
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f717cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f7187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12f9c41>] ptlrpc_save_lock+0xc1/0xd0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892b0b>] mdt_save_lock+0x20b/0x360 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892cbc>] mdt_object_unlock+0x5c/0x3c0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18aba52>] mdt_reint_striped_unlock+0x1a2/0x2f0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18abbc8>] mdt_migrate_object_unlock+0x28/0x60 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0544>] mdt_reint_migrate+0x934/0x1310 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0fa3>] mdt_reint_rec+0x83/0x210 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc188f1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc189a497>] mdt_reint+0x67/0x140 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1359e5a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12ff80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc130313c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffff9fec1c71>] kthread+0xd1/0xe0
[Tue Jul 23 16:04:18 2019] [<ffffffffa0575c37>] ret_from_fork_nospec_end+0x0/0x39
[Tue Jul 23 16:04:18 2019] [<ffffffffffffffff>] 0xffffffffffffffff
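For context, the trace ends in mdt_reint_migrate, i.e. a directory migration between MDTs driven from a client.  A minimal sketch of the kind of command involved (mount point and MDT index here are illustrative, not the exact ones I used):

    # Migrate a directory's metadata to the MDT with index 1
    # (path and index are examples only)
    lfs migrate -m 1 /mnt/mirror01/testdir

    # Confirm which MDT now serves the directory
    lfs getdirstripe /mnt/mirror01/testdir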

     Regards,

         Jonny


------------------------------------------------------------------------
João Carlos Mendes Luís
Senior DevOps Engineer
globo.com
jonny at corp.globo.com
+55-21-2483-6893
+55-21-99218-1222


On 22/07/2019 23:35, Joao Carlos Mendes Luis wrote:
> On 7/22/19 11:10 PM, Andreas Dilger wrote:
>> If you are trying to delete MDT0000 then that is definitely not 
>> implemented yet...
>
>
> No, no, no...
>
>
> This was my first idea, but then I understood that the root directory
> always lives on MDT0, so I had to migrate it to another server (after
> having created two more; the servers crashed during that migration).
>
> I will later try to migrate to another server, and then delete MDT2.
> But first I need to finish this lfsck...   :-(
>
>
> These "NOT IMPLEMETED (sic)" messages are just from running lfsck_start -A
>
>
>>
>> Cheers, Andreas
>>
>> On Jul 22, 2019, at 16:08, João Carlos Mendes Luís
>> <jonny at corp.globo.com> wrote:
>>
>>> Hi,
>>>
>>>     I'm running some lab tests with Lustre 2.12.2 on Oracle Linux
>>> Server release 7.6.  The last test I did covered migration and MDT
>>> splitting.  I started with one MGS+MDS node and two OSS nodes, and
>>> one of the tests was to create two more MDSs and migrate data
>>> between them until, after some time, I could delete the original
>>> MDS.  But something happened in the middle and the servers
>>> panicked/rebooted.
>>>
>>>     I have now hit what appears to be an lfsck bug.  After many
>>> other tests, I ran lfsck_start, and after some time I got these
>>> messages on the nodes:
>>>
>>> MGS/MDS0:
>>>
>>> [Mon Jul 22 17:42:25 2019] LustreError: 24107:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0x2481000002000000)
>>>
>>> OSS1/MDS1:
>>>
>>> [Mon Jul 22 17:40:29 2019] LustreError: 31558:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0xa41300c002000000)
>>>
>>> OSS2/MDS2:
>>>
>>> [Mon Jul 22 17:40:32 2019] LustreError: 8935:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0xa013000003000000)
>>>
>>>
>>>     And for the current lfsck status, I ran
>>> lctl get_param *.*.lfsck* | grep -E 'status|\.lfsck_lay|\.lfsck_name'
>>>
>>> MGS/MDS0:
>>>
>>> mdd.mirror01-MDT0000.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0000.lfsck_namespace=
>>> status: partial
>>>
>>> OSS1/MDS1:
>>>
>>> mdd.mirror01-MDT0001.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0001.lfsck_namespace=
>>> status: partial
>>> obdfilter.mirror01-OST0065.lfsck_layout=
>>> status: completed
>>>
>>> OSS2/MDS2:
>>>
>>> mdd.mirror01-MDT0002.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0002.lfsck_namespace=
>>> status: partial
>>> obdfilter.mirror01-OST0066.lfsck_layout=
>>> status: completed
>>>
>>>     Is this a known bug?  How do I fix these "partial" lfsck runs?
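>>>
>>> For concreteness, a minimal sketch of what I plan to try next:
>>> forcing a fresh namespace-only pass (the -r flag resets the scan
>>> position; device names as above):
>>>
>>>     lctl lfsck_start -M mirror01-MDT0000 -A -t namespace -r
>>>
>>>     # then re-check the per-MDT status
>>>     lctl get_param mdd.mirror01-MDT*.lfsck_namespace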
>>>
>>>     Thanks for any help,
>>>
>>>
>>>         Jonny
>
>
>     Best regards,
>
>         Jonny
>
> -- 
> João Carlos Mendes Luís
> Globo.COM - +55-21-2483-6893
>

