[lustre-discuss] Error in lfsck: "NOT IMPLEMETED YET"
João Carlos Mendes Luís
jonny at corp.globo.com
Tue Jul 23 12:08:31 PDT 2019
Please help! Got another panic today, while migrating directories:
[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed:
[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) LBUG
[Tue Jul 23 16:04:18 2019] Pid: 52142, comm: mdt00_002 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[Tue Jul 23 16:04:18 2019] Call Trace:
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f717cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f7187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12f9c41>] ptlrpc_save_lock+0xc1/0xd0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892b0b>] mdt_save_lock+0x20b/0x360 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892cbc>] mdt_object_unlock+0x5c/0x3c0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18aba52>] mdt_reint_striped_unlock+0x1a2/0x2f0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18abbc8>] mdt_migrate_object_unlock+0x28/0x60 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0544>] mdt_reint_migrate+0x934/0x1310 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0fa3>] mdt_reint_rec+0x83/0x210 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc188f1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc189a497>] mdt_reint+0x67/0x140 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1359e5a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12ff80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc130313c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffff9fec1c71>] kthread+0xd1/0xe0
[Tue Jul 23 16:04:18 2019] [<ffffffffa0575c37>] ret_from_fork_nospec_end+0x0/0x39
[Tue Jul 23 16:04:18 2019] [<ffffffffffffffff>] 0xffffffffffffffff
Regards,
Jonny
------------------------------------------------------------------------
globo.com
*João Carlos Mendes Luís*
*Senior DevOps Engineer*
jonny at corp.globo.com
+55-21-2483-6893
+55-21-99218-1222
On 22/07/2019 23:35, Joao Carlos Mendes Luis wrote:
> On 7/22/19 11:10 PM, Andreas Dilger wrote:
>> If you are trying to delete MDT0000 then that is definitely not
>> implemented yet...
>
>
> No, no, no...
>
>
> This was my first idea, but then I understood that the root directory
> is always on MDT0, so I had to migrate it to another server (after
> having created two more; the crash happened during that migration).
>
> I will later try to migrate to another server, and then delete MDT2.
> But first I need to finish this lfsck... :-(
>
>
> These "NOT IMPLEMETED (sic)" messages are just from running lfsck_start -A.
>
>
>>
>> Cheers, Andreas
>>
>> On Jul 22, 2019, at 16:08, João Carlos Mendes Luís
>> <jonny at corp.globo.com> wrote:
>>
>>> Hi,
>>>
>>> I'm running some lab tests with Lustre 2.12.2 on Oracle Linux
>>> Server release 7.6. The last test I did was about migration and MDT
>>> splitting. I started with one MGS+MDS node and two OSS nodes, and
>>> one of the tests was to create two more MDSs and migrate data
>>> between them until, after some time, I could delete the original
>>> MDS. But something happened in the middle and the servers
>>> panicked/rebooted.
>>>
>>> I have now hit what appears to be an lfsck bug. After many other
>>> tests, I ran lfsck_start, and after some time I got these messages
>>> on the nodes:
>>>
>>> MGS/MDS0:
>>>
>>> [Mon Jul 22 17:42:25 2019] LustreError:
>>> 24107:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET
>>> (move to 0x2481000002000000)
>>>
>>> OSS1/MDS1
>>>
>>> [Mon Jul 22 17:40:29 2019] LustreError:
>>> 31558:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET
>>> (move to 0xa41300c002000000)
>>>
>>> OST2/MDS2
>>>
>>> [Mon Jul 22 17:40:32 2019] LustreError:
>>> 8935:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET
>>> (move to 0xa013000003000000)
>>>
>>>
>>> And for the current lfsck status, I run: lctl get_param *.*.lfsck* |
>>> grep -E 'status|\.lfsck_lay|\.lfsck_name'
>>>
>>> MGS/MDS0:
>>>
>>> mdd.mirror01-MDT0000.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0000.lfsck_namespace=
>>> status: partial
>>>
>>> OSS1/MDS1
>>>
>>> mdd.mirror01-MDT0001.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0001.lfsck_namespace=
>>> status: partial
>>> obdfilter.mirror01-OST0065.lfsck_layout=
>>> status: completed
>>>
>>> OST2/MDS2
>>>
>>> mdd.mirror01-MDT0002.lfsck_layout=
>>> status: completed
>>> mdd.mirror01-MDT0002.lfsck_namespace=
>>> status: partial
>>> obdfilter.mirror01-OST0066.lfsck_layout=
>>> status: completed
>>>
>>> Is this a known bug? How do I fix these "partial" lfsck runs?
>>>
>>> Thanks for any help,
>>>
>>>
>>> Jonny
>>>
>>>
>>>
>>>
>
>
> Best regards,
>
> Jonny
>
> --
> João Carlos Mendes Luís
> Globo.COM - +55-21-2483-6893
>