[lustre-discuss] [EXTERNAL] Re: oss servers crashing

Kurt Strosahl strosahl at jlab.org
Wed Jul 15 12:02:31 PDT 2020


This is the trace up to the LBUG:

[5533797.889690] Lustre: Skipped 341 previous similar messages
[5533958.749284] LustreError: 105499:0:(tgt_grant.c:571:tgt_grant_incoming()) lustre19-OST002c: cli 901dcd33-cf45-dad4-a0c7-89b9a1fb91b6/ffff99656aa5a800 dirty 0 pend 0 grant -1310720
[5533958.754365] LustreError: 105499:0:(tgt_grant.c:573:tgt_grant_incoming()) LBUG
[5533958.756929] Pid: 105499, comm: ll_ost_io01_071 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[5533958.756931] Call Trace:
[5533958.756948]  [<ffffffffc0bf57cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
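For context, the "dirty 0 pend 0 grant -1310720" counters in the message above are the server's per-client space-grant accounting, and the LBUG at tgt_grant.c:573 fires because the grant counter has gone negative (the server released more grant than it had recorded for that client). A minimal sketch of that kind of invariant check is below; all names and the struct layout are illustrative assumptions, not Lustre's actual internals:

```c
/* Hedged sketch of a grant-accounting invariant, NOT Lustre's real code.
 * Illustrates the condition the LBUG above appears to enforce: none of
 * the per-client counters may go negative. */
#include <stdint.h>

/* simplified per-client grant state (illustrative names) */
struct client_grant {
    int64_t dirty;   /* bytes of dirty cache the client reports */
    int64_t pending; /* bytes of writes currently in flight */
    int64_t grant;   /* bytes of space the server has granted */
};

/* returns 0 if the accounting is consistent, -1 if it would trip an
 * LBUG-style assertion (any counter negative) */
static inline int grant_state_valid(const struct client_grant *cg)
{
    return (cg->dirty >= 0 && cg->pending >= 0 && cg->grant >= 0) ? 0 : -1;
}

/* consume grant for incoming I/O; refuses to let the counter underflow,
 * unlike a kernel assertion which would panic the node */
int grant_consume(struct client_grant *cg, int64_t bytes)
{
    if (bytes < 0 || bytes > cg->grant)
        return -1;   /* would underflow: reject instead of asserting */
    cg->grant -= bytes;
    return 0;
}
```

In the real server the counter went negative anyway (grant -1310720), which is why the assertion escalated to an LBUG and a kernel panic rather than a recoverable error.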

________________________________
From: Alex Zarochentsev <zamenator at gmail.com>
Sent: Wednesday, July 15, 2020 11:20 AM
To: Kurt Strosahl <strosahl at jlab.org>
Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-discuss] oss servers crashing

Hello!

On Wed, Jul 15, 2020 at 5:28 PM Kurt Strosahl <strosahl at jlab.org> wrote:
Good Morning,

   Yesterday one of our Lustre file servers rebooted several times. The crash dump showed:

Can you please provide the failed Lustre assert message just above the kernel panic message?

Thanks,
Zam.


[14333982.153989] Pid: 381367, comm: ll_ost_io01_076 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[14333982.153989] Kernel panic - not syncing: LBUG
[14333982.153990] Call Trace:
[14333982.153993] CPU: 4 PID: 380760 Comm: ll_ost_io01_072 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
[14333982.153994] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.1 05/21/2019
[14333982.153995] Call Trace:
[14333982.154002]  [<ffffffffbaf62e41>] dump_stack+0x19/0x1b
[14333982.154006]  [<ffffffffbaf5c550>] panic+0xe8/0x21f
[14333982.154018]  [<ffffffffc0ab87cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[14333982.154026]  [<ffffffffc0ab88cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[14333982.154036]  [<ffffffffc0ab887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[14333982.154096]  [<ffffffffc12dfae0>] tgt_grant_incoming.isra.6+0x570/0x570 [ptlrpc]
[14333982.154174]  [<ffffffffc12dfae0>] tgt_grant_prepare_read+0x0/0x3b0 [ptlrpc]
[14333982.154232]  [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
[14333982.154297]  [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
[14333982.154306]  [<ffffffffc15e1ad0>] ofd_preprw+0x450/0x1160 [ofd]

lustre versions:
lustre-resource-agents-2.12.1-1.el7.x86_64
lustre-2.12.1-1.el7.x86_64
kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64
lustre-osd-zfs-mount-2.12.1-1.el7.x86_64
kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64
kernel-3.10.0-957.10.1.el7_lustre.x86_64
lustre-zfs-dkms-2.12.1-1.el7.noarch

Could this be: https://jira.whamcloud.com/browse/LU-12120

w/r,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
