[lustre-discuss] oss servers crashing

Alex Zarochentsev zamenator at gmail.com
Wed Jul 15 08:20:54 PDT 2020


Hello!

On Wed, Jul 15, 2020 at 5:28 PM Kurt Strosahl <strosahl at jlab.org> wrote:

> Good Morning,
>
>    Yesterday one of our Lustre file servers rebooted several times.  The
> crash dump showed:
>

Can you please provide the failed Lustre assertion message that appears
just above the kernel panic message?
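
In case it helps to locate it, here is a minimal sketch (not an official
Lustre tool) that scans a saved console or dmesg log and returns the
Lustre-related lines printed shortly before the panic. It assumes the
assertion shows up as a line containing "LustreError", "ASSERTION" or
"LBUG"; the function name and the 20-line window are arbitrary choices
made for illustration.

```python
import re

# Marker taken from the crash dump quoted in this thread.
PANIC_MARKER = "Kernel panic - not syncing: LBUG"

# Assumption: the failed assertion is logged on a line containing one of
# these substrings. Adjust the pattern if your logs look different.
CANDIDATE = re.compile(r"LustreError|ASSERTION|LBUG")


def lines_before_panic(log_lines, context=20):
    """Return candidate lines from the `context` lines preceding the
    first panic marker, or an empty list if no marker is found."""
    for i, line in enumerate(log_lines):
        if PANIC_MARKER in line:
            window = log_lines[max(0, i - context):i]
            return [l for l in window if CANDIDATE.search(l)]
    return []
```

To use it, read the log into a list of lines (for example with
open(path).read().splitlines(), where path points at whatever dmesg or
console log was saved with the crash dump) and print what
lines_before_panic() returns.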

Thanks,
Zam.


> [14333982.153989] Pid: 381367, comm: ll_ost_io01_076 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
> [14333982.153989] Kernel panic - not syncing: LBUG
> [14333982.153990] Call Trace:
> [14333982.153993] CPU: 4 PID: 380760 Comm: ll_ost_io01_072 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
> [14333982.153994] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.1 05/21/2019
> [14333982.153995] Call Trace:
> [14333982.154002]  [<ffffffffbaf62e41>] dump_stack+0x19/0x1b
> [14333982.154006]  [<ffffffffbaf5c550>] panic+0xe8/0x21f
> [14333982.154018]  [<ffffffffc0ab87cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
> [14333982.154026]  [<ffffffffc0ab88cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
> [14333982.154036]  [<ffffffffc0ab887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
> [14333982.154096]  [<ffffffffc12dfae0>] tgt_grant_incoming.isra.6+0x570/0x570 [ptlrpc]
> [14333982.154174]  [<ffffffffc12dfae0>] tgt_grant_prepare_read+0x0/0x3b0 [ptlrpc]
> [14333982.154232]  [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
> [14333982.154297]  [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
> [14333982.154306]  [<ffffffffc15e1ad0>] ofd_preprw+0x450/0x1160 [ofd]
>
> lustre versions:
> lustre-resource-agents-2.12.1-1.el7.x86_64
> lustre-2.12.1-1.el7.x86_64
> kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64
> lustre-osd-zfs-mount-2.12.1-1.el7.x86_64
> kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64
> kernel-3.10.0-957.10.1.el7_lustre.x86_64
> lustre-zfs-dkms-2.12.1-1.el7.noarch
>
> Could this be: https://jira.whamcloud.com/browse/LU-12120
>
> w/r,
>
> Kurt J. Strosahl
> System Administrator: Lustre, HPC
> Scientific Computing Group, Thomas Jefferson National Accelerator Facility
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>