[lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe

肖正刚 guru.novice at gmail.com
Thu Jul 30 09:16:19 PDT 2020


Hi,
Thanks for your suggestion.
However, rebooting the OSSs in production under heavy I/O load would be
another long story of its own.

Regards.
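
Before scheduling a reboot, it may be worth checking whether a node already runs with the suggested parameter. The following is only a minimal sketch: it assumes a Linux host where /proc/cmdline is readable, and the helper name has_kernel_param is made up for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str = "iommu=pt") -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    # Splitting on whitespace avoids false positives such as "intel_iommu=pt_foo".
    return param in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with (Linux only).
    path = Path("/proc/cmdline")
    if path.exists():
        print("iommu=pt active:", has_kernel_param(path.read_text()))
    else:
        print("/proc/cmdline not found; probably not a Linux host")
```

On CentOS 7 the parameter can typically be added for the next boot with grubby --update-kernel=ALL --args="iommu=pt"; it only takes effect after the reboot, which is exactly the problem on a busy OSS.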


Weiss, Karsten <karsten.weiss at atos.net> wrote on Thu, Jul 30, 2020 at 11:31 PM:

> Hi!
>
>
>
> (Caveat: I ran into this issue not on Lustre but on HPC MPI jobs on CentOS
> 7.7. They only run stably with the workaround.)
>
>
>
> I’ve opened a bug with Red Hat at
> https://bugzilla.redhat.com/show_bug.cgi?id=1796825 but unfortunately
> it is no longer public (or fixed/closed), i.e. you probably won’t be able
> to read it.
>
>
>
> To make a long story short: You may try to boot with the kernel parameter
> “iommu=pt” as a workaround(!).
>
>
>
> Please let me know if this “fixes” the problem for you. YMMV.
>
>
>
> Best regards,
>
> Karsten
>
>
>
> --
>
> *Dipl.-Inf. Karsten Weiss *s+c / Atos
>
> T +49 7071 9457 452
>
> karsten.weiss at atos.net
>
> https://atos.net/de/deutschland/sc-en
>
>
>
> *From:* lustre-discuss <lustre-discuss-bounces at lists.lustre.org> *On
> Behalf Of *???
> *Sent:* Thursday, July 30, 2020 16:05
> *To:* lustre-discuss <lustre-discuss at lists.lustre.org>
> *Subject:* [lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761):
> dump error cqe
>
>
>
> Hi, all
>
>
>
> We installed lustre-2.12.2 on both servers and clients. Recently, our OSSs'
> syslog and dmesg have been flooded with messages like the ones below:
>
>
> infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 00 00 88 13 08 00 84 79 01 04 4c d0
> LustreError: 25762:0:(events.c:450:server_bulk_callback()) event type 5,
> status -5, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
> status -103, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
> status -103, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
> status -103, desc ffff9ffdf58c0a00
>
>
> Has anyone hit this before, or does anyone have any suggestions?
>
>
>
> Thanks!
>
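
For triage, the negative status values in the quoted server_bulk_callback() lines can be tallied and mapped to errno names. This is a rough sketch, not a Lustre tool: the regular expression is written only against the lines quoted above, and the function names are illustrative. On Linux, errno 5 is EIO and errno 103 is ECONNABORTED.

```python
import errno
import re
from collections import Counter

# Pattern assumed from the quoted log lines only; other Lustre messages may differ.
STATUS_RE = re.compile(
    r"server_bulk_callback\(\)\) event type (\d+),\s*status (-?\d+)"
)

SAMPLE = """\
LustreError: 25762:0:(events.c:450:server_bulk_callback()) event type 5,
status -5, desc ffff9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
status -103, desc ffff9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
status -103, desc ffff9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5,
status -103, desc ffff9ffdf58c0a00
"""


def summarize(log_text: str) -> Counter:
    """Count occurrences of each status code in server_bulk_callback() errors."""
    # The mail wraps long log lines, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    return Counter(int(m.group(2)) for m in STATUS_RE.finditer(flat))


def errno_name(status: int) -> str:
    """Map a negative kernel-style status to its errno name on this platform."""
    return errno.errorcode.get(-status, "unknown")


if __name__ == "__main__":
    for status, count in summarize(SAMPLE).most_common():
        print(f"status {status} ({errno_name(status)}): {count}x")
```

The errno names come from the platform running the script, so this should be run on Linux to match the kernel's numbering.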