[lustre-discuss] mlx5 errors on oss

Andreas Dilger adilger at whamcloud.com
Thu May 18 09:03:46 PDT 2023


I can't comment on the specific network issue, but in general it is far better to use the MOFED drivers than the in-kernel ones. 

Cheers, Andreas

> On May 18, 2023, at 09:08, Nehring, Shane R [LAS] via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> 
> Hello all,
> 
> We recently added InfiniBand to our cluster and are in the process of testing it
> with Lustre. We're running the distro-provided drivers for the Mellanox cards
> with the latest firmware. Overnight we started seeing the following errors on a
> few OSSes:
> 
> infiniband mlx5_0: dump_cqe:272:(pid 40058): dump error cqe
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 00 00 88 13 08 00 00 a0 00 63 4d d2
> infiniband mlx5_0: dump_cqe:272:(pid 40057): dump error cqe
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 00 00 88 13 08 00 00 a1 00 c2 8e d2
> infiniband mlx5_0: dump_cqe:272:(pid 40057): dump error cqe
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 00 00 88 13 08 00 00 a2 00 1a 12 d2
> 
> I found a post suggesting this might be IOMMU related, but disabling the IOMMU
> doesn't seem to help.
> 
> We're running Lustre 2.15, more or less at the tip of b2_15
> (b74560d74a9f890838dbf2f0719e3d27c1e5eaf8)
> 
> Has anyone seen this before or have any pointers?
> 
> Thanks
> 
> Shane
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
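
For anyone trying to read dumps like the ones quoted above, here is a minimal sketch of how the 64 bytes might be decoded. It assumes the layout of struct mlx5_err_cqe and the MLX5_CQE_SYNDROME_* values from the Linux kernel header include/linux/mlx5/device.h; neither is stated in this thread, so check them against your kernel or MOFED source before trusting the output. The helper names (rows_to_bytes, decode_err_cqe) are made up for illustration.

```python
import struct

# Assumed layout of struct mlx5_err_cqe (big-endian, 64 bytes):
# rsvd0[32], srqn (be32), rsvd1[18], vendor_err_synd (u8), syndrome (u8),
# s_wqe_opcode_qpn (be32), wqe_counter (be16), signature (u8), op_own (u8).
ERR_CQE = struct.Struct(">32sI18sBBIHBB")

# Syndrome names as assumed from the kernel's MLX5_CQE_SYNDROME_* enum.
SYNDROMES = {
    0x01: "LOCAL_LENGTH_ERR",
    0x02: "LOCAL_QP_OP_ERR",
    0x04: "LOCAL_PROT_ERR",
    0x05: "WR_FLUSH_ERR",
    0x06: "MW_BIND_ERR",
    0x10: "BAD_RESP_ERR",
    0x11: "LOCAL_ACCESS_ERR",
    0x12: "REMOTE_INVAL_REQ_ERR",
    0x13: "REMOTE_ACCESS_ERR",
    0x14: "REMOTE_OP_ERR",
    0x15: "TRANSPORT_RETRY_EXC_ERR",
    0x16: "RNR_RETRY_EXC_ERR",
    0x22: "REMOTE_ABORTED_ERR",
}

# The four hex rows of the first dump in the quoted message.
FIRST_DUMP = [
    "00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "00000030: 00 00 00 00 00 00 88 13 08 00 00 a0 00 63 4d d2",
]


def rows_to_bytes(rows):
    """Join hex-dump rows ('00000030: 00 00 ...') into raw bytes."""
    raw = b""
    for row in rows:
        # Drop any mail quoting prefix, then keep what follows the offset.
        _offset, _sep, hex_part = row.lstrip("> ").partition(":")
        raw += bytes.fromhex(hex_part)
    return raw


def decode_err_cqe(raw):
    """Split a 64-byte error CQE into its fields (layout is an assumption)."""
    if len(raw) != ERR_CQE.size:
        raise ValueError(f"expected {ERR_CQE.size} bytes, got {len(raw)}")
    (_rsvd0, srqn, _rsvd1, vendor, syndrome,
     s_wqe_opcode_qpn, wqe_counter, signature, op_own) = ERR_CQE.unpack(raw)
    return {
        "srqn": srqn & 0xFFFFFF,
        "vendor_err_synd": vendor,
        "syndrome": syndrome,
        "syndrome_name": SYNDROMES.get(syndrome, "unknown"),
        "wqe_opcode": s_wqe_opcode_qpn >> 24,   # top byte of the be32
        "qpn": s_wqe_opcode_qpn & 0xFFFFFF,     # low 24 bits of the be32
        "wqe_counter": wqe_counter,
        "signature": signature,
        "cqe_opcode": op_own >> 4,              # high nibble of op_own
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe(rows_to_bytes(FIRST_DUMP)).items():
        print(f"{name}: {value if isinstance(value, str) else hex(value)}")
```

Under that assumed layout, the first dump carries syndrome 0x13 with vendor syndrome 0x88 on QP 0xa0, and the other two dumps differ only in QP number (0xa1, 0xa2), WQE counter, and signature. That narrows down which completions failed, but it does not explain why they failed.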