[lustre-discuss] mlx5 errors on oss

Nehring, Shane R [LAS] snehring at iastate.edu
Thu May 18 11:35:19 PDT 2023


We probably will go that way ultimately. I was somewhat concerned about
compatibility on nodes that have adapters for both fabrics (probably not even a
problem).

On Thu, 2023-05-18 at 16:03 +0000, Andreas Dilger wrote:
> I can't comment on the specific network issue, but in general it is far better
> to use the MOFED drivers than the in-kernel ones. 
> 
> Cheers, Andreas
> 
> > On May 18, 2023, at 09:08, Nehring, Shane R [LAS] via lustre-discuss
> > <lustre-discuss at lists.lustre.org> wrote:
> > 
> > Hello all,
> > 
> > We recently added InfiniBand to our cluster and are in the process of testing
> > it with Lustre. We're running the distro-provided drivers for the Mellanox
> > cards with the latest firmware. Overnight we started seeing the following
> > errors on a few OSSes:
> > 
> > infiniband mlx5_0: dump_cqe:272:(pid 40058): dump error cqe
> > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000030: 00 00 00 00 00 00 88 13 08 00 00 a0 00 63 4d d2
> > infiniband mlx5_0: dump_cqe:272:(pid 40057): dump error cqe
> > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000030: 00 00 00 00 00 00 88 13 08 00 00 a1 00 c2 8e d2
> > infiniband mlx5_0: dump_cqe:272:(pid 40057): dump error cqe
> > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000030: 00 00 00 00 00 00 88 13 08 00 00 a2 00 1a 12 d2
> > 
> > I found a post suggesting this might be IOMMU related, but disabling the
> > IOMMU doesn't seem to help.
> > 
> > We're running Lustre 2.15, more or less at the tip of b2_15
> > (b74560d74a9f890838dbf2f0719e3d27c1e5eaf8).
> > 
> > Has anyone seen this before, or does anyone have any pointers?
> > 
> > Thanks
> > 
> > Shane
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
