[lustre-discuss] DMA Interrupt Remap Errors

Exec Unerd execunerd at gmail.com
Tue Sep 29 13:24:05 PDT 2015


Hello,

I've been getting these error messages on my OSS nodes, and I'm
wondering if I have a client sending bad data through RDMA. My googling has
been fruitless in uncovering the meaning of "fault reason 34", but the PCI
addresses correspond to my 10Gb and 1Gb NICs.

I'm not really sure where to begin diagnosing this error, so I'm hoping one
of you has seen it before. One thing to note is that no clients should
be using the 1Gb NIC to mount the file system; it's just for management, so
I don't know why I'd see a DMA error on PCI 04:00.1.
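If it helps, here's roughly what I can pull from the node (paths assume a
standard EL6 box; the eth/mlx driver names in the grep are guesses at what
my NICs register as):

```shell
# Was interrupt remapping enabled/forced at boot? Look for intremap= / iommu=
cat /proc/cmdline

# Identify the PCI functions named in the faults
# (81:00.1 and 04:00.1 are the addresses from my logs)
if command -v lspci >/dev/null; then
    lspci -s 81:00.1
    lspci -s 04:00.1
fi

# Which interrupt vectors the NICs are actually using
grep -i -e eth -e mlx /proc/interrupts | head -5
```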

dmar: INTR-REMAP: Request device [[81:00.1] fault index 8c
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 102
dmar: INTR-REMAP: Request device [[04:00.1] fault index 75
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

dmar: DRHD: handling fault status reg 202
dmar: INTR-REMAP: Request device [[04:00.1] fault index 74
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 302
dmar: INTR-REMAP: Request device [[04:00.1] fault index 73
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

kernel: 2.6.32-504.23.4.el6.x86_64
lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
