[lustre-discuss] DMA Interrupt Remap Errors

Exec Unerd execunerd at gmail.com
Wed Sep 30 12:17:35 PDT 2015


Thanks,

This is new territory for me, so bear with me.

04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
Network Connection (rev 01)

It appears we have MSI enabled, but I'm not sure how to verify that, or
RSS, explicitly. This is from the OSS, where eth3 is a private link to the
partner OSS, eth1 is the management subnet, and ixgbe1 is the cluster
client path.

# cat /proc/interrupts | grep -i msi | awk '{print $1,$18,$19}' | grep TxRx
157: IR-PCI-MSI-edge eth3-TxRx-0
158: IR-PCI-MSI-edge eth3-TxRx-1
159: IR-PCI-MSI-edge eth3-TxRx-2
160: IR-PCI-MSI-edge eth3-TxRx-3
161: IR-PCI-MSI-edge eth3-TxRx-4
162: IR-PCI-MSI-edge eth3-TxRx-5
163: IR-PCI-MSI-edge eth3-TxRx-6
164: IR-PCI-MSI-edge eth3-TxRx-7
175: IR-PCI-MSI-edge eth1-TxRx-0
176: IR-PCI-MSI-edge eth1-TxRx-1
177: IR-PCI-MSI-edge eth1-TxRx-2
178: IR-PCI-MSI-edge eth1-TxRx-3
179: IR-PCI-MSI-edge eth1-TxRx-4
180: IR-PCI-MSI-edge eth1-TxRx-5
181: IR-PCI-MSI-edge eth1-TxRx-6
182: IR-PCI-MSI-edge eth1-TxRx-7
200: IR-PCI-MSI-edge ixgbe1-TxRx-0
201: IR-PCI-MSI-edge ixgbe1-TxRx-1
202: IR-PCI-MSI-edge ixgbe1-TxRx-2
203: IR-PCI-MSI-edge ixgbe1-TxRx-3
204: IR-PCI-MSI-edge ixgbe1-TxRx-4
205: IR-PCI-MSI-edge ixgbe1-TxRx-5
206: IR-PCI-MSI-edge ixgbe1-TxRx-6
207: IR-PCI-MSI-edge ixgbe1-TxRx-7
208: IR-PCI-MSI-edge ixgbe1-TxRx-8
209: IR-PCI-MSI-edge ixgbe1-TxRx-9
210: IR-PCI-MSI-edge ixgbe1-TxRx-10
211: IR-PCI-MSI-edge ixgbe1-TxRx-11
212: IR-PCI-MSI-edge ixgbe1-TxRx-12
213: IR-PCI-MSI-edge ixgbe1-TxRx-13
214: IR-PCI-MSI-edge ixgbe1-TxRx-14
215: IR-PCI-MSI-edge ixgbe1-TxRx-15
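
A rough sketch of how the listing above could be summarized per interface.
The function name summarize_irqs is made up for illustration, and it
assumes the /proc/interrupts layout shown here: the last field is
"<iface>-TxRx-<n>" and the field before it is the interrupt type. Several
TxRx vectors per interface suggests multi-queue receive is in use, but it
does not prove RSS by itself, and on kernels of this vintage the
"PCI-MSI-edge" label does not, as far as I know, distinguish MSI from MSI-X.

```shell
# Count queue vectors per interface from /proc/interrupts-style input.
# Assumption: queue interrupts are named "<iface>-TxRx-<n>" in the last
# field, with the interrupt type (e.g. IR-PCI-MSI-edge) just before it.
summarize_irqs() {
    awk '$NF ~ /-TxRx-[0-9]+$/ {
        iface = $NF
        sub(/-TxRx-[0-9]+$/, "", iface)   # strip the queue suffix
        count[iface]++                    # one vector per queue
        kind[iface] = $(NF-1)             # remember the interrupt type
    }
    END {
        for (i in count) print i, count[i], kind[i]
    }' | sort
}
```

Run as "summarize_irqs < /proc/interrupts"; against the listing above it
should report 8 vectors each for eth1 and eth3 and 16 for ixgbe1. To check
MSI-X specifically, "lspci -vv -s 81:00.1" run as root should show an
"MSI-X: Enable+" capability line if it is active, and "ethtool -l <iface>"
and "ethtool -x <iface>" report channel counts and the RSS indirection
table where the driver and ethtool version support them.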


On Tue, Sep 29, 2015 at 2:02 PM, Ashish Purkar <ashish.purkar at seagate.com>
wrote:

> Are you using RSS with MSIX on NIC?
> Please provide more details about the NIC and the configuration.
>
> app
> On Sep 30, 2015 1:54 AM, "Exec Unerd" <execunerd at gmail.com> wrote:
>
>> Hello,
>>
>> I've been getting these types of error messages on my OSS nodes, and I'm
>> wondering if I have a client sending bad data through RDMA. My googling has
>> been fruitless to discover the meaning of "fault reason 34" but the PCI
>> addresses are my 10Gb and 1Gb NICs.
>>
>> I'm not really sure where to begin diagnosing this error, so I'm hoping
>> one of you have seen this before. One thing to note is that no clients
>> should be using the 1Gb NIC to mount the file system; it's just for
>> management so I don't know why I'd see a DMA error on PCI 04:00.1.
>>
>> dmar: INTR-REMAP: Request device [[81:00.1] fault index 8c
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>> dmar: DRHD: handling fault status reg 102
>> dmar: INTR-REMAP: Request device [[04:00.1] fault index 75
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>>
>> dmar: DRHD: handling fault status reg 202
>> dmar: INTR-REMAP: Request device [[04:00.1] fault index 74
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>> dmar: DRHD: handling fault status reg 302
>> dmar: INTR-REMAP: Request device [[04:00.1] fault index 73
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>>
>> kernel: 2.6.32-504.23.4.el6.x86_64
>> lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
>> zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
>>
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>>
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>