[Lustre-discuss] ost_brw_write()

Kevin Van Maren Kevin.Vanmaren at Sun.COM
Wed Dec 31 14:09:32 PST 2008


Check the ethtool manpage, as the options vary somewhat across releases, 
but basically disable the TX and RX checksum offload, and the TCP 
segmentation offload (TSO):

ethtool -K eth0 rx off
ethtool -K eth0 tx off
ethtool -K eth0 tso off

"ethtool -k eth0" should then report them all off.  Repeat for every 
interface in the bond, since you are doing bonding.

If you do that on all the clients and servers, and if the problem goes 
away, turn them back on one at a time to see which is causing your problems.
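
The steps above can be sketched as a small shell helper. This is only an 
illustration: the function names set_offload and disable_offloads and the 
ETHTOOL variable are made up here, and the interface names in the usage 
note are assumptions about the bond's members. It uses only the 
"ethtool -K <iface> <feature> on|off" and "ethtool -k <iface>" forms shown 
above.

```shell
# Sketch only: ETHTOOL and both function names are illustrative helpers,
# not part of ethtool itself.
ETHTOOL=${ETHTOOL:-ethtool}

# set_offload FEATURE STATE IFACE...
# FEATURE is one of rx, tx, tso; STATE is on or off.
set_offload() {
    feature=$1
    state=$2
    shift 2
    for iface in "$@"; do
        "$ETHTOOL" -K "$iface" "$feature" "$state"
    done
}

# disable_offloads IFACE...
# Turn off RX/TX checksum offload and TSO on each interface,
# then print the resulting offload settings for inspection.
disable_offloads() {
    for feature in rx tx tso; do
        set_offload "$feature" off "$@"
    done
    for iface in "$@"; do
        "$ETHTOOL" -k "$iface"
    done
}
```

For example, "disable_offloads eth0 eth1" (run as root, assuming eth0 and 
eth1 are the bonded interfaces) turns everything off, and a later 
"set_offload rx on eth0 eth1" re-enables a single feature so each one can 
be tested in turn.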

Kevin


Mag Gam wrote:
> Kevin:
>
> Thanks for the response.
>
> What do I need to change using ethtool? BTW, I am using ethernet
> bonding to increase bandwidth. I suspect this could be causing the
> problem...
>
> I am not sure if my applications are using mmap(). I am not aware of
> an easy way to determine if they are.
>
>
>
> On Wed, Dec 31, 2008 at 12:34 PM, Kevin Van Maren
> <Kevin.Vanmaren at sun.com> wrote:
>   
>> I have previously observed cases where a NIC with RX checksum offload would
>> pass packets up to Linux as "good" if the Ethernet CRC was valid, even though
>> the UDP checksum failed. It appeared that something (the sender?) was
>> corrupting a byte in the payload after calculating the UDP csum, but before
>> the Ethernet CRC was calculated.
>>
>> So disable any NIC offloading on both sides (ethtool) and see if the Lustre
>> csum errors go away.
>>
>> Also note that if you are using mmap'ed files, it is _expected_ that the csum
>> might not match, as the page can be modified between when the csum is
>> calculated by Lustre and when the page is actually transmitted.
>>
>> Kevin
>>
>>
>> Mag Gam wrote:
>>     
>>> I have done the tuning but still occasionally get a CSUM error, about
>>> 200 per day.  Considering that we probably transfer close to 500G to 1TB
>>> of data a day, that is not that bad.
>>>
>>> I did the tuning on the e1000 card but I am not sure what else to do.
>>> The network guys found nothing wrong with their switch, and the cables
>>> are fine (we even got them replaced).
>>>
>>> Since lustre has its own checksumming, I suppose I am in good shape...
>>>
>>>
>>>>> No.  Nobody said anything about packets being dropped.  They are failing
>>>>> checksum.
>>>>>           



