[lustre-discuss] BAD CHECKSUM

Hans Henrik Happe happe at nbi.dk
Sat Dec 9 09:57:41 PST 2017


On 07-12-2017 21:36, Dilger, Andreas wrote:
> On Dec 7, 2017, at 10:37, Hans Henrik Happe <happe at nbi.dk> wrote:
>> Hi,
>>
>> Can an application cause BAD CHECKSUM errors in Lustre logs by somehow
>> overwriting memory while it is being DMA'ed to the network?
>>
>> After upgrading to 2.10.1 on the server side we started seeing this from
>> a user's application (MPI I/O). Both 2.9.0 and 2.10.1 clients emit these
>> errors. We have not yet established whether the application is doing
>> things correctly.
> If applications are using mmap IO it is possible for the page to become
> inconsistent after the checksum has been computed. However, mmap IO is
> normally detected by the client and no message should be printed.
>
> There isn't anything that the application needs to do, since the client
> will resend the data if there is a checksum error, but the resends do
> slow down the IO. If the inconsistency is on the client, there is no
> cause for concern (though it would be good to figure out the root cause).
>
> It would be interesting to see what the exact error message is, since
> that will say whether the data became inconsistent on the client, or
> over the network. If the inconsistency is over the network or on the
> server, then that may point to hardware issues.
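
For illustration, the race described above can be sketched in a few lines of
Python. This is only a toy model, not Lustre code: the function names, the
4096-byte page, and the use of zlib.crc32 are assumptions made for the
example. It shows a page being changed after its checksum was computed, the
receiver detecting the mismatch, and a resend succeeding once the page is
stable.

```python
import zlib


def send_with_checksum(page, mutate=None):
    """Toy sender: checksum the page, then put the page on the wire.

    `mutate` is a hypothetical hook standing in for an application that
    writes to the page after the checksum was taken but before the data
    has left the machine.
    """
    checksum = zlib.crc32(bytes(page))
    if mutate is not None:
        mutate(page)  # page changes after the checksum was computed
    return bytes(page), checksum


def receiver_verify(payload, checksum):
    """Toy receiver: recompute the checksum over what actually arrived."""
    return zlib.crc32(payload) == checksum


def write_with_resend(page, mutate=None, max_attempts=3):
    """Send the page, resending on checksum mismatch.

    Returns the number of attempts that were needed. The mutation is only
    applied during the first attempt, so the resend sees a stable page.
    """
    for attempt in range(1, max_attempts + 1):
        hook = mutate if attempt == 1 else None
        payload, checksum = send_with_checksum(page, hook)
        if receiver_verify(payload, checksum):
            return attempt
    raise IOError("checksum still bad after %d attempts" % max_attempts)


def scribble(page):
    """Hypothetical application write that lands mid-transfer."""
    page[0] = ord("b")


if __name__ == "__main__":
    print(write_with_resend(bytearray(b"a" * 4096)))            # stable page
    print(write_with_resend(bytearray(b"a" * 4096), scribble))  # page modified once
```

In this model the data that finally arrives is consistent; the only cost of
the mismatch is the extra transfer, which matches the point above that
resends slow down the IO.
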
I've attached logs from a server and a client.

Cheers,
Hans Henrik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: client.log
Type: text/x-log
Size: 3757 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20171209/c5469143/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server.log
Type: text/x-log
Size: 1247 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20171209/c5469143/attachment-0001.bin>

