<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
The difference between your Intel and AMD nodes may be the RPC checksum type that is used by default (the clients and servers negotiate the fastest algorithm).
<div class=""><br class="">
<div class="">I suspect the checksum error is itself fixed already, but in the meantime you could try setting a different checksum than t10ip4k (or whatever it is you are using, compare "lctl get_param osc.*.checksum_type" on your Intel vs. AMD clients).</div>
<div class=""><br class="">
<div>Cheers, Andreas</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Jun 3, 2024, at 08:21, Fokke Dijkstra via lustre-discuss <<a href="mailto:lustre-discuss@lists.lustre.org" class="">lustre-discuss@lists.lustre.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Dear all,<br class="">
<br class="">
We are frequently (about daily) seeing the following type of error in our logfile on some specific client nodes:<br class="">
<br class="">
Jun 1 11:03:17 a100gpu1 kernel: LustreError: 3834:0:(integrity.c:66:obd_page_dif_generate_buffer()) scratch-OST0042-osc-ff35febc655a9000: unexpected used guard number of DIF 5/5, data length 4096, sector size 512: rc = -7<br class="">
Jun 1 11:03:17 a100gpu1 kernel: LustreError: 3834:0:(osc_request.c:2750:osc_build_rpc()) prep_req failed: -7<br class="">
Jun 1 11:03:17 a100gpu1 kernel: LustreError: 3834:0:(osc_cache.c:2186:osc_check_rpcs()) Write request failed with -7<br class="">
<br class="">
We are running Lustre 2.15.4 over Ethernet on Rocky 8 servers and clients.<br class="">
The error only appears on the clients; nothing shows up in the server logs around the same time.<br class="">
<br class="">
The errors mostly appear on our Intel Ice Lake based GPU nodes and less frequently on our Intel Ice Lake based CPU nodes. We do not see them on our AMD Zen 3 nodes, which form the majority of our cluster.<br class="">
<br class="">
The problem was brought to our attention by a few users running PyTorch code on the GPU nodes, who complained that PyTorch reported an error while writing a file and then failed.<br class="">
When checking the log files, the error appears to occur more often than those reports suggest, and I can't find a clear correlation with specific job types or with job failures (some jobs seem to continue running after the error appears in the system log file).<br class="">
<br class="">
Has anyone seen this error before? Does somebody know how to fix this?<br class="">
<br class="">
Kind regards,<br class="">
<br class="">
Fokke Dijkstra<br class="">
<div class=""><br class="">
</div>
<span class="gmail_signature_prefix">-- </span><br class="">
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr" class="">
<div class="">
<div dir="ltr" class="">Fokke Dijkstra <a href="mailto:f.dijkstra@rug.nl" target="_blank" class="">
<f.dijkstra@rug.nl></a> <br class="">
Team High Performance Computing</div>
<div dir="ltr" class="">Center for Information Technology, University of Groningen
<br class="">
Postbus 11044, 9700 CA Groningen, The Netherlands <br class="">
<br class="">
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
<div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div>--</div>
<div>Andreas Dilger</div>
<div>Lustre Principal Architect</div>
<div>Whamcloud</div>
<div><br class="">
</div>
<div><br class="">
</div>
<div><br class="">
</div>
</div>
</div>
</div>
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br class="">
</div>
</div>
</body>
</html>