<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Good Morning,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We've been experiencing a fairly nasty issue with our clients since our move to AlmaLinux 9. It occurs at seemingly random intervals (anywhere from a few days to over a week): the clients with ConnectX-3 cards start logging LNet network errors and see hangs that move between random OSTs
spread across our OSS systems, as well as problems communicating with the MGS. This can then trigger crash cycles on the OSS systems themselves (again in the LNet layer). The only remedy we have found so far is to power down all of the impacted clients and let the
impacted OSS systems reboot.</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Here is a snippet of the errors as we see them on a client:<br>
[Jun21 08:16] Lustre: lustre19-OST0020-osc-ffff934c22a29800: Connection restored to 172.17.0.97@o2ib (at 172.17.0.97@o2ib) </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000006] Lustre: Skipped 2 previous similar messages</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +3.079695] Lustre: lustre19-MDT0000-mdc-ffff934c22a29800: Connection restored to 172.17.0.37@o2ib (at 172.17.0.37@o2ib) </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.223480] LustreError: 4478:0:(events.c:211:client_bulk_callback()) event type 2, status -5, desc 00000000784c6e4f</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000007] LustreError: 4478:0:(events.c:211:client_bulk_callback()) Skipped 3 previous similar messages</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +22.955501] Lustre: 3935794:0:(client.c:2289:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1718972176/real 1718972176] req@000000008c377199 x1801581392820160/t0(0) o13->lustre24-OST0006-osc-ffff934b8f4a7000@172.17.1.42@o2ib:7/4
lens 224/368 e 0 to 1 dl 1718972183 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'lfs.7953'</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000006] Lustre: 3935794:0:(client.c:2289:ptlrpc_expire_one_request()) Skipped 21 previous similar messages</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +20.333921] Lustre: lustre19-OST000a-osc-ffff934c22a29800: Connection restored to 172.17.0.39@o2ib (at 172.17.0.39@o2ib) </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[Jun21 08:17] LustreError: 166-1: MGC172.17.0.36@o2ib: Connection to MGS (at 172.17.0.37@o2ib) was lost; in progress operations using this service will fail </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000302] Lustre: lustre19-OST0046-osc-ffff934c22a29800: Connection to lustre19-OST0046 (at 172.17.0.103@o2ib) was lost; in progress operations using this service will wait for recovery to complete </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000005] Lustre: Skipped 6 previous similar messages</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +6.144196] Lustre: MGC172.17.0.36@o2ib: Connection restored to 172.17.0.37@o2ib (at 172.17.0.37@o2ib)</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
[ +0.000006] Lustre: Skipped 1 previous similar message</div>
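<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To check whether the failures really span many servers rather than a single bad OSS, a rough Python sketch like the one below could tally the server NIDs named in these messages. It is only a sketch: the regular expressions are assumptions derived from the message formats in this excerpt, the function and file names are hypothetical, and it assumes the dmesg output has been saved to a plain-text file with each message on a single line.</div>
<pre>
```python
import re
import sys
from collections import Counter, defaultdict

# Match an IPv4 o2ib NID such as 172.17.0.97@o2ib (assumption: the servers
# of interest all use o2ib NIDs, as in the excerpt above).
NID = r"(\d{1,3}(?:\.\d{1,3}){3}@o2ib)"

# Hypothetical patterns based only on the message formats in the excerpt.
PATTERNS = {
    "lost": re.compile(rf"Connection to .* \(at {NID}\) was lost"),
    "restored": re.compile(rf"Connection restored to {NID}"),
    # The server NID follows the last '@' of the target import name,
    # e.g. ...-osc-ffff934b8f4a7000@172.17.1.42@o2ib
    "network_error": re.compile(rf"failed due to network error.*?@{NID}"),
}


def tally_server_events(lines):
    """Count lost/restored/network-error events per server NID."""
    counts = defaultdict(Counter)
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[match.group(1)][event] += 1
    return counts


if __name__ == "__main__" and sys.argv[1:]:
    # Usage: python tally_nids.py dmesg.txt  (names are hypothetical)
    with open(sys.argv[1]) as log:
        for nid, events in sorted(tally_server_events(log).items()):
            print(nid, dict(events))
```
</pre>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The counts understate the real event rate, because the kernel rate-limits repeats ("Skipped N previous similar messages") and the sketch does not add those back in; it is meant only to show which servers are involved.</div>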
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We have a mix of client hardware, but all of the clients run the same kernel and Lustre client versions.</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Here are the client software versions (kernel packages):<br>
kernel-modules-core-5.14.0-362.24.1.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-core-5.14.0-362.24.1.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-modules-5.14.0-362.24.1.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-5.14.0-362.24.1.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
texlive-l3kernel-20200406-26.el9_2.noarch</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-modules-core-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-core-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-modules-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-tools-libs-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-tools-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-headers-5.14.0-362.24.2.el9_3.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
and the Lustre client packages:</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kmod-lustre-client-2.15.4-1.el9.jlab.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
lustre-client-2.15.4-1.el9.jlab.x86_64</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Our OSS systems run EL7 with MOFED as their InfiniBand stack and have ConnectX-3 cards. Their kernel packages are:</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-tools-libs-3.10.0-1160.76.1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-tools-3.10.0-1160.76.1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-headers-3.10.0-1160.76.1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-abi-whitelists-3.10.0-1160.76.1.el7.noarch</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-devel-3.10.0-1160.76.1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kernel-3.10.0-1160.76.1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
and the Lustre server packages:</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
lustre-2.12.9-1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kmod-lustre-osd-zfs-2.12.9-1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
lustre-osd-zfs-mount-2.12.9-1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
lustre-resource-agents-2.12.9-1.el7.x86_64</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
kmod-lustre-2.12.9-1.el7.x86_64</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
w/r,</div>
<div id="Signature" style="color: inherit;">
<div style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif;" dir="ltr" id="divtagdefaultwrapper">
<p style="margin-top: 0px; margin-bottom: 0px;"><span style="font-family: monospace; font-size: 14.16px; color: rgb(51, 51, 51);">Kurt J. Strosahl (he/him)</span><br>
<span style="font-family: monospace; font-size: 14.16px; color: rgb(51, 51, 51);">System Administrator: Lustre, HPC</span><br>
<span style="font-family: monospace; font-size: 14.16px; color: rgb(51, 51, 51);">Scientific Computing Group, Thomas Jefferson National Accelerator Facility</span><br>
</p>
</div>
</div>
</body>
</html>