<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hello Andreas,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks for your reply and tips.</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">We found that this problem was</span><span style="font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.38; color: rgb(0, 0, 0);"> caused by removing the Lustre modules (uninstalling the Lustre RPMs) without first unmounting the Lustre client. As a result, the Lustre servers never received any notification that these clients were gone, so they kept trying to recover the connections again and again.</span></div>
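<div class="elementToProof">To see which peers the server is still retrying, a small filter over the kernel log can help. This is a minimal sketch, not a Lustre tool: <code>list_failed_peers</code> is a hypothetical helper name, and it assumes the LNetError line format shown in the log excerpt quoted below. It counts the "recovery failed" lines per peer NID and error code (on Linux, -110 is ETIMEDOUT and -111 is ECONNREFUSED).</div>

```shell
# list_failed_peers (hypothetical helper): read kernel log lines on stdin,
# extract "peer NI (<nid>) recovery failed with <errno>" messages, and print
# a count per (NID, errno) pair, most frequent first.
list_failed_peers() {
  # BRE: a literal "(" is plain, "\(...\)" is a capture group.
  sed -n 's/.*peer NI (\([^)]*\)) recovery failed with \(-[0-9]*\).*/\1 \2/p' |
    sort | uniq -c | sort -rn
}
```

<div class="elementToProof">Usage on the MDS: <code>list_failed_peers &lt; /var/log/messages</code>. The "Skipped N previous similar messages" lines are ignored, so the counts only reflect messages that were actually logged.</div>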
<div class="elementToProof"><span style="font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.38; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.38; color: rgb(0, 0, 0);">The good news is that the LNetError messages stopped after I ran the following command to clear the export.
</span><span style="letter-spacing: normal; font-family: Arial, sans-serif; font-size: 14.6667px; line-height: 1.38; font-weight: 400; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);"> I don't know whether there is a better way to clean up the removed clients. Should they be disconnected at the LNet level instead?</span></div>
<div class="elementToProof"><span style="font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.38; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);">[root@mds2 ~]#
</span><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(180, 36, 25);"><b>echo</b></span><span style="font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.38; color: rgb(0, 0, 0);">
</span><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);">"10.67.178.25@tcp" > /proc/fs/lustre/mdt/data-MDT0000/exports/clear</span></div>
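<div class="elementToProof">When several clients were removed the same way, the same clear operation can be repeated for each NID. This is a minimal sketch, assuming the exports directory layout shown in this thread (one subdirectory per client NID plus a <code>clear</code> file) and assuming that writing one NID at a time to <code>clear</code> behaves like the single command above. <code>clear_stale_exports</code> is a hypothetical helper name, and the exports path may differ for other MDT names or Lustre versions.</div>

```shell
# clear_stale_exports (hypothetical helper): for each client NID given,
# write it to EXPORTS_DIR/clear, but only if EXPORTS_DIR/<nid> still exists.
# Usage: clear_stale_exports EXPORTS_DIR NID...
clear_stale_exports() {
  exports_dir=$1
  shift
  for nid in "$@"; do
    if [ -d "$exports_dir/$nid" ]; then
      # Same operation as the single echo above, one NID per write.
      echo "$nid" > "$exports_dir/clear" || return 1
      echo "cleared $nid"
    else
      echo "no export for $nid, skipping" >&2
    fi
  done
}
```

<div class="elementToProof">Example: <code>clear_stale_exports /proc/fs/lustre/mdt/data-MDT0000/exports 10.67.178.25@tcp 10.67.176.25@tcp</code>. Checking for the per-NID directory first avoids writing NIDs that the MDT no longer tracks.</div>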
<div class="elementToProof"><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);">Thank you.</span></div>
<div class="elementToProof"><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);">Regards,</span></div>
<div class="elementToProof"><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);">Qiulan</span></div>
<div class="elementToProof"><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Menlo; font-size: 14px; line-height: normal; color: rgb(0, 0, 0);"><br>
</span></div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Andreas Dilger <adilger@whamcloud.com><br>
<b>Sent:</b> Friday, December 8, 2023 6:49 PM<br>
<b>To:</b> Huang, Qiulan <qhuang@bnl.gov><br>
<b>Cc:</b> lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org><br>
<b>Subject:</b> Re: [lustre-discuss] Lustre server still try to recover the lnet reply to the depreciated clients</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">If you are evicting a client by NID, then use the "nid:" keyword:<br>
<br>
lctl set_param mdt.*.evict_client=nid:10.68.178.25@tcp<br>
<br>
Otherwise it is expecting the input to be in the form of a client UUID (to allow<br>
evicting a single export from a client mounting the filesystem multiple times).<br>
<br>
That said, the client *should* be evicted by the server automatically, so it isn't<br>
clear why this isn't happening. Possibly this is something at the LNet level<br>
(which unfortunately I don't know much about)? <br>
<br>
Cheers, Andreas<br>
<br>
> On Dec 6, 2023, at 13:23, Huang, Qiulan via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:<br>
> <br>
> <br>
> <br>
> Hello all,<br>
> <br>
> <br>
> We removed some clients two weeks ago, but the Lustre server is still attempting LNet recovery for those clients (the error log is posted below), and they are still listed in the exports directory.<br>
> <br>
> <br>
> I tried to evict the clients with the following command, but it failed with the error "no exports found":<br>
> <br>
> lctl set_param mdt.*.evict_client=10.68.178.25@tcp<br>
> <br>
> <br>
> Do you know how to clean up these removed, deprecated clients? Any suggestions would be greatly appreciated.<br>
> <br>
> <br>
> <br>
> For example:<br>
> <br>
> [root@mds2 ~]# ll /proc/fs/lustre/mdt/data-MDT0000/exports/10.67.178.25@tcp/<br>
> total 0<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 export<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 fmd_count<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 hash<br>
> -rw-r--r-- 1 root root 0 Dec 5 15:41 ldlm_stats<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 nodemap<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 open_files<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 reply_data<br>
> -rw-r--r-- 1 root root 0 Aug 14 10:58 stats<br>
> -r--r--r-- 1 root root 0 Dec 5 15:41 uuid<br>
> <br>
> <br>
> <br>
> <br>
> <br>
> /var/log/messages:Dec 6 12:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110<br>
> /var/log/messages:Dec 6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message<br>
> /var/log/messages:Dec 6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111<br>
> /var/log/messages:Dec 6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 3 previous similar messages<br>
> /var/log/messages:Dec 6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111<br>
> /var/log/messages:Dec 6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 8 previous similar messages<br>
> /var/log/messages:Dec 6 15:02:14 mds2 kernel: LNetError: 3817248:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111<br>
> <br>
> <br>
> Regards,<br>
> Qiulan<br>
> _______________________________________________<br>
> lustre-discuss mailing list<br>
> lustre-discuss@lists.lustre.org<br>
> <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
<br>
<br>
--<br>
Andreas Dilger<br>
Lustre Principal Architect<br>
Whamcloud<br>
</div>
</span></font></div>
</body>
</html>