[lustre-discuss] Lustre server still try to recover the lnet reply to the depreciated clients

Andreas Dilger adilger at whamcloud.com
Fri Dec 8 15:49:04 PST 2023


If you are evicting a client by NID, then use the "nid:" keyword:

    lctl set_param mdt.*.evict_client=nid:10.68.178.25@tcp

Otherwise it is expecting the input to be in the form of a client UUID (to allow
evicting a single export from a client mounting the filesystem multiple times).
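
As a minimal sketch of that distinction (not part of lctl itself), the helper
below builds the value written to evict_client. The function names
evict_value and evict_client are hypothetical, and the rule "anything
containing @ is a NID" is an assumption made for illustration. NIDs are
written in their real form, address@network (the list archive renders "@"
as " at ").

```shell
#!/bin/sh
# Build the value written to mdt.*.evict_client.
# Assumption: an argument containing "@" is a NID (e.g. 10.68.178.25@tcp);
# anything else is treated as a client UUID and passed through unchanged.
evict_value() {
    case "$1" in
        nid:*) printf '%s\n' "$1" ;;       # already carries the keyword
        *@*)   printf 'nid:%s\n' "$1" ;;   # NID: add the "nid:" keyword
        *)     printf '%s\n' "$1" ;;       # UUID: use as-is
    esac
}

# Hypothetical wrapper: evict one client from every MDT on this server.
# The pattern is quoted so that lctl, not the shell, expands "mdt.*".
evict_client() {
    lctl set_param "mdt.*.evict_client=$(evict_value "$1")"
}
```

Calling "evict_client 10.68.178.25@tcp" on the MDS then runs the same command
as above. The UUID of an export should be readable from the "uuid" file in
the corresponding exports directory.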

That said, the client *should* be evicted by the server automatically, so it isn't
clear why this isn't happening.  Possibly this is something at the LNet level
(which unfortunately I don't know much about)? 
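
To see which peers LNet is still trying to recover, the "recovery failed"
lines in the quoted log can be summarized per NID. This is only a sketch:
the function name summarize_lnet_recovery is made up, and it assumes the
message format shown in the log below. On Linux, -110 is ETIMEDOUT and -111
is ECONNREFUSED.

```shell
#!/bin/sh
# Summarize "recovery failed" LNetError lines per peer NID and error code.
# Reads syslog lines on stdin; prints "<count> <nid> <errno> <meaning>".
summarize_lnet_recovery() {
    # Keep only "<nid> <errno>" from lines that report a failed recovery.
    sed -n 's/.*peer NI (\([^)]*\)) recovery failed with \(-[0-9]*\).*/\1 \2/p' |
    # Undo the list archive's " at " obfuscation, if present.
    sed 's/ at /@/' |
    sort | uniq -c |
    while read -r count nid rc; do
        case "$rc" in
            -110) why="ETIMEDOUT (connection timed out)" ;;
            -111) why="ECONNREFUSED (connection refused)" ;;
            *)    why="see errno(3)" ;;
        esac
        printf '%s %s %s %s\n' "$count" "$nid" "$rc" "$why"
    done
}
```

For example, "grep LNetError /var/log/messages | summarize_lnet_recovery"
prints one line per NID and error code. Lines reported only as "Skipped N
previous similar messages" are not counted, so the totals are a lower bound.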

Cheers, Andreas

> On Dec 6, 2023, at 13:23, Huang, Qiulan via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> 
> 
> 
> Hello all,
> 
> 
> We removed some clients two weeks ago, but the Lustre server is still attempting LNet recovery to those clients (the error log is posted below), and they are still listed in the exports directory.
> 
> 
> I tried to run the following command to evict the clients, but it failed with the error "no exports found":
> 
> lctl set_param mdt.*.evict_client=10.68.178.25@tcp
> 
> 
> Do you know how to clean up the removed, deprecated clients? Any suggestions would be greatly appreciated.
> 
> 
> 
> For example:
> 
> [root@mds2 ~]# ll /proc/fs/lustre/mdt/data-MDT0000/exports/10.67.178.25@tcp/
> total 0
> -r--r--r-- 1 root root 0 Dec  5 15:41 export
> -r--r--r-- 1 root root 0 Dec  5 15:41 fmd_count
> -r--r--r-- 1 root root 0 Dec  5 15:41 hash
> -rw-r--r-- 1 root root 0 Dec  5 15:41 ldlm_stats
> -r--r--r-- 1 root root 0 Dec  5 15:41 nodemap
> -r--r--r-- 1 root root 0 Dec  5 15:41 open_files
> -r--r--r-- 1 root root 0 Dec  5 15:41 reply_data
> -rw-r--r-- 1 root root 0 Aug 14 10:58 stats
> -r--r--r-- 1 root root 0 Dec  5 15:41 uuid
> 
> 
> 
> 
> 
> /var/log/messages:Dec  6 12:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
> /var/log/messages:Dec  6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
> /var/log/messages:Dec  6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
> /var/log/messages:Dec  6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 3 previous similar messages
> /var/log/messages:Dec  6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
> /var/log/messages:Dec  6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 8 previous similar messages
> /var/log/messages:Dec  6 15:02:14 mds2 kernel: LNetError: 3817248:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
> 
> 
> Regards,
> Qiulan

--
Andreas Dilger
Lustre Principal Architect
Whamcloud








