[lustre-discuss] Lustre server still tries to recover the LNet reply to removed clients

Huang, Qiulan qhuang at bnl.gov
Wed Dec 6 12:23:11 PST 2023



Hello all,


We removed some clients two weeks ago, but the Lustre server is still trying to handle the LNet recovery reply to those clients (the error log is posted below). They are also still listed in the exports directory.


I tried to run the following command to evict the clients, but it failed with the error "no exports found":

lctl set_param mdt.*.evict_client=10.68.178.25@tcp


Do you know how to clean up the removed clients? Any suggestions would be greatly appreciated.



For example:


[root@mds2 ~]# ll /proc/fs/lustre/mdt/data-MDT0000/exports/10.67.178.25@tcp/
total 0
-r--r--r-- 1 root root 0 Dec  5 15:41 export
-r--r--r-- 1 root root 0 Dec  5 15:41 fmd_count
-r--r--r-- 1 root root 0 Dec  5 15:41 hash
-rw-r--r-- 1 root root 0 Dec  5 15:41 ldlm_stats
-r--r--r-- 1 root root 0 Dec  5 15:41 nodemap
-r--r--r-- 1 root root 0 Dec  5 15:41 open_files
-r--r--r-- 1 root root 0 Dec  5 15:41 reply_data
-rw-r--r-- 1 root root 0 Aug 14 10:58 stats
-r--r--r-- 1 root root 0 Dec  5 15:41 uuid






/var/log/messages:Dec  6 12:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 13:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 13:20:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 13:35:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 13:50:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 14:05:17 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.178.25@tcp) recovery failed with -110
/var/log/messages:Dec  6 14:20:16 mds2 kernel: LNetError: 11579:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message
/var/log/messages:Dec  6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
/var/log/messages:Dec  6 14:30:17 mds2 kernel: LNetError: 3806712:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 3 previous similar messages
/var/log/messages:Dec  6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
/var/log/messages:Dec  6 14:47:14 mds2 kernel: LNetError: 3812070:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 8 previous similar messages
/var/log/messages:Dec  6 15:02:14 mds2 kernel: LNetError: 3817248:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.67.176.25@tcp) recovery failed with -111
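
To make the pattern in the log above easier to see, here is a minimal Python sketch that tallies the recovery failures per NID and error code. It is only an illustration: the names `summarize` and `LINUX_ERRNO` and the shortened sample lines are made up for this example and are not part of any Lustre tooling. The codes -110 and -111 are the Linux errno values ETIMEDOUT and ECONNREFUSED.

```python
import re
from collections import Counter

# Linux errno values seen in the log (hard-coded so the sketch does not
# depend on the platform it runs on).
LINUX_ERRNO = {
    110: "ETIMEDOUT (connection timed out)",
    111: "ECONNREFUSED (connection refused)",
}

# Matches "peer NI (10.67.178.25@tcp) recovery failed with -110".
# Also accepts " at " in place of "@", which mailing-list archives substitute.
PATTERN = re.compile(
    r"peer NI \(([\d.]+)(?:@| at )(\w+)\) recovery failed with -(\d+)"
)


def summarize(lines):
    """Count recovery failures per (NID, errno) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            nid = f"{match.group(1)}@{match.group(2)}"
            counts[(nid, int(match.group(3)))] += 1
    return counts


# Shortened, illustrative sample lines in the same shape as the log.
sample = [
    "LNetError: peer NI (10.67.178.25 at tcp) recovery failed with -110",
    "LNetError: Skipped 1 previous similar message",
    "LNetError: peer NI (10.67.176.25@tcp) recovery failed with -111",
]

for (nid, code), n in sorted(summarize(sample).items()):
    meaning = LINUX_ERRNO.get(code, "unknown")
    print(f"{nid}: {n} failure(s) with -{code} {meaning}")
```

To run it against the real log, the `sample` list could be replaced with the lines read from /var/log/messages. The "Skipped N previous similar message" lines are ignored, so the counts are lower bounds.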


Regards,
Qiulan