[Lustre-discuss] Expected number of clients during MDS reconnect.

Daniel Kobras kobras at linux.de
Thu Oct 15 09:46:20 PDT 2009


Hi!

When initiating an MDS failover on one of our systems, we see the new active
MDS expecting more clients to recover than were actually connected before.

	# cat /proc/fs/lustre/mds/lustrefs-MDT0000/recovery_status 
	status: COMPLETE
	recovery_start: 1255622509
	recovery_duration: 300
	delayed_clients: 0/651
	completed_clients: 260/651
	replayed_requests: 4
	last_transno: 4112137365

Where 260 is indeed the correct number of active clients.

	# ls -d1 /proc/fs/lustre/mds/lustrefs-MDT0000/exports/*@* | wc -l
	260
	# cat /proc/fs/lustre/mds/lustrefs-MDT0000/num_exports 
	261

Not sure what causes the off-by-one between num_exports and the number of
entries in the exports subdirectory, but the difference doesn't look severe.  I
do wonder about the expected number of 651 clients, though. When recovery has
finished on the MDS, Lustre correctly evicts those surplus clients, it seems,
as the syslog reports

	Lustre: lustrefs-MDT0000: Recovery period over after 5:00, of 651
	clients 260 recovered and 391 were evicted.

but still the MDT apparently keeps note of them and expects them back during
the next recovery cycle. This means that we currently always have to wait the
full recovery window, even though all active clients have already reconnected.
We've seen this behaviour with MDSes running 1.6.7.2 and 1.8.1, while the clients
run a mixture of versions between 1.6.6 and 1.8.1. During the lifetime of the
system, we've decommissioned only a small number of machines running Lustre
clients, so the gap between current and expected client numbers must have
developed by some other means.
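To keep an eye on this gap across failovers, here is a minimal sketch that assumes the recovery_status format shown above (a "completed_clients: N/M" line with no leading whitespace). The function name surplus_clients is made up for illustration, and the default path simply reuses the MDT name from this post:

```shell
#!/bin/sh
# Hypothetical helper: print how many clients the MDT expected during
# recovery but never saw reconnect (expected minus recovered).
# Assumes a "completed_clients: N/M" line as in the recovery_status above.
surplus_clients() {
    status_file=${1:-/proc/fs/lustre/mds/lustrefs-MDT0000/recovery_status}
    # Splitting on spaces and slashes turns "completed_clients: 260/651"
    # into $1="completed_clients:", $2=260 (recovered), $3=651 (expected).
    awk -F'[ /]+' '$1 == "completed_clients:" { print $3 - $2 }' "$status_file"
}
```

For the recovery_status shown above this prints 391, which matches the number of clients the syslog message reports as evicted.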

Does anyone know how the MDT calculates the number of expected clients?
Is there a way to make Lustre dump a list of nids of the surplus clients it
evicts after the recovery phase?
And above all, is there a way to convince the MDT about the true number of
clients (preferably one that doesn't involve the writeconf dance ;-)?

Regards,

Daniel.
