[Lustre-discuss] Clients getting incorrect network information for one of two MDT servers (active/passive)

Daniel Kobras kobras at linux.de
Fri May 21 09:26:45 PDT 2010


Hi!

On Fri, May 21, 2010 at 11:54:56AM -0400, McKee, Shawn wrote:
> Parameters:
> mgsnode=10.10.1.140@tcp,192.41.230.140@tcp1,141.211.101.161@tcp2
> failover.node=10.10.1.49@tcp,192.41.230.49@tcp1
>
> Notice there is no reference to 192.41.230.48@tcp anywhere here.

Lustre MDS and OSS nodes register themselves with the MGS when they are started
(mounted) for the first time. In particular, the then-current list of network
identifiers (NIDs) is recorded and sent off to the MGS, from where it is
propagated to all clients. This information sticks and will not be updated
automatically, even if the configuration on the server changes. From your
description, it sounds like you initially started up the MDS with an incorrect
LNET config (and have probably fixed it in the meantime, but the MGS, and thus
the clients, won't know). Check with "lctl list_nids" on your first MDS that
you're content with the current configuration, then follow the procedure to
change a server NID (the "writeconf procedure") that is documented in the
manual, and you should get both server nodes operational again.
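
As a rough sketch of that first check, the snippet below compares the NIDs a
server currently reports with the NIDs that were recorded at registration. It
is only an illustration: the helper names (parse_nids, current_nids,
compare_nids) are made up for this example, and the sample values are
placeholders apart from 192.41.230.48@tcp, which is the NID mentioned in this
thread. It assumes NIDs are written in the usual address@network form and that
"lctl list_nids" prints one NID per line.

```python
import subprocess


def parse_nids(text):
    """Split a comma- or whitespace-separated NID list into a set of strings."""
    return {nid for nid in text.replace(",", " ").split() if nid}


def current_nids():
    """Return the NIDs reported by `lctl list_nids` on this node.

    Needs the Lustre utilities installed; it is not called in the example below.
    """
    result = subprocess.run(
        ["lctl", "list_nids"], capture_output=True, text=True, check=True
    )
    return parse_nids(result.stdout)


def compare_nids(recorded, current):
    """Return (stale, missing) as sorted lists.

    stale:   NIDs that were recorded but are no longer configured on the server
    missing: NIDs that are configured on the server but were never recorded
    """
    return sorted(recorded - current), sorted(current - recorded)


# Hypothetical values: what the clients were told versus what the MDS has now.
recorded = parse_nids("10.10.1.48@tcp,192.41.230.48@tcp")
current = parse_nids("10.10.1.48@tcp\n192.41.230.48@tcp1\n")

stale, missing = compare_nids(recorded, current)
print("stale:", stale)
print("missing:", missing)
```

If either list is non-empty, the recorded configuration no longer matches the
server, which is the situation the writeconf procedure is meant to repair. On a
real server, current_nids() could replace the hard-coded "current" value.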

Regards,

Daniel.


