[Lustre-discuss] question about failnode with mixed networks

Kevin Van Maren Kevin.Vanmaren@Sun.COM
Fri Nov 27 08:48:30 PST 2009


The Lustre networking model is that only a single connection will be used
between a client and a server: Lustre picks the "best" of the available
options and does not fall back to the other options.

See https://bugzilla.lustre.org/show_bug.cgi?id=19854

Note that the client only registers one NID with the server.  So while the
client can reconnect (with some patches and games played with the NIDs),
the server cannot reconnect to the client through a different path, and the
client could still be evicted if the server needs to, e.g., revoke a lock.
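
As a quick sanity check, "lctl list_nids" shows which NIDs a node has
configured.  The interface names and addresses below are made-up examples,
assuming both networks are set up via the lnet "networks" module option:

  client# grep lnet /etc/modprobe.conf
  options lnet networks=o2ib0(ib0),tcp0(eth0)
  client# lctl list_nids
  192.168.1.10@o2ib
  10.0.0.10@tcp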

Kevin


Andreas Dilger wrote:
> On 2009-11-24, at 16:06, John White wrote:
>   
>> Okay, thanks Jeff.  This opens up another question... can you fail
>> over between protocols?  I have clients that have both o2ib and tcp
>> connectivity to the servers, so can I do:
>>
>> mount -t lustre mds0@o2ib:mds1@o2ib:mds0@tcp0:mds1@tcp0:/testfs /mnt/testfs
>>
>> Will state be preserved between protocols, or is this just entirely
>> insane?
>>     
>
> Lustre can do this, but it isn't a normal config.  Note that in the
> case of multiple interfaces for the same node there is a slightly
> different syntax for the mount...  I _believe_ (though I don't have
> the info handy right now) that you separate NIDs for the same node
> with commas, and different nodes with a colon, so you can try:
>
> mount -t lustre mds0@o2ib,mds0@tcp:mds1@o2ib0,mds1@tcp0:/test /mnt/test
>
> I'm not 100% sure of that.
>
> Note that this may double your client's failover times because it now  
> has 4 addresses to try when reconnecting, instead of 2.
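
(For what it's worth, LNET can be asked which of a set of NIDs it would
actually use, via "lctl which_nid" -- the addresses here are made up:

  client# lctl which_nid 192.168.1.1@o2ib 10.0.0.1@tcp
  192.168.1.1@o2ib

It picks the NID on the "best" network, per the above.)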
>
>   
>> On Nov 24, 2009, at 1:54 PM, Jeffrey Bennett wrote:
>>
>>     
>>> Hi John,
>>>
>>> Yes, you can use multiple MGS nodes, but you have to tell the OSTs
>>> about them, this way (example):
>>>
>>> mkfs.lustre --fsname testfs --ost --mgsnode=mds0@tcp0 --mgsnode=mds1@tcp0 /dev/sda
>>>
>>> Whenever you mount the filesystem, mount it this way:
>>>
>>> mount -t lustre mds0@tcp0:mds1@tcp0:/testfs /mnt/testfs
>>>
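
(An aside: tunefs.lustre has a read-only dry run that shows what an
existing target has recorded, which is handy for checking this.  Output
abbreviated and the device name made up:

  oss# tunefs.lustre --dryrun /dev/sda
  ...
  Parameters: mgsnode=mds0@tcp0 mgsnode=mds1@tcp0

Both mgsnode entries need to be present for the OST to reach whichever
MGS is currently active.)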
>>> Jeffrey A. Bennett
>>> HPC Systems Engineer
>>> San Diego Supercomputer Center
>>> http://users.sdsc.edu/~jab
>>>
>>> -----Original Message-----
>>> From: lustre-discuss-bounces@lists.lustre.org
>>> [mailto:lustre-discuss-bounces@lists.lustre.org] On Behalf Of John White
>>> Sent: Tuesday, November 24, 2009 1:20 PM
>>> To: Brian J. Murrell
>>> Cc: lustre-discuss@lists.lustre.org
>>> Subject: Re: [Lustre-discuss] question about failnode with mixed networks
>>>
>>> Excellent, thanks for the replies.  One more question: is there a
>>> --failnode corollary for MGTs...?  Does Lustre support MGT/MGS failover?
>>>
>>> On Nov 16, 2009, at 6:51 AM, Brian J. Murrell wrote:
>>>
>>>       
>>>> On Fri, 2009-11-13 at 14:34 -0800, John White wrote:
>>>>         
>>>>> In a failover situation, it would appear that tcp-connected
>>>>> clients do not get the hint to switch over to the secondary MDS
>>>>>
>>>> Clients don't (yet) get "hints" to switch servers.  Clients continue
>>>> to use a server until they don't get a response, at which time they
>>>> cycle through their list of NIDs for the unresponsive service.
>>>>
>>>>         
>>>>> When I initially set up the file system, I specified --failnode
>>>>> for the @o2ib interfaces,
>>>>>
>>>> Only the @o2ib interfaces?
>>>>
>>>>         
>>>>> should I have also specified NIDs for the @tcp0 interfaces during
>>>>> the fs construction?
>>>>>
>>>> Yes.  You specify the NIDs for all servers that should be
>>>> considered for that service.
>>>>
>>>>         
>>>>> If so, is it possible to add this as an afterthought?
>>>>>           
>>>> You want tunefs.lustre.
>>>>
>>>> b.
>>>>
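
To spell that out: something like the following, run against the MDT
device, should add the missing @tcp0 failover NIDs.  This is untested, the
device name is made up, and a --writeconf plus remount may be needed before
existing clients notice the change:

  mds0# tunefs.lustre --failnode=mds1@o2ib,mds1@tcp0 /dev/mdt_device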
>>> ----------------
>>> John White
>>> High Performance Computing Services (HPCS)
>>> (510) 486-7307
>>> One Cyclotron Rd, MS: 50B-3209C
>>> Lawrence Berkeley National Lab
>>> Berkeley, CA 94720
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>       
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>



