[Lustre-discuss] failover problems

Andreas Dilger adilger at sun.com
Fri Dec 11 23:15:12 PST 2009


On 2009-12-11, at 17:48, John White wrote:
> Please disregard.  I just realized the difference between a ':' and  
> ',' when running these commands.

If this isn't explained clearly in the documentation, please speak up  
and it will be fixed.
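
For anyone finding this thread later, here is a minimal sketch of the separator convention as I understand it from the mount.lustre man page (mgsspec := mgsnode[:mgsnode], mgsnode := mgsnid[,mgsnid]): a ',' joins several NIDs that belong to the same node, while a ':' separates different failover nodes. The thread does not say which command was affected, so this only illustrates a client mount spec. The helper names join_nids and join_nodes and the mount point /mnt/lrc are hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the NID separator convention.

# join_nids: join the NIDs of ONE node with ',' (same host, several networks)
join_nids() {
    ( IFS=,; printf '%s' "$*" )
}

# join_nodes: join per-node NID lists with ':' (different failover nodes)
join_nodes() {
    ( IFS=:; printf '%s' "$*" )
}

# NIDs taken from the thread; each MDS is reachable on o2ib and tcp0
mds0=$(join_nids 10.4.200.0@o2ib 10.4.200.0@tcp0)
mds1=$(join_nids 10.4.200.1@o2ib 10.4.200.1@tcp0)

mgsspec=$(join_nodes "$mds0" "$mds1")

# Print (do not run) the resulting client mount command; /mnt/lrc is assumed
echo "mount -t lustre ${mgsspec}:/lrc /mnt/lrc"
```

Listing both nodes with only commas would make the client treat all four NIDs as one node, which may be why failover never moved to the second server.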

> On Dec 11, 2009, at 11:42 AM, John White wrote:
>
>> So we have a cluster with an MGT and 2 MDTs.  Each has an NID on
>> o2ib and tcp and is dual-connected to 2 MDSs.  We created the MGT
>> and MDTs with the following commands:
>> mkfs.lustre --mgs --reformat --failnode=10.4.200.0@o2ib,10.4.200.1@o2ib --failnode=10.4.200.0@tcp0,10.4.200.1@tcp0 /dev/dm-0
>> mkfs.lustre --mdt --mgsnode=10.4.200.0@o2ib --fsname=lrc --reformat --failnode=10.4.200.0@o2ib,10.4.200.1@o2ib,10.4.200.0@tcp0,10.4.200.1@tcp0 /dev/dm-1
>> mkfs.lustre --mdt --mgsnode=10.4.200.0@o2ib --fsname=nano --reformat --failnode=10.4.200.1@o2ib,10.4.200.0@o2ib,10.4.200.1@tcp0,10.4.200.0@tcp0 /dev/dm-2
>>
>> The host cluster starts and mounts the luns just fine.  I mount TCP  
>> connected clients with both MGSs called out.  The client fails over  
>> to the secondary MDS/MGT just fine but keeps failing on the MDT.   
>> It just keeps trying the old MDS NIDs:
>> Lustre: Changing connection for lrc-MDT0000-mdc-ffff8101d57ad400 to 10.4.200.0@o2ib/10.0.200.0@tcp
>>
>> Ideas?
>> ----------------
>> John White
>> High Performance Computing Services (HPCS)
>> (510) 486-7307
>> One Cyclotron Rd, MS: 50B-3209C
>> Lawrence Berkeley National Lab
>> Berkeley, CA 94720
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



