[Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

Mike Hanby mhanby at uab.edu
Fri Oct 30 11:30:59 PDT 2009


So, assuming the MGS is /dev/loop0 on my MDS server, would it be something like this (without the --dryrun)? Also, I'll need to umount /dev/loop0 first, correct?

tunefs.lustre --dryrun --writeconf --erase-params \
 --param="failover.node=172.20.21.30@o2ib" \
 --param="failover.node=172.20.20.30@tcp" \
 --param="mdt.group_upcall=NONE" /dev/loop0

Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x405
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.20.21.30@o2ib mdt.group_upcall=NONE


   Permanent disk data:
Target:     lustre-MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x545
              (MDT MGS update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.20.21.30@o2ib failover.node=172.20.20.30@tcp mdt.group_upcall=NONE

exiting before disk write.
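
For reference, the order of operations being asked about can be sketched as a small script. This is only an illustration under stated assumptions: the device /dev/loop0 and the parameters are copied from the command above (with the NIDs written in the usual address@network form), while the mount point /mnt/mdt is hypothetical. The function prints the commands rather than running them, so nothing is written to disk.

```shell
#!/bin/sh
# Sketch only: prints the planned commands instead of executing them.
# DEV is taken from the message above; MNT is a hypothetical mount point.
DEV=/dev/loop0
MNT=/mnt/mdt

plan_writeconf() {
    dev=$1
    mnt=$2
    # 1. The target should be unmounted before tunefs.lustre modifies it.
    echo "umount $mnt"
    # 2. Same parameters as the dry run above, but without the dry-run
    #    flag, so the new configuration is actually written.
    echo "tunefs.lustre --writeconf --erase-params" \
         "--param=failover.node=172.20.21.30@o2ib" \
         "--param=failover.node=172.20.20.30@tcp" \
         "--param=mdt.group_upcall=NONE $dev"
    # 3. Remount the combined MGS/MDT before the OSTs and clients.
    echo "mount -t lustre $dev $mnt"
}

plan_writeconf "$DEV" "$MNT"
```

Printing first and running the lines by hand afterwards mirrors what --dryrun does for tunefs.lustre itself: each step can be checked before anything changes on disk.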

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Friday, October 30, 2009 1:18 PM
To: Mike Hanby
Cc: 'David Dillow'; 'lustre-discuss at lists.lustre.org'
Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

On 2009-10-30, at 10:38, Mike Hanby wrote:
> kernel: LustreError: 15c-8: MGC172.20.20.30@tcp: The configuration
> from log 'lustre-client' failed (-2). This may be the result of
> communication errors between this node and the MGS, a bad
> configuration, or other errors. See the syslog for more information.
>
> Do I need to run tunefs.lustre on the MGS node as well?

Yes, since you specified multiple NIDs for the MGS, the MGS itself
needs to know to accept connections on that interface.
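
One way to check this on the MGS node is to look at the NIDs that LNET has actually configured. The helper below is a minimal sketch, not part of Lustre: has_net is a hypothetical function name, and it assumes `lctl list_nids` prints one NID per line in address@network form.

```shell
#!/bin/sh
# Sketch: succeed if stdin (one NID per line, as printed by
# `lctl list_nids`) contains a NID on the given LNET network.
# has_net is a hypothetical helper, not a Lustre command.
has_net() {
    # Matches e.g. "172.20.20.30@tcp" or "172.20.20.30@tcp0" for net "tcp".
    grep -q "@${1}[0-9]*\$"
}

# Typical use on the MGS node (requires the Lustre modules to be loaded):
#   lctl list_nids | has_net tcp && echo "tcp NID is configured"
# From a client, `lctl ping 172.20.20.30@tcp` checks that the NID answers.
```

If the tcp NID is missing from the list, the LNET networks setting on that node is the first place to look.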

> Thanks, Mike
>
> -----Original Message-----
> From: David Dillow [mailto:dillowda at ornl.gov]
> Sent: Thursday, October 29, 2009 8:32 PM
> To: Mike Hanby
> Cc: 'lustre-discuss at lists.lustre.org'
> Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but  
> not tcp
>
> On Thu, 2009-10-29 at 17:13 -0500, Mike Hanby wrote:
>> I added the failover and mgsnode settings to each lun (6 luns)  
>> using the following:
>> tunefs.lustre --failnode=172.20.20.31@tcp --failnode=172.20.20.32 \
>> --mgsnode=172.20.20.30@tcp /dev/mpath/lun1
>
> Did you use --writeconf on the servers? You need to do so with LNET up
> with the appropriate NIDs, so that clients (and the MDS) can find the
> servers on both networks.
>
> I think this is covered in the manual, so you should check there as
> well.
> -- 
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



