[Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

Mike Hanby mhanby at uab.edu
Fri Oct 30 14:16:49 PDT 2009


Btw, the subject should read "Multihomed..."; I always mistype that for some reason.

Anyhow, both networks are now working. Thanks to both of you for the clues: the writeconf, and adding the settings on the MGS as well.

Mike

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Mike Hanby
Sent: Friday, October 30, 2009 1:31 PM
To: 'lustre-discuss at lists.lustre.org'
Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

So, assuming the MGS is /dev/loop0 on my MDS server, I'd run something like this (without the --dryrun)? I'll also need to umount /dev/loop0 first, correct?

tunefs.lustre --dryrun --writeconf --erase-params \
 --param="failover.node=172.20.21.30@o2ib" \
 --param="failover.node=172.20.20.30@tcp" \
 --param="mdt.group_upcall=NONE" /dev/loop0

Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x405
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.20.21.30@o2ib mdt.group_upcall=NONE


   Permanent disk data:
Target:     lustre-MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x545
              (MDT MGS update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.20.21.30@o2ib failover.node=172.20.20.30@tcp mdt.group_upcall=NONE

exiting before disk write.
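
For reference, the sequence being asked about can be sketched roughly as follows. This is only an outline, not a tested procedure: tunefs.lustre works on an unmounted target, so the MDT/MGS device is unmounted first, the command is run without --dryrun, and the target is mounted again. The mount point /mnt/mdt is an assumption (it does not appear in this thread), and the script only prints the commands rather than running them. The Lustre manual's writeconf procedure (which also covers unmounting clients and the OSTs) remains the authority.

```shell
#!/bin/sh
# Sketch only: prints the planned commands instead of executing them.
# MDT_DEV comes from the thread; MDT_MNT is a hypothetical mount point.
MDT_DEV=${MDT_DEV:-/dev/loop0}
MDT_MNT=${MDT_MNT:-/mnt/mdt}

plan_mgs_update() {
    # 1. Stop the target so tunefs.lustre can rewrite its configuration.
    echo "umount $MDT_MNT"
    # 2. Same parameters as the dry run above, minus --dryrun.
    echo "tunefs.lustre --writeconf --erase-params --param=failover.node=172.20.21.30@o2ib --param=failover.node=172.20.20.30@tcp --param=mdt.group_upcall=NONE $MDT_DEV"
    # 3. Bring the MGS/MDT back before the OSTs and clients.
    echo "mount -t lustre $MDT_DEV $MDT_MNT"
}

plan_mgs_update
```

Review the printed commands against your own device and mount point before running any of them by hand.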

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Friday, October 30, 2009 1:18 PM
To: Mike Hanby
Cc: 'David Dillow'; 'lustre-discuss at lists.lustre.org'
Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

On 2009-10-30, at 10:38, Mike Hanby wrote:
> kernel: LustreError: 15c-8: MGC172.20.20.30@tcp: The configuration
> from log 'lustre-client' failed (-2). This may be the result of
> communication errors between this node and the MGS, a bad
> configuration, or other errors. See the syslog for more information.
>
> Do I need to run tunefs.lustre on the MGS node as well?

Yes, since you specified multiple NIDs for the MGS, the MGS itself
needs to know to accept connections on that interface.
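
One way to check this on the MGS node is to look at the NIDs LNET is actually serving, which `lctl list_nids` prints one per line. The helper below is a small sketch, not part of Lustre: `missing_nets` is a hypothetical function name, and the sample input stands in for real `lctl` output.

```shell
#!/bin/sh
# missing_nets NET...: read NIDs (one per line) on stdin and print every
# NET that has no matching "address@NET" entry. Hypothetical helper.
missing_nets() {
    nids=$(cat)
    for net in "$@"; do
        printf '%s\n' "$nids" | grep -q "@${net}\$" || echo "$net"
    done
}

# On the MGS node, with LNET loaded, this would be:
#   lctl list_nids | missing_nets o2ib tcp
# Sample input standing in for a node that only serves o2ib:
printf '172.20.21.30@o2ib\n' | missing_nets o2ib tcp
```

If a network is reported missing, the LNET networks setting on that node is the first place to look; from a client, `lctl ping 172.20.20.30@tcp` shows whether the MGS answers on that NID.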

> Thanks, Mike
>
> -----Original Message-----
> From: David Dillow [mailto:dillowda at ornl.gov]
> Sent: Thursday, October 29, 2009 8:32 PM
> To: Mike Hanby
> Cc: 'lustre-discuss at lists.lustre.org'
> Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but  
> not tcp
>
> On Thu, 2009-10-29 at 17:13 -0500, Mike Hanby wrote:
>> I added the failover and mgsnode settings to each lun (6 luns)  
>> using the following:
>> tunefs.lustre --failnode=172.20.20.31@tcp --failnode=172.20.20.32 \
>> --mgsnode=172.20.20.30@tcp /dev/mpath/lun1
>
> Did you use --writeconf on the servers? You need to do so with LNET up
> with the appropriate NIDs, so that clients (and the MDS) can find the
> servers on both networks.
>
> I think this is covered in the manual, so you should check there as
> well.
> -- 
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
