[Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

Mike Hanby mhanby at uab.edu
Fri Oct 30 09:38:07 PDT 2009


No, I didn't. Thanks for pointing out --writeconf.

I reran tunefs.lustre on each of the LUNs (after stopping the heartbeat service on each of the OSS nodes), this time with --writeconf:

tunefs.lustre --writeconf --erase-params \
 --param="failover.node=172.20.21.31@o2ib" \
 --param="failover.node=172.20.21.32@o2ib" \
 --param="mgsnode=172.20.21.30@o2ib" \
 --param="failover.node=172.20.20.31@tcp" \
 --param="failover.node=172.20.20.32@tcp" \
 --param="mgsnode=172.20.20.30@tcp" /dev/mpath/lun1

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-OST0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=172.20.21.31@o2ib failover.node=172.20.21.32@o2ib mgsnode=172.20.21.30@o2ib


   Permanent disk data:
Target:     lustre-OST0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x542
              (OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=172.20.21.31@o2ib failover.node=172.20.21.32@o2ib mgsnode=172.20.21.30@o2ib failover.node=172.20.20.31@tcp failover.node=172.20.20.32@tcp mgsnode=172.20.20.30@tcp

Writing CONFIGS/mountdata
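
To double-check that a Parameters line like the one above names the MGS on both networks, something along these lines could be used. This is only a sketch: nets_for_key is a hypothetical helper, not a Lustre tool, and it assumes the NIDs are written in the usual addr@net form.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print the LNET networks that a
# given key (e.g. mgsnode) refers to in a tunefs.lustre "Parameters:" line.
nets_for_key() {
    key=$1
    line=$2
    # One key=addr@net entry per line, then keep only the part after '@'.
    printf '%s\n' "$line" | tr ' ' '\n' |
        sed -n "s/^${key}=.*@\(.*\)\$/\1/p" | sort -u
}

# Parameters string copied from the "Permanent disk data" output.
params='failover.node=172.20.21.31@o2ib failover.node=172.20.21.32@o2ib mgsnode=172.20.21.30@o2ib failover.node=172.20.20.31@tcp failover.node=172.20.20.32@tcp mgsnode=172.20.20.30@tcp'

# Prints one network name per line for the mgsnode entries.
nets_for_key mgsnode "$params"
```

The same call with failover.node as the key lists the networks for the failover entries.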

Following fresh boots of the OSSes and MDS and the tcp client, I'm still getting the same error:
kernel: LustreError: 2036:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 172.20.21.30@o2ib
kernel: LustreError: 2036:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 172.20.21.30@o2ib!
kernel: LustreError: 2036:0:(ldlm_lib.c:329:client_obd_setup()) can't add initial connection 
kernel: LustreError: 2036:0:(obd_config.c:370:class_setup()) setup lustre-MDT0000-mdc-ffff81003f7c3400 failed (-2) 
kernel: LustreError: 2036:0:(obd_config.c:1197:class_config_llog_handler()) Err -2 on cfg command: 
kernel: LustreError: 15c-8: MGC172.20.20.30@tcp: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
kernel: LustreError: 1959:0:(llite_lib.c:1171:ll_fill_super()) Unable to process log: -2 
kernel: LustreError: 1959:0:(obd_config.c:441:class_cleanup()) Device 2 not setup 
kernel: LustreError: 1959:0:(ldlm_request.c:1030:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway 
kernel: LustreError: 1959:0:(ldlm_request.c:1533:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108 
kernel: LustreError: 1959:0:(obd_mount.c:1997:lustre_fill_super()) Unable to mount  (-2)

Do I need to run tunefs.lustre on the MGS node as well?

Thanks, Mike

-----Original Message-----
From: David Dillow [mailto:dillowda at ornl.gov] 
Sent: Thursday, October 29, 2009 8:32 PM
To: Mike Hanby
Cc: 'lustre-discuss at lists.lustre.org'
Subject: Re: [Lustre-discuss] Multihoned Problem, can mount o2ib but not tcp

On Thu, 2009-10-29 at 17:13 -0500, Mike Hanby wrote:
> I added the failover and mgsnode settings to each lun (6 luns) using the following:
> tunefs.lustre --failnode=172.20.20.31@tcp --failnode=172.20.20.32 \
> --mgsnode=172.20.20.30@tcp /dev/mpath/lun1

Did you use --writeconf on the servers? You need to do so with LNET up with
the appropriate nids, so that clients (and the MDS) can find the servers
on both networks.

I think this is covered in the manual, so you should check there as
well. 
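
A quick way to confirm that LNET is up on both networks before running --writeconf might look like the sketch below. has_nets is a hypothetical helper; it assumes NIDs arrive one per line in addr@net form, which is how `lctl list_nids` normally prints them, and the sample input here stands in for real output from an OSS.

```shell
#!/bin/sh
# Hypothetical helper: read NIDs (one per line, addr@net) on stdin and
# succeed only if every network named in the arguments is present.
has_nets() {
    nids=$(cat)
    for net in "$@"; do
        printf '%s\n' "$nids" | grep -q "@${net}\$" || {
            echo "missing network: $net" >&2
            return 1
        }
    done
}

# Sample input standing in for `lctl list_nids` output on one OSS.
printf '172.20.21.31@o2ib\n172.20.20.31@tcp\n' | has_nets o2ib tcp &&
    echo "both networks present"
```

On a live server the equivalent check would be `lctl list_nids | has_nets o2ib tcp`.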
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office



