[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Chris Worley worleys at gmail.com
Tue Apr 22 16:35:46 PDT 2008


On Tue, Apr 22, 2008 at 5:01 PM, Cliff White <Cliff.White at sun.com> wrote:
>  You don't need to reformat or rebuild. The new OST registers with the
>  MGS on first startup, and since it didn't know about the TCP address is
>  only registered as IB. You need to regenerate the config, which can be
>  done with 'tunefs.lustre --writeconf' on the OSS providing the new OST.

I unmounted all Lustre filesystems on the clients and servers, but I
didn't unload any modules.

On the OSS in question, for each OST, I did:

# tunefs.lustre --writeconf --ost \
    --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
    --param sys.timeout=40 --param lov.stripesize=2M /dev/sdl

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lfs-OST0030
Index:      48
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M


   Permanent disk data:
Target:     lfs-OST0030
Index:      48
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x142
              (OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
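
The "Permanent disk data" block above lists every parameter twice. As a minimal sketch for spotting such repeats, the snippet below splits a Parameters line into key=value tokens and reports those that occur more than once. The helper name duplicate_params is made up for illustration, and whether repeated entries actually matter to Lustre is not established here.

```python
from collections import Counter


def duplicate_params(parameters_line):
    """Return key=value tokens that occur more than once, in first-seen order."""
    # Drop the leading "Parameters:" label if it is present.
    _, _, rest = parameters_line.partition("Parameters:")
    text = rest if rest else parameters_line
    # The list archive renders "@" as " at "; undo that so each NID list
    # stays a single whitespace-delimited token.
    text = text.replace(" at ", "@")
    tokens = text.split()
    counts = Counter(tokens)
    seen = []
    for tok in tokens:
        if counts[tok] > 1 and tok not in seen:
            seen.append(tok)
    return seen


line = (
    "Parameters: mgsnode=36.102.29.1 at o2ib,36.101.29.1 at tcp sys.timeout=40 "
    "lov.stripesize=2M mgsnode=36.102.29.1 at o2ib,36.101.29.1 at tcp "
    "sys.timeout=40 lov.stripesize=2M"
)
print(duplicate_params(line))
```

For the line shown in this post, all three parameters (mgsnode, sys.timeout, lov.stripesize) are reported as repeated.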

I remounted all the OSTs and the MDT, then tried mounting from the
Ethernet-only client.  I still get the same error:

# mount -t lustre 36.101.29.1@tcp:/lfs /lfs
mount.lustre: mount 36.101.29.1@tcp:/lfs at /lfs failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# dmesg
Lustre: OBD class driver, info at clusterfs.com
        Lustre Version: 1.6.4.2
        Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
Lustre: Added LNI 36.101.255.10@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info at clusterfs.com
Lustre: Binding irq 177 to CPU 0 with cmd: echo 1 > /proc/irq/177/smp_affinity
Lustre: lfs-clilov-000001076591e400.lov: set parameter stripesize=4194304
LustreError: 6934:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.102.29.4@o2ib
LustreError: 6934:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.102.29.4@o2ib!
LustreError: 6934:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
LustreError: 6934:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-000001076591e400 failed (-2)
LustreError: 6934:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log 'lfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 6934:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
LustreError: 6934:0:(mdc_request.c:1273:mdc_precleanup()) client import never connected
LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
LustreError: 6934:0:(obd_config.c:392:class_cleanup()) Device 41 not setup
Lustre: client 000001076591e400 umount complete
LustreError: 6934:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-2)
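
Reading the log above: the client brings up a single LNI on tcp, while the configuration log hands it an o2ib NID for lfs-OST0026. As a rough sketch of how that mismatch could be pulled out of a dmesg capture automatically, the snippet below collects the networks named in "Added LNI" lines and lists every "No NID found for" entry on a network the client does not have. The function name unreachable_nids is hypothetical, the message formats are taken only from this one log, and network names are compared literally.

```python
import re

# A NID as it appears in the log: dotted IPv4 address, then "@" (or the
# archive's " at " substitution), then the LNET network name.
NID = r"(\d+\.\d+\.\d+\.\d+)(?:@| at )(\w+)"


def unreachable_nids(dmesg_text):
    """Return (local_networks, nids_on_networks_the_client_lacks)."""
    # Rejoin lines wrapped by the mail client so each message is searchable.
    flat = " ".join(dmesg_text.split())
    local = {net for _, net in re.findall(r"Added LNI " + NID, flat)}
    wanted = re.findall(r"No NID found for " + NID, flat)
    return local, [(ip, net) for ip, net in wanted if net not in local]


log = """Lustre: Added LNI 36.101.255.10 at tcp [8/256]
LustreError: 6934:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
for 36.102.29.4 at o2ib"""

local, missing = unreachable_nids(log)
print("client networks:", sorted(local))
print("NIDs on other networks:", missing)
```

Applied to the excerpt, this reports tcp as the only client network and 36.102.29.4 on o2ib as the NID the client cannot resolve.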

Did I do something wrong?  The man page says to run "--writeconf"
only on the MDT node, but I ran it on the OSS as instructed.

Thanks,

Chris


