[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

Chris Worley worleys@gmail.com
Tue Apr 22 17:08:47 PDT 2008


The error specifically complains about the first OST/disk on the new
OSS, OST0026.  Its tunefs.lustre output was:

# tunefs.lustre --writeconf --ost \
    --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
    --param sys.timeout=40 --param lov.stripesize=2M /dev/sdb
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lfs-OST0026
Index:      38
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M


   Permanent disk data:
Target:     lfs-OST0026
Index:      38
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x142
              (OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp
sys.timeout=40 lov.stripesize=2M

Writing CONFIGS/mountdata
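
To see what the MGS actually recorded after the writeconf, the client
config log can be dumped from the MGS backing device and inspected.
A sketch of what I have in mind (the device name and output path are
placeholders for mine):

# debugfs -c -R 'dump CONFIGS/lfs-client /tmp/lfs-client' /dev/MDT_DEV
# llog_reader /tmp/lfs-client | grep OST0026

If that still shows only the o2ib NID for lfs-OST0026, the tcp address
never made it into the client log.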



Comparing with the first OST of an OSS that is (and has been) working,
using a dry-run tunefs, I see no differences:


# tunefs.lustre --dryrun --writeconf /dev/sdb
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lfs-OST0006
Index:      6
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M


   Permanent disk data:
Target:     lfs-OST0006
Index:      6
Lustre FS:  lfs
Mount type: ldiskfs
Flags:      0x102
              (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M

exiting before disk write.
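
As a quick cross-check, the same dry-run can be looped over every OST
on this OSS and the Parameters lines compared (the /dev/sd[b-m] range
is just a guess at this OSS's device names; --dryrun exits before any
disk write):

# for d in /dev/sd[b-m]; do echo "== $d =="; \
    tunefs.lustre --dryrun $d 2>/dev/null | grep -A1 Parameters:; done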

Any clues there?

Thanks,

Chris
On Tue, Apr 22, 2008 at 5:35 PM, Chris Worley <worleys@gmail.com> wrote:
> On Tue, Apr 22, 2008 at 5:01 PM, Cliff White <Cliff.White@sun.com> wrote:
>  >  You don't need to reformat or rebuild. The new OST registers with the
>  >  MGS on first startup, and since it didn't know about the TCP address,
>  >  it registered only as IB. You need to regenerate the config, which can
>  >  be done with 'tunefs.lustre --writeconf' on the OSS providing the new OST.
>
>  I unmounted everything Lustre-related from the clients and servers.
>  I didn't unload any modules.
>
>  On the OSS in question, for each OST, I did:
>
>  # tunefs.lustre --writeconf --ost \
>      --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
>      --param sys.timeout=40 --param lov.stripesize=2M /dev/sdl
>
>  checking for existing Lustre data: found CONFIGS/mountdata
>  Reading CONFIGS/mountdata
>
>    Read previous values:
>  Target:     lfs-OST0030
>  Index:      48
>  Lustre FS:  lfs
>  Mount type: ldiskfs
>  Flags:      0x2
>               (OST )
>  Persistent mount opts: errors=remount-ro,extents,mballoc
>  Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
>  lov.stripesize=2M
>
>
>    Permanent disk data:
>  Target:     lfs-OST0030
>  Index:      48
>  Lustre FS:  lfs
>  Mount type: ldiskfs
>  Flags:      0x142
>               (OST update writeconf )
>  Persistent mount opts: errors=remount-ro,extents,mballoc
>  Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
>  lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp
>  sys.timeout=40 lov.stripesize=2M
>
>  I remounted all the OSTs and the MDT, then tried the Ethernet-only
>  client mount.  Still, the same error:
>
>  # mount -t lustre 36.101.29.1@tcp:/lfs /lfs
>  mount.lustre: mount 36.101.29.1@tcp:/lfs at /lfs failed: No such file
>  or directory
>  Is the MGS specification correct?
>  Is the filesystem name correct?
>  If upgrading, is the copied client log valid? (see upgrade docs)
>  # dmesg
>  Lustre: OBD class driver, info@clusterfs.com
>         Lustre Version: 1.6.4.2
>         Build Version:
>  1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
>  Lustre: Added LNI 36.101.255.10@tcp [8/256]
>  Lustre: Accept secure, port 988
>  Lustre: Lustre Client File System; info@clusterfs.com
>  Lustre: Binding irq 177 to CPU 0 with cmd: echo 1 > /proc/irq/177/smp_affinity
>  Lustre: lfs-clilov-000001076591e400.lov: set parameter stripesize=4194304
>  LustreError: 6934:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>  for 36.102.29.4@o2ib
>  LustreError: 6934:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
>  find peer 36.102.29.4@o2ib!
>  LustreError: 6934:0:(ldlm_lib.c:312:client_obd_setup()) can't add
>  initial connection
>  LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>  LustreError: 6934:0:(obd_config.c:325:class_setup()) setup
>  lfs-OST0026-osc-000001076591e400 failed (-2)
>  LustreError: 6934:0:(obd_config.c:1062:class_config_llog_handler())
>  Err -2 on cfg command:
>  Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
>  LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
>  'lfs-client' failed (-2). This may be the result of communication
>  errors between this node and the MGS, a bad configuration, or other
>  errors. See the syslog for more information.
>  LustreError: 6934:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
>  LustreError: 6934:0:(mdc_request.c:1273:mdc_precleanup()) client
>  import never connected
>  LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>  LustreError: 6934:0:(obd_config.c:392:class_cleanup()) Device 41 not setup
>  Lustre: client 000001076591e400 umount complete
>  LustreError: 6934:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-2)
>
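>  To rule out basic LNET connectivity, this client's view can at least
>  be checked directly (36.101.29.4@tcp is only my guess at the new
>  OSS's ethernet NID; substitute the real one):
>
>  # lctl list_nids
>  # lctl ping 36.101.29.4@tcp
>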
>  Did I do something wrong?  The man page says to only do the
>  "--writeconf" on the MDT node... I did it on the OSS as instructed.
>
>  Thanks,
>
>  Chris
>


