[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused
Chris Worley
worleys at gmail.com
Tue Apr 22 17:08:47 PDT 2008
The error specifically complains about the first OST/disk on the new
OSS, OST0026. Its tunefs.lustre output was:
# tunefs.lustre --writeconf --ost \
    --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
    --param sys.timeout=40 --param lov.stripesize=2M /dev/sdb
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: lfs-OST0026
Index: 38
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
Permanent disk data:
Target: lfs-OST0026
Index: 38
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x142
(OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
Writing CONFIGS/mountdata
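The Flags values in this output can be read as a bit mask. A minimal sketch for decoding them, assuming only the three bit names that tunefs.lustre itself prints in this thread (0x2 as OST, 0x40 as update, 0x100 as writeconf). The helper name `decode_flags` and the lookup table are illustrative and not part of any Lustre tool; any other bit is reported as unknown rather than guessed.

```python
# Bit names inferred from the tunefs.lustre output quoted in this thread:
#   0x2   -> "(OST )"
#   0x102 -> "(OST writeconf )"
#   0x142 -> "(OST update writeconf )"
FLAG_NAMES = {
    0x002: "OST",
    0x040: "update",
    0x100: "writeconf",
}


def decode_flags(value):
    """Return the names of the set bits, lowest bit first (hypothetical helper)."""
    names = []
    remaining = value
    for bit in sorted(FLAG_NAMES):
        if value & bit:
            names.append(FLAG_NAMES[bit])
            remaining &= ~bit
    if remaining:
        # Bits not seen in this thread are left undecoded.
        names.append("unknown(0x%x)" % remaining)
    return names


for raw in ("0x2", "0x102", "0x142"):
    print(raw, decode_flags(int(raw, 16)))
```

Read this way, 0x142 differs from 0x102 only in the "update" bit.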
Comparing with the first OST of an OSS that has been working (using a
dry-run tunefs), I see no differences:
# tunefs.lustre --dryrun --writeconf /dev/sdb
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: lfs-OST0006
Index: 6
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
Permanent disk data:
Target: lfs-OST0006
Index: 6
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x102
(OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
exiting before disk write.
Any clues there?
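A comparison like this is easier to check mechanically than by eye. The sketch below parses the "Permanent disk data" section of two tunefs.lustre outputs and lists the fields whose values differ. The function names and the field list are illustrative (taken from the output pasted in this message), and the two samples are the sections quoted above for OST0026 and OST0006. With these samples it reports Flags and Parameters in addition to the expected Target and Index; whether that matters for the mount failure is not established here.

```python
# Field names as they appear in the tunefs.lustre output pasted in this thread.
KEYS = ("Target", "Index", "Lustre FS", "Mount type", "Flags",
        "Persistent mount opts", "Parameters")


def permanent_fields(output):
    """Parse the 'Permanent disk data:' section into a dict (hypothetical helper).

    Wrapped lines that do not start with a known 'Key:' are appended
    to the previous field.
    """
    fields = {}
    in_section = False
    last_key = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Permanent disk data:"):
            in_section = True
            continue
        if not in_section or not line:
            continue
        if line.startswith(("Writing ", "exiting ")):
            break
        key, sep, value = line.partition(":")
        if sep and key in KEYS:
            fields[key] = value.strip()
            last_key = key
        elif last_key is not None:
            fields[last_key] += " " + line
    return fields


def diff_fields(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


NEW_OST = """
Permanent disk data:
Target: lfs-OST0026
Index: 38
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x142
(OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp
sys.timeout=40 lov.stripesize=2M
Writing CONFIGS/mountdata
"""

WORKING_OST = """
Permanent disk data:
Target: lfs-OST0006
Index: 6
Lustre FS: lfs
Mount type: ldiskfs
Flags: 0x102
(OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40
lov.stripesize=2M
exiting before disk write.
"""

differences = diff_fields(permanent_fields(NEW_OST), permanent_fields(WORKING_OST))
for name, (new, working) in differences.items():
    print("%s:\n  new:     %s\n  working: %s" % (name, new, working))
```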
Thanks,
Chris
On Tue, Apr 22, 2008 at 5:35 PM, Chris Worley <worleys at gmail.com> wrote:
> On Tue, Apr 22, 2008 at 5:01 PM, Cliff White <Cliff.White at sun.com> wrote:
> > You don't need to reformat or rebuild. The new OST registers with the
> > MGS on first startup, and since it didn't know about the TCP address, it
> > is only registered as IB. You need to regenerate the config, which can be
> > done with 'tunefs.lustre --writeconf' on the OSS providing the new OST.
>
> I unmounted everything lustre from clients and servers. I didn't
> unload any modules.
>
> On the OSS in question, for each OST, I did:
>
> # tunefs.lustre --writeconf --ost \
>     --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
>     --param sys.timeout=40 --param lov.stripesize=2M /dev/sdl
>
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: lfs-OST0030
> Index: 48
> Lustre FS: lfs
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
>
>
> Permanent disk data:
> Target: lfs-OST0030
> Index: 48
> Lustre FS: lfs
> Mount type: ldiskfs
> Flags: 0x142
> (OST update writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M mgsnode=36.102.29.1@o2ib,36.101.29.1@tcp sys.timeout=40 lov.stripesize=2M
>
> I remounted all the OSTs and the MDT, then tried the Ethernet-only client
> mount. Still, the same error:
>
> # mount -t lustre 36.101.29.1@tcp:/lfs /lfs
> mount.lustre: mount 36.101.29.1@tcp:/lfs at /lfs failed: No such file or directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
> # dmesg
> Lustre: OBD class driver, info at clusterfs.com
> Lustre Version: 1.6.4.2
> Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
> Lustre: Added LNI 36.101.255.10@tcp [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; info at clusterfs.com
> Lustre: Binding irq 177 to CPU 0 with cmd: echo 1 > /proc/irq/177/smp_affinity
> Lustre: lfs-clilov-000001076591e400.lov: set parameter stripesize=4194304
> LustreError: 6934:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.102.29.4@o2ib
> LustreError: 6934:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.102.29.4@o2ib!
> LustreError: 6934:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> LustreError: 6934:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-000001076591e400 failed (-2)
> LustreError: 6934:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> Lustre: cmd=cf003 0:lfs-OST0026-osc 1:lfs-OST0026_UUID 2:36.102.29.4@o2ib
> LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log 'lfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 6934:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
> LustreError: 6934:0:(mdc_request.c:1273:mdc_precleanup()) client import never connected
> LustreError: 7045:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> LustreError: 6934:0:(obd_config.c:392:class_cleanup()) Device 41 not setup
> Lustre: client 000001076591e400 umount complete
> LustreError: 6934:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
>
> Did I do something wrong? The man page says to run "--writeconf" only
> on the MDT node, but I did it on the OSS as instructed.
>
> Thanks,
>
> Chris
>
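In the quoted dmesg output, the client has only a tcp interface but is handed an o2ib NID for lfs-OST0026. A minimal sketch for spotting that pattern in a saved log: it collects the networks the node reports via "Added LNI" and lists any NID from a "No NID found for" error that is on a different network. The function names are illustrative, the message formats are taken only from the log quoted in this thread, and the pattern accepts both "@" and the " at " spelling that the list archive substitutes for it.

```python
import re

# An IPv4 NID such as 36.102.29.4@o2ib; the archive renders "@" as " at ".
NID = r"(\d+\.\d+\.\d+\.\d+)(?:@| at )([a-z0-9]+)"


def normalize(text):
    """Strip mail quote markers and undo line wrapping (hypothetical helper)."""
    lines = (re.sub(r"^(?:>\s?)+", "", l).strip() for l in text.splitlines())
    return " ".join(l for l in lines if l)


def canonical_net(net):
    """Treat 'tcp' and 'tcp0' as the same LNET network."""
    return net if net[-1].isdigit() else net + "0"


def unreachable_nids(dmesg):
    """NIDs from 'No NID found for' errors that are not on a local network."""
    text = normalize(dmesg)
    local = {canonical_net(net)
             for _ip, net in re.findall(r"Added LNI " + NID, text)}
    wanted = re.findall(r"No NID found for " + NID, text)
    return sorted({"%s@%s" % (ip, net)
                   for ip, net in wanted if canonical_net(net) not in local})


# Excerpt of the quoted log, including the mail client's wrapping.
SAMPLE = """\
> Lustre: Added LNI 36.101.255.10 at tcp [8/256]
> LustreError: 6934:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
>
> for 36.102.29.4 at o2ib
"""

print(unreachable_nids(SAMPLE))
```

For the excerpt above this prints `['36.102.29.4@o2ib']`: the client log still lists only the InfiniBand address for that OST.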