[Lustre-discuss] problem moving mdt to a new node

Ron Rechenmacher ron.rex at gmail.com
Tue Oct 7 09:52:26 PDT 2008


Here are my notes from the attempt to move the MDT from pool4 to lustre3. Does
anyone have ideas on why the MDT transfer didn't succeed, and why all the OSTs
ended up marked inactive?

Thanks,
Ron

The following contains the commands we executed and the associated log
messages.

Note:
/dev/sda4 on pool4 is a Hardware RAID 1 device

/dev/mapper/lustrevol-lustrelv on lustre3 is a Logical Volume in a Volume Group
on an LVM2 Physical Volume, which sits on the software RAID device /dev/md3
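For context, a hedged sketch (not part of the original post) of roughly how a
150 GB LV like lustrevol/lustrelv is carved out of the md3 physical volume.
The commands are echoed as a dry run so nothing is touched; drop the echo to
run them for real.

```shell
# Dry-run plan for creating the PV, VG, and LV described above.
plan=''
for cmd in \
    'pvcreate /dev/md3' \
    'vgcreate lustrevol /dev/md3' \
    'lvcreate -L 150G -n lustrelv lustrevol'
do
    plan="$plan$cmd
"
    echo "$cmd"
done
```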

[root@lustre3 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               lustrevol
  PV Size               592.86 GB / not usable 1.00 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              151772
  Free PE               113372
  Allocated PE          38400
  PV UUID               KRY0HY-BhjD-l8qR-14Qw-cQ1T-NNHT-LXT3Bs

[root@lustre3 ~]# vgdisplay
  --- Volume group ---
  VG Name               lustrevol
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               592.86 GB
  PE Size               4.00 MB
  Total PE              151772
  Alloc PE / Size       38400 / 150.00 GB
  Free  PE / Size       113372 / 442.86 GB
  VG UUID               XFvbDk-Ukfg-fTcQ-XcNp-rX0f-hYTl-XbkGKs

[root@lqcd-pool4 ~]# mount -t ldiskfs /dev/sda4 /mnt/mdt
[root@lqcd-pool4 ~]# cd /mnt/mdt/
[root@lqcd-pool4 mdt]# getfattr -R -d -m '.*' -P . > /root/ea.bak
[root@lqcd-pool4 mdt]# /usr/bin/rcp /root/ea.bak lustre3:/root/ea.bak
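A hedged sketch (demo paths, not the real /root/ea.bak): before restoring the
extended-attribute dump on lustre3 with setfattr, it is worth checking that the
copied dump matches the original byte for byte, since corrupted EAs on the new
MDT are hard to diagnose later.

```shell
# Two local temp files stand in for pool4's ea.bak and lustre3's copy.
src=$(mktemp) && printf 'demo EA dump\n' > "$src"   # the dump written on pool4
dst=$(mktemp) && cp "$src" "$dst"                   # the copy sent to lustre3
if [ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ]; then
    result=match
else
    result=differ
fi
echo "ea.bak copies $result"
rm -f "$src" "$dst"
```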

[root@lustre3 ~]# mkfs.lustre --fsname=lustre --mdt --mgs --param lov.stripecount=1 --mkfsoptions="-m 0" --reformat /dev/mapper/lustrevol-lustrelv
[root@lustre3 ~]# mount -t ldiskfs /dev/mapper/lustrevol-lustrelv /mnt/mdt

[root@lqcd-pool4 mdt]# export RSYNC_RSH=/usr/bin/rsh
[root@lqcd-pool4 mdt]# rsync -aSvz --ignore-existing --ignore-times /mnt/mdt/ lustre3:/mnt/mdt > /tmp/rsync.log 2>&1

[root@lustre3 ~]# cd /mnt/mdt
[root@lustre3 mdt]# setfattr --restore=/root/ea.bak

The following command was executed on all 24 OSTs:
      tunefs.lustre --erase-param --mgsnode=lustre3 --writeconf /dev/sde1
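A hedged sketch of how the per-OST step above might be scripted across devices.
The device list is illustrative; substitute the real OST block devices on each
OSS. Commands are echoed as a dry run -- remove the echo to apply them.

```shell
# Build and print the tunefs.lustre command for each OST device.
MGSNODE=lustre3
plan=''
for dev in /dev/sde1 /dev/sdf1; do          # hypothetical device list
    cmd="tunefs.lustre --erase-param --mgsnode=$MGSNODE --writeconf $dev"
    plan="$plan$cmd
"
    echo "$cmd"
done
```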

[root@lustre3 ~]# mount -t lustre /dev/mapper/lustrevol-lustrelv /mnt/mdt
mount.lustre: mount /dev/mapper/lustrevol-lustrelv at /mnt/mdt failed: Address already in use
The target service's index is already in use. (/dev/mapper/lustrevol-lustrelv)

Oct  6 16:39:28 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:39:28 lustre3 kernel: LDISKFS FS on dm-0, internal journal
Oct  6 16:39:28 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:39:28 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:39:28 lustre3 kernel: LDISKFS FS on dm-0, internal journal
Oct  6 16:39:28 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:39:28 lustre3 kernel: Lustre: MGS MGS started
Oct  6 16:39:28 lustre3 kernel: LustreError: 13e-c: MDT index must = 0
(until Clustered MetaData feature is ready.)
Oct  6 16:39:28 lustre3 kernel: LustreError: 140-5: Server lustre-MDTffff
requested index 0, but that index is already in use
Oct  6 16:39:28 lustre3 kernel: LustreError:
5026:0:(mgs_llog.c:1672:mgs_write_log_target()) Can't get index (-98)
Oct  6 16:39:28 lustre3 kernel: LustreError:
5026:0:(mgs_handler.c:431:mgs_handle_target_reg()) Failed to write
lustre-MDTffff log (-98)
Oct  6 16:39:29 lustre3 kernel: LustreError:
5026:0:(mgs_handler.c:625:mgs_handle()) MGS handle cmd=253 rc=-98
Oct  6 16:39:29 lustre3 kernel: LustreError:
5026:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-98)
req@ffff8102152b3450 x117/t0 o253->fb445385-7ef9-62f3-1e58-db8f8dc29917@NET_0x9000000000000_UUID:0/0 lens 4672/4672 e 0 to 0 dl 1223329268 ref 1 fl Interpret:/0/0 rc 0/0
Oct  6 16:39:29 lustre3 kernel: LustreError: 11-0: an error occurred while
communicating with 0@lo. The mgs_target_reg operation failed with -98
Oct  6 16:39:29 lustre3 kernel: LustreError:
4964:0:(obd_mount.c:1062:server_start_targets()) Required registration
failed for lustre-MDTffff: -98
Oct  6 16:39:29 lustre3 kernel: LustreError:
4964:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -98
Oct  6 16:39:29 lustre3 kernel: LustreError:
4964:0:(obd_mount.c:1382:server_put_super()) no obd lustre-MDTffff
Oct  6 16:39:29 lustre3 kernel: LustreError:
4964:0:(obd_mount.c:119:server_deregister_mount()) lustre-MDTffff not
registered
Oct  6 16:39:29 lustre3 kernel: Lustre: MGS has stopped.
Oct  6 16:39:29 lustre3 kernel: Lustre: server umount lustre-MDTffff
complete
Oct  6 16:39:29 lustre3 kernel: LustreError:
4964:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-98)

[root@lustre3 ~]# tunefs.lustre --erase-params --mgs --mdt --writeconf /dev/lustrevol/lustrelv
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-MDTffff
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x75
              (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: lov.stripecount=1 mdt.group_upcall=/usr/sbin/l_getgroups


   Permanent disk data:
Target:     lustre-MDTffff
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x175
              (MDT MGS needs_index first_time update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

Writing CONFIGS/mountdata

[root@lustre3 ~]# mount -t lustre /dev/mapper/lustrevol-lustrelv /mnt/mdt

[root@lustre3 ~]# mount -v -t lustre /dev/sde1 /mnt/sata1-1-3
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw,noauto,_netdev
arg[4] = /dev/sde1
arg[5] = /mnt/sata1-1-3
source = /dev/sde1 (/dev/sde1), target = /mnt/sata1-1-3
options = rw,noauto,_netdev
mounting device /dev/sde1 at /mnt/sata1-1-3, flags=0
options=device=/dev/sde1

Oct  6 16:40:57 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:40:57 lustre3 kernel: LDISKFS FS on dm-0, internal journal
Oct  6 16:40:57 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:41:05 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:41:05 lustre3 kernel: LDISKFS FS on dm-0, internal journal
Oct  6 16:41:05 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:41:05 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:41:05 lustre3 kernel: LDISKFS FS on dm-0, internal journal
Oct  6 16:41:05 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:41:05 lustre3 kernel: Lustre: MGS MGS started
Oct  6 16:41:05 lustre3 kernel: Lustre: MGS: Logs for fs lustre were removed
by user request.  All servers must be restarted in order to regenerate the
logs.
Oct  6 16:41:05 lustre3 kernel: Lustre: Enabling user_xattr
Oct  6 16:41:06 lustre3 kernel: LustreError:
5145:0:(fsfilt-ldiskfs.c:1283:fsfilt_ldiskfs_read_record()) can't read
block: 0
Oct  6 16:41:06 lustre3 kernel: Lustre: MDT lustre-MDT0000 now serving dev
(lustre-MDT0000/9b2d9c21-aeec-b2d2-4d55-b8e6d8a37b4a) with recovery enabled
Oct  6 16:41:06 lustre3 kernel: Lustre: Server lustre-MDT0000 on device
/dev/mapper/lustrevol-lustrelv has started
Oct  6 16:41:55 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:41:55 lustre3 kernel: LDISKFS FS on sde1, internal journal
Oct  6 16:41:55 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:41:55 lustre3 kernel: kjournald starting.  Commit interval 5
seconds
Oct  6 16:41:55 lustre3 kernel: LDISKFS FS on sde1, internal journal
Oct  6 16:41:55 lustre3 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Oct  6 16:41:55 lustre3 kernel: LDISKFS-fs: file extents enabled
Oct  6 16:41:55 lustre3 kernel: LDISKFS-fs: mballoc enabled
Oct  6 16:41:55 lustre3 kernel: Lustre: MGS: Regenerating lustre-OST0012 log
by user request.
Oct  6 16:41:55 lustre3 kernel: Lustre: OST lustre-OST0012 now serving dev
(lustre-OST0012/32f8d0ff-18d9-b05e-491b-477b8558b745) with recovery enabled
Oct  6 16:41:55 lustre3 kernel: Lustre: Server lustre-OST0012 on device
/dev/sde1 has started
Oct  6 16:42:00 lustre3 kernel: Lustre:
5536:0:(quota_master.c:1576:mds_quota_recovery()) Not all osts are active,
abort quota recovery
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(llog_lvfs.c:597:llog_lvfs_create()) error looking up logfile
0x28c8020:0x3c23cd5e: rc -2
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(osc_request.c:3586:osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(osc_request.c:3597:osc_llog_init()) osc 'lustre-OST0012-osc' tgt
'lustre-MDT0000' cnt 1 catid ffffc20000a3b240 rc=-2
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(osc_request.c:3599:osc_llog_init()) logid 0x28c8020:0x3c23cd5e
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(lov_log.c:214:lov_llog_init()) error osc_llog_init idx 18 osc
'lustre-OST0012-osc' tgt 'lustre-MDT0000' (rc=-2)
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(mds_log.c:207:mds_llog_init()) lov_llog_init err -2
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(llog_obd.c:394:llog_cat_initialize()) rc: -2
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(mds_lov.c:855:__mds_lov_synchronize()) lustre-OST0012_UUID failed at
update_mds: -2
Oct  6 16:42:00 lustre3 kernel: LustreError:
5539:0:(mds_lov.c:898:__mds_lov_synchronize()) lustre-OST0012_UUID sync
failed -2, deactivating


[root@lustre3 ~]# lctl dl
  0 UP mgs MGS MGS 11
  1 UP mgc MGC192.168.241.243@tcp b2bcceae-de69-e1b3-d96f-2971bba2fdfc 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0012 lustre-OST0012_UUID 5
  7 IN osc lustre-OST0012-osc lustre-mdtlov_UUID 5
  8 UP obdfilter lustre-OST0013 lustre-OST0013_UUID 5
  9 IN osc lustre-OST0013-osc lustre-mdtlov_UUID 5
 10 UP obdfilter lustre-OST0014 lustre-OST0014_UUID 5
 11 UP obdfilter lustre-OST0015 lustre-OST0015_UUID 5
 12 IN osc lustre-OST0014-osc lustre-mdtlov_UUID 5
 13 IN osc lustre-OST0015-osc lustre-mdtlov_UUID 5
 14 UP obdfilter lustre-OST0016 lustre-OST0016_UUID 5
 15 IN osc lustre-OST0016-osc lustre-mdtlov_UUID 5
 16 UP obdfilter lustre-OST0017 lustre-OST0017_UUID 5
 17 IN osc lustre-OST0017-osc lustre-mdtlov_UUID 5
 18 IN osc lustre-OST000c-osc lustre-mdtlov_UUID 5
 19 IN osc lustre-OST000d-osc lustre-mdtlov_UUID 5
 20 IN osc lustre-OST000e-osc lustre-mdtlov_UUID 5
 21 IN osc lustre-OST000f-osc lustre-mdtlov_UUID 5
 22 IN osc lustre-OST0010-osc lustre-mdtlov_UUID 5
 23 IN osc lustre-OST0011-osc lustre-mdtlov_UUID 5
 24 IN osc lustre-OST0000-osc lustre-mdtlov_UUID 5
 25 IN osc lustre-OST0001-osc lustre-mdtlov_UUID 5
 26 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
 27 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
 28 IN osc lustre-OST0004-osc lustre-mdtlov_UUID 5
 29 IN osc lustre-OST0005-osc lustre-mdtlov_UUID 5
 30 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
 31 IN osc lustre-OST0007-osc lustre-mdtlov_UUID 5
 32 IN osc lustre-OST0008-osc lustre-mdtlov_UUID 5
 33 IN osc lustre-OST0009-osc lustre-mdtlov_UUID 5
 34 IN osc lustre-OST000a-osc lustre-mdtlov_UUID 5
 35 IN osc lustre-OST000b-osc lustre-mdtlov_UUID 5
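The IN (inactive) entries in the device list above can also be extracted
mechanically. A hedged sketch, where a captured two-line sample stands in for
running `lctl dl` live on the MDS:

```shell
# Pull the names of inactive osc devices out of lctl dl-style output.
sample='  7 IN osc lustre-OST0012-osc lustre-mdtlov_UUID 5
  8 UP obdfilter lustre-OST0013 lustre-OST0013_UUID 5'
inactive=$(printf '%s\n' "$sample" | awk '$2 == "IN" && $3 == "osc" {print $4}')
echo "$inactive"
```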