[Lustre-discuss] Problem replacing an OST in 1.6.7
Nirmal Seenu
nirmal at fnal.gov
Tue Mar 3 15:15:52 PST 2009
I am having trouble replacing OST0000 with a new disk and would
appreciate any help with fixing this problem.
All data on OST0000 was moved off before decommissioning, and
"lfs find" against this OST returned 0 files. I was able to bring up
the new OST under the earlier version, 1.6.6, and everything worked as
expected there.
I just updated Lustre from 1.6.6 to 1.6.7 on the servers by installing
the patched kernel. Since the update, this OST is automatically marked
as inactive.
Note: Quota was not enabled in the older version, and I am trying to
enable it in the newer version.
I tried the following without any success:
mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblustre1@tcp1 \
    --mkfsoptions="-m 0" --index=0000 --reformat /dev/md2
tunefs.lustre --erase-params --ost --mgsnode=iblustre1@tcp1 \
    --param ost.quota_type=ug --writeconf /dev/md2
mount -t lustre /dev/md2 /mnt/ost0
I received these error messages when I tried to mount it for the first time:
Mar 3 16:19:53 lustre1 kernel: Lustre: MGS: Regenerating
lqcdproj-OST0000 log by user request.
Mar 3 16:19:53 lustre1 kernel: Lustre: Skipped 1 previous similar message
Mar 3 16:19:53 lustre1 kernel: Lustre: Setting parameter
lqcdproj-OST0000.ost.quota_type in log lqcdproj-OST0000
Mar 3 16:19:53 lustre1 kernel: Lustre: Skipped 2 previous similar messages
Mar 3 16:19:53 lustre1 kernel: Lustre: Filtering OBD driver;
http://www.lustre.org/
Mar 3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000: new disk,
initializing
Mar 3 16:19:53 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving
dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with
recovery enabled
Mar 3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set
parameter quota_type=ug
Mar 3 16:19:53 lustre1 kernel: Lustre: Server lqcdproj-OST0000 on
device /dev/md2 has started
Mar 3 16:19:56 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS
connection from 0@lo
Mar 3 16:19:56 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000:
lqcdproj-OST0000_UUID now active, resetting orphans
Mar 3 16:19:56 lustre1 kernel: Lustre: Skipped 2 previous similar messages
Mar 3 16:19:58 lustre1 kernel: LustreError:
6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28
Mar 3 16:19:58 lustre1 kernel: LustreError:
6631:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on
OST idx 0/13: rc = -28
Mar 3 16:19:58 lustre1 kernel: LustreError:
6631:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID
failed at mds_lov_clear_orphans: -28
Mar 3 16:19:58 lustre1 kernel: LustreError:
6631:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID
sync failed -28, deactivating
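For reference, the return codes in these messages are negated Linux
errno values, and errno 28 is ENOSPC ("No space left on device"). Below
is a minimal shell sketch for pulling the failing function and its
return code out of LustreError lines. The helper name summarize_rc is
hypothetical, the errno table is deliberately tiny, and it assumes each
message sits on a single line (as in dmesg or /var/log/messages, not as
wrapped in this mail).

```shell
#!/bin/sh
# summarize_rc (hypothetical helper): read kernel log text on stdin and,
# for every LustreError line, print "<function> -<rc> <errno name>".
# Assumes the "(file.c:line:function())" location format shown above and
# takes the first "-<digits>" after that location as the return code.
summarize_rc() {
    awk '
    /LustreError/ {
        fn = ""; rc = ""
        # Function name is the identifier directly before "())".
        if (match($0, /[A-Za-z_0-9]+\(\)\)/)) {
            fn = substr($0, RSTART, RLENGTH - 3)
            rest = substr($0, RSTART + RLENGTH)
            # "-" followed by digits; names like lqcdproj-OST0000 do not match.
            if (match(rest, /-[0-9]+/))
                rc = substr(rest, RSTART + 1, RLENGTH - 1)
        }
        if (fn == "" || rc == "") next
        # Small subset of Linux errno values; anything else is "unknown".
        name = "unknown"
        if (rc + 0 == 2)  name = "ENOENT"
        if (rc + 0 == 5)  name = "EIO"
        if (rc + 0 == 28) name = "ENOSPC"
        if (rc + 0 == 30) name = "EROFS"
        print fn, "-" rc, name
    }'
}
```

Typical use would be "dmesg | summarize_rc". Run against the errors
above (with the wrapped lines joined), it reports filter_precreate,
lov_clear_orphans and __mds_lov_synchronize, each with -28 ENOSPC.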
On the second attempt, I ran tunefs.lustre with --erase-params and
--writeconf on the MGS, the MDT and all the OST partitions, and still
got the following errors:
Mar 3 16:23:44 lustre1 kernel: Lustre: MGS: Regenerating
lqcdproj-OST0000 log by user request.
Mar 3 16:23:44 lustre1 kernel: LustreError:
6909:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
CONFIGS/lqcdproj-OST0000T: -28
Mar 3 16:23:44 lustre1 kernel: LustreError:
6909:0:(mgc_request.c:1080:mgc_copy_llog()) Failed to copy remote log
lqcdproj-OST0000 (-28)
Mar 3 16:23:44 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving
dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with
recovery enabled
Mar 3 16:23:44 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set
parameter quota_type=ug
Mar 3 16:23:44 lustre1 kernel: Lustre: Skipped 1 previous similar message
Mar 3 16:23:48 lustre1 kernel: Lustre:
6011:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts are
active, abort quota recovery
Mar 3 16:23:48 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS
connection from 0@lo
Mar 3 16:23:48 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000:
lqcdproj-OST0000_UUID now active, resetting orphans
Mar 3 16:23:48 lustre1 kernel: LustreError:
6915:0:(filter.c:3138:filter_precreate()) create failed rc = -28
Mar 3 16:23:48 lustre1 kernel: LustreError:
7184:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on
OST idx 0/2: rc = -28
Mar 3 16:23:48 lustre1 kernel: LustreError:
7184:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID
failed at mds_lov_clear_orphans: -28
Mar 3 16:23:48 lustre1 kernel: LustreError:
7184:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID
sync failed -28, deactivating
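Since -28 maps to ENOSPC, one generic check (not a diagnosis of this
particular failure) is whether any backing filesystem is short on
blocks or inodes. The sketch below filters "df -P" style output; the
helper name flag_full and the default 90 percent threshold are
assumptions, and it expects mount points without spaces.

```shell
#!/bin/sh
# flag_full (hypothetical helper): read "df -P" or "df -Pi" output on
# stdin and print "<mount point> <use%>" for every filesystem whose use
# percentage (5th column) is at or above the limit given as $1.
flag_full() {
    limit=${1:-90}
    awk -v limit="$limit" '
    NR > 1 {                      # skip the df header line
        pct = $5
        sub(/%$/, "", pct)        # "99%" -> "99"
        # Ignore rows where df reports "-" instead of a number.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
            print $6, pct "%"
    }'
}
```

On a server this could be run as "df -P | flag_full 90" for blocks and
"df -Pi | flag_full 90" for inodes; on a client, "lfs df" and
"lfs df -i" give the per-OST view.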
Thanks
Nirmal