[Lustre-discuss] Problem replacing an OST in 1.6.7

Nirmal Seenu nirmal at fnal.gov
Tue Mar 3 15:15:52 PST 2009


I am having trouble replacing OST0000 with a new disk and would 
appreciate any help with fixing this problem.

All the data on OST0000 was migrated off before the OST was 
decommissioned, and "lfs find" against this OST returned 0 files. I 
was able to bring up the new OST under the earlier version, 1.6.6, and 
everything worked as expected there.

I just updated Lustre from 1.6.6 to 1.6.7 on the servers using the 
patched kernel. Since the upgrade, this OST is automatically marked as 
inactive.

Note: Quotas were not enabled in the older version, and I am trying to 
enable them in the newer version.

I tried the following without any success:

mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblustre1@tcp1 
--mkfsoptions="-m 0" --index=0000 --reformat /dev/md2

tunefs.lustre --erase-params --ost --mgsnode=iblustre1@tcp1 --param 
ost.quota_type=ug --writeconf /dev/md2

mount -t lustre /dev/md2 /mnt/ost0

I received these error messages when I tried to mount it for the first time:

Mar  3 16:19:53 lustre1 kernel: Lustre: MGS: Regenerating 
lqcdproj-OST0000 log by user request.
Mar  3 16:19:53 lustre1 kernel: Lustre: Skipped 1 previous similar message
Mar  3 16:19:53 lustre1 kernel: Lustre: Setting parameter 
lqcdproj-OST0000.ost.quota_type in log lqcdproj-OST0000
Mar  3 16:19:53 lustre1 kernel: Lustre: Skipped 2 previous similar messages
Mar  3 16:19:53 lustre1 kernel: Lustre: Filtering OBD driver; 
http://www.lustre.org/
Mar  3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000: new disk, 
initializing
Mar  3 16:19:53 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving 
dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with 
recovery enabled
Mar  3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set 
parameter quota_type=ug
Mar  3 16:19:53 lustre1 kernel: Lustre: Server lqcdproj-OST0000 on 
device /dev/md2 has started
Mar  3 16:19:56 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS 
connection from 0@lo
Mar  3 16:19:56 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000: 
lqcdproj-OST0000_UUID now active, resetting orphans
Mar  3 16:19:56 lustre1 kernel: Lustre: Skipped 2 previous similar messages
Mar  3 16:19:58 lustre1 kernel: LustreError: 
6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28
Mar  3 16:19:58 lustre1 kernel: LustreError: 
6631:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on 
OST idx 0/13: rc = -28
Mar  3 16:19:58 lustre1 kernel: LustreError: 
6631:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID 
failed at mds_lov_clear_orphans: -28
Mar  3 16:19:58 lustre1 kernel: LustreError: 
6631:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID 
sync failed -28, deactivating
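
For readers decoding the return codes in the log above: on Linux, errno 28 is ENOSPC ("No space left on device"), so "rc = -28" is the kernel reporting that the object creation ran out of space or inodes somewhere. The following is only a minimal sketch for pulling the code out of a log line and naming it. The helper names errno_name and extract_rc are hypothetical (they are not part of Lustre), and the table covers just a handful of common Linux errno values.

```shell
#!/bin/sh
# Map a (possibly negative) kernel return code to its Linux errno name.
# Illustrative table only: a few common values, not the full errno list.
errno_name() {
    case "${1#-}" in          # strip a leading minus sign, if present
        2)  echo ENOENT ;;
        5)  echo EIO ;;
        19) echo ENODEV ;;
        28) echo ENOSPC ;;
        30) echo EROFS ;;
        *)  echo "unknown($1)" ;;
    esac
}

# Extract the value following "rc = " from a single log line.
# Prints nothing if the line carries no "rc = N" field.
extract_rc() {
    printf '%s\n' "$1" | sed -n 's/.*rc = \(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

line='6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28'
rc=$(extract_rc "$line")
echo "rc=$rc -> $(errno_name "$rc")"
```

Knowing that the code means ENOSPC does not by itself say which device is full; it only narrows the search to free blocks and free inodes on the targets involved.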


On the second attempt, I ran --erase-params and --writeconf on the MGS, 
the MDT, and all the OST partitions, and still got the following errors:

Mar  3 16:23:44 lustre1 kernel: Lustre: MGS: Regenerating 
lqcdproj-OST0000 log by user request.
Mar  3 16:23:44 lustre1 kernel: LustreError: 
6909:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation 
CONFIGS/lqcdproj-OST0000T: -28
Mar  3 16:23:44 lustre1 kernel: LustreError: 
6909:0:(mgc_request.c:1080:mgc_copy_llog()) Failed to copy remote log 
lqcdproj-OST0000 (-28)
Mar  3 16:23:44 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving 
dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with 
recovery enabled
Mar  3 16:23:44 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set 
parameter quota_type=ug
Mar  3 16:23:44 lustre1 kernel: Lustre: Skipped 1 previous similar message
Mar  3 16:23:48 lustre1 kernel: Lustre: 
6011:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts are 
active, abort quota recovery
Mar  3 16:23:48 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS 
connection from 0@lo
Mar  3 16:23:48 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000: 
lqcdproj-OST0000_UUID now active, resetting orphans
Mar  3 16:23:48 lustre1 kernel: LustreError: 
6915:0:(filter.c:3138:filter_precreate()) create failed rc = -28
Mar  3 16:23:48 lustre1 kernel: LustreError: 
7184:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on 
OST idx 0/2: rc = -28
Mar  3 16:23:48 lustre1 kernel: LustreError: 
7184:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID 
failed at mds_lov_clear_orphans: -28
Mar  3 16:23:48 lustre1 kernel: LustreError: 
7184:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID 
sync failed -28, deactivating

Thanks
Nirmal


