[Lustre-discuss] sanity check
Andreas Dilger
andreas.dilger at oracle.com
Wed May 26 14:37:33 PDT 2010
On 2010-05-26, at 13:47, Mervini, Joseph A wrote:
> I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)
>
> So this is my current scenario:
>
> 1. All data from a failing OST has been migrated to other targets.
> 2. The original target was recreated via mdadm.
> 3. mkfs.lustre was run on the recreated target
> 4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
> 5. No other data from the original target has been retained.
>
> Question:
>
> Based on the above conditions, what do I need to do to get this OST back into the file system?
Lustre is fairly robust in handling situations like this (e.g. recreating the last_rcvd file, the object hierarchy O/0/d{0..31}, etc.). The one item it needs help with is recreating the LAST_ID file on the OST. You can do this by hand by extracting the last-precreated object ID for this OST from the MDS and writing it into a new LAST_ID file on the OST:
# extract last allocated object for all OSTs
mds# debugfs -c -R "dump lov_objids /tmp/lo"
# cut out the last allocated object for this OST index
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=NN count=1  # NN = decimal OST index
# verify value is the right one (LAST_ID = next_id - 1)
mds# lctl get_param osc.*OST00NN.prealloc_next_id # NN is OST index
mds# od -td8 /tmp/LAST_ID
# get OST filesystem ready for this value
ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID
This will avoid the OST trying to recreate thousands or millions of objects when it next reconnects.
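For anyone scripting this, the dd/od steps above can be folded into a couple of small helpers. This is only a bash sketch: it assumes, as the dd command does, that lov_objids holds one 8-byte little-endian counter per OST index (so the entry for index NN starts at byte offset 8*NN), and the function names are made up for illustration:

```shell
# extract_last_id: print the decimal LAST_ID for one OST index from a
# lov_objids dump. Sketch only -- assumes one 8-byte counter per index;
# od -td8 uses host byte order, which matches the on-disk little-endian
# format on x86 servers.
extract_last_id() {
    local objids_file=$1 ost_index=$2
    od -An -td8 -j $((8 * ost_index)) -N 8 "$objids_file" | tr -d ' '
}

# check_last_id: LAST_ID should be exactly prealloc_next_id - 1
check_last_id() {
    local last_id=$1 next_id=$2
    [ "$last_id" -eq $((next_id - 1)) ]
}
```

For OST index 27 (001b in hex, matching the target name below) this would look something like: last=$(extract_last_id /tmp/lo 27); check_last_id "$last" "$(lctl get_param -n osc.*OST001b*.prealloc_next_id)" && echo consistent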
This could probably be handled internally by the OST: simply bump the LAST_ID value when it is currently < 2 and the MDS is requesting some much larger value.
> On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:
>
>> On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
>>> I have migrated all the files that were on a damaged OST and have recreated the software raid array and put a lustre file system on it.
>>>
>>> I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:
>>>
>>> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger by mounting it. (The things that have me concerned are the differences in the flags and, less so, the "OST first_time update" flag.)
>>
>> The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.
>>
>>> <pre rebuild>
>>>
>>> [root@oss-scratch obdfilter]# tunefs.lustre /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>>
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x2
>>> (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>>
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x2
>>> (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>> exiting before disk write.
>>>
>>>
>>> <after reformat and tunefs>
>>>
>>> [root@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>>
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x62
>>> (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>>
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x62
>>> (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>> exiting before disk write.
>>>
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>>
>
>
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.