[Lustre-discuss] sanity check
Andreas Dilger
andreas.dilger at oracle.com
Wed May 26 14:37:33 PDT 2010
On 2010-05-26, at 13:47, Mervini, Joseph A wrote:
> I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)
>
> So this is my current scenario:
>
> 1. All data from a failing OST has been migrated to other targets.
> 2. The original target was recreated via mdadm.
> 3. mkfs.lustre was run on the recreated target
> 4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
> 5. No other data from the original target has been retained.
>
> Question:
>
> Based on the above conditions, what do I need to do to get this OST back into the file system?
Lustre is fairly robust in handling situations like this (e.g. recreating the last_rcvd file, the object hierarchy O/0/d{0..31}, etc.). The one item it needs help with is recreating the LAST_ID file on the OST. You can do this by hand by extracting the last-precreated object ID for this OST from the MDS and writing it into a new LAST_ID file on the OST:
# extract last allocated object for all OSTs
mds# debugfs -c -R "dump lov_objids /tmp/lo"
# cut out the last allocated object for this OST index
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=NN count=1  # NN = decimal OST index
# verify value is the right one (LAST_ID = next_id - 1)
mds# lctl get_param osc.*OST00NN.prealloc_next_id # NN is OST index
mds# od -td8 /tmp/LAST_ID
# get OST filesystem ready for this value
ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID
This will avoid the OST trying to recreate thousands or millions of objects when it next reconnects.
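For anyone scripting this, the dd/od steps above can be folded into a couple of small helpers. This is only a bash sketch: it assumes, as the dd command does, that lov_objids holds one 8-byte little-endian counter per OST index (so the entry for index NN starts at byte offset 8*NN), and the function names are made up for illustration:

```shell
# extract_last_id: print the decimal LAST_ID for one OST index from a
# lov_objids dump. Sketch only -- assumes one 8-byte counter per index;
# od -td8 uses host byte order, which matches the on-disk little-endian
# format on x86 servers.
extract_last_id() {
    local objids_file=$1 ost_index=$2
    od -An -td8 -j $((8 * ost_index)) -N 8 "$objids_file" | tr -d ' '
}

# check_last_id: LAST_ID should be exactly prealloc_next_id - 1
check_last_id() {
    local last_id=$1 next_id=$2
    [ "$last_id" -eq $((next_id - 1)) ]
}
```

For OST index 27 (001b in hex, matching the target name below) this would look something like: last=$(extract_last_id /tmp/lo 27); check_last_id "$last" "$(lctl get_param -n osc.*OST001b*.prealloc_next_id)" && echo consistent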
This could probably be handled internally by the OST: simply bump the LAST_ID value when it is currently < 2 and the MDS is requesting some much larger value.
> On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:
>
>> On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
>>> I have migrated all the files that were on a damaged OST and have recreated the software raid array and put a lustre file system on it.
>>>
>>> I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:
>>>
>>> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger by mounting it. (The things that have me concerned are the differences in the flags and, less so, the "OST first_time update" flag.)
>>
>> The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.
>>
>>> <pre rebuild>
>>>
>>> [root@oss-scratch obdfilter]# tunefs.lustre /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>>
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x2
>>> (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>>
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x2
>>> (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>> exiting before disk write.
>>>
>>>
>>> <after reformat and tunefs>
>>>
>>> [root@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>>
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x62
>>> (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>>
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index: 27
>>> Lustre FS: scratch1
>>> Mount type: ldiskfs
>>> Flags: 0x62
>>> (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib
>>>
>>> exiting before disk write.
>>>
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>>
>
>
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.