[Lustre-discuss] questions about an OST content

Bob Ball ball at umich.edu
Wed Nov 10 10:01:08 PST 2010


Well, we ran for 2 days migrating files off the OSTs, and then this morning the MDT 
crashed.  We could not get all clients reconnected before seeing another 
kernel panic on the MDT.  I did an e2fsck of the MDT db and tried again.  
It crashed again, but this time the logged message is:

2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre: 
6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre: 
6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids

I've seen this message elsewhere, but can't seem to find anything on it 
now, or what to do about it.

help?

bob

On 11/8/2010 4:27 PM, Bob Ball wrote:
> Yes, you are correct.  That was the key here: I did not put that file back
> in place.  It is back up and (so far) operating cleanly.
>
> Thanks,
> bob
>
> On 11/8/2010 3:04 PM, Andreas Dilger wrote:
>> On 2010-11-08, at 11:39, Bob Ball wrote:
>>> Don't know if I sent to the whole list.  One of those days.
>>>
>>> I remade the RAID device and remade the Lustre fs on it, but the disks won't mount.  The error is below.  How do I overcome this?
>>>
>>> mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use retries left: 0
>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>>> The target service's index is already in use. (/dev/sdc)
>> Looks like you didn't copy the old "CONFIGS/mountdata" file over the new one.  You can also use "--writeconf" (described in the manual and several times on the list) to have the MGS re-generate the configuration, which should fix this as well.
>>
>>> On 11/8/2010 5:01 AM, Andreas Dilger wrote:
>>>> On 2010-11-07, at 12:32, Bob Ball <ball at umich.edu> wrote:
>>>>> Tomorrow, we will redo all 8 OSTs on the first file server we are redoing.  I am very nervous about this, as a lot is riding on us doing this correctly.  For example, on a client now, if I umount one of the OSTs without first taking some (unknown to me) action on the MDT, then the client will hang on the "df" command.
>>>>>
>>>>> So, while we are doing the reformat, is there any way to avoid this "hang" situation?
>>>> If you issue "lctl --device %{OSC UUID} deactivate" on the MDS and clients then any operations on those OSTs will immediately fail with an IO error. If you are migrating objects from those OSTs, I would have imagined you already did this on the MDS, or new objects would have continued to be allocated there.
>>>>
>>>>> Is the --index=XX argument to mkfs.lustre hex, or decimal?  Seems from your comment below that this must be hex?
>>>> Decimal, though it may also accept hex (I can't check right now).
>>>>
>>>>> Finally, does supplying the --index even matter if we restore the files below that you mention?  That seems to be what you are saying.
>>>> Well, you still need to set the filesystem label. This could be done with tune2fs, but you may as well specify the right index from the beginning.
>>>>
>>>>> On 11/6/2010 11:09 AM, Andreas Dilger wrote:
>>>>>> On 2010-11-06, at 8:24, Bob Ball <ball at umich.edu> wrote:
>>>>>>> I am emptying a set of OST so that I can reformat the underlying RAID-6
>>>>>>> more efficiently.  Two questions:
>>>>>>> 1. Is there a quick way to tell if the OST is really empty?  lfs_find
>>>>>>> takes many hours to run.
>>>>>> If you mount the OST as type ldiskfs and look in the O/0/d* directories (capital-O, zero) there should be a few hundred zero-length objects owned by root.
>>>>>>
>>>>>>> 2. When I reformat, I want it to retain the same ID so as to not make
>>>>>>> "holes" in the list.  From the following information, am I correct to
>>>>>>> assume that the id is 24?  If not, how do I determine the correct ID to
>>>>>>> use when we re-create the file system?
>>>>>> If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.
>>>>>>
>>>>>>> /dev/sdj              3.5T  3.1T  222G  94% /mnt/ost51
>>>>>>>    10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
>>>>>>> umt3-OST0018_UUID           3.4T        3.0T      221.1G  88%
>>>>>>> /lustre/umt3[OST:24]
>>>>>>>    20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5
>>>>>> The OST index is indeed 24 (18 hex). As for /dev/sdj, it is hard to know from the above info. If you run "e2label /dev/sdj"  the filesystem label should match the OST name umt3-OST0018.
>>>>>>
>>>>>> Cheers, Andreas
>>>>>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>>
>>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
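On the index question in the thread (index 24 appearing as umt3-OST0018): the target name carries the index as four hex digits, so the decimal value given to --index and the name in the "lctl dl" or "df" output are the same number in two bases. A minimal sketch of the conversion follows. The function names are made up for illustration and are not part of any Lustre tool.

```python
def ost_target_name(fsname, index):
    """Build the OST target name for a decimal index, e.g. 24 -> umt3-OST0018."""
    # The index is written as four zero-padded hex digits.
    return "%s-OST%04x" % (fsname, index)


def ost_index_from_name(name):
    """Recover the decimal index from a name such as umt3-OST0018 or umt3-OST0018_UUID."""
    # Take the four hex digits that follow "-OST" and ignore any suffix like "_UUID".
    hex_digits = name.split("-OST", 1)[1][:4]
    return int(hex_digits, 16)


print(ost_target_name("umt3", 24))
print(ost_index_from_name("umt3-OST0018_UUID"))
```

With the values quoted in the thread, the first call gives umt3-OST0018 and the second gives 24, matching what "e2label" is expected to report for that OST.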
>
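On the "is the OST really empty" question in the thread: the suggestion was to mount the OST as type ldiskfs and look in the O/0/d* directories, where an emptied OST should hold only zero-length objects. A rough sketch of that check is below. It assumes the OST is already mounted as ldiskfs at the path you pass in; the function name is hypothetical and this is only a heuristic, not an official procedure.

```python
import glob
import os


def nonempty_objects(ost_mount):
    """List objects under O/0/d* that still have a non-zero length."""
    leftovers = []
    # O/0/d* is "capital-O, zero", then the per-bucket object directories.
    pattern = os.path.join(ost_mount, "O", "0", "d*")
    for bucket in sorted(glob.glob(pattern)):
        for name in sorted(os.listdir(bucket)):
            path = os.path.join(bucket, name)
            # Zero-length objects are expected on an emptied OST, so skip them.
            if os.path.isfile(path) and os.path.getsize(path) > 0:
                leftovers.append(path)
    return leftovers
```

Calling nonempty_objects() with the ldiskfs mount point returns an empty list when only zero-length objects remain; any paths it returns are objects that may still hold file data and are worth checking before a reformat.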


