[Lustre-discuss] Problems with lfs find

Andreas Dilger andreas.dilger at oracle.com
Wed Dec 1 00:43:15 PST 2010


On 2010-11-30, at 15:46, Bob Ball wrote:
> Thanks.  Can you tell me how to do the mapping back to the MDS inode?  For example, is 1162976 in the list below the MDS inode?  May as well look.

You can use the "ll_decode_filter_fid" tool on the OST object files (e.g. the "1162976" file below) and it will print out the MDS inode number and generation.
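Here is a minimal sketch of running that tool over a whole OST. It assumes the OST is mounted as ldiskfs at a hypothetical /mnt/ost and that the object files live under O/0/d* (the listing below comes from one such d0 directory). decode_objects is a hypothetical helper name. It only calls ll_decode_filter_fid once per non-empty object and makes no assumption about that tool's output format.

```shell
#!/bin/bash
# decode_objects OBJDIR  (hypothetical helper, not part of Lustre)
# Runs ll_decode_filter_fid on every non-empty OST object under OBJDIR.
# Zero-length objects (e.g. the root-owned "-rwSrwSrw-" entries) are skipped.
decode_objects() {
    local dir=$1 obj
    find "$dir" -type f -size +0 -print0 |
        while IFS= read -r -d '' obj; do
            ll_decode_filter_fid "$obj"
        done
}

# Example (hypothetical mount point):
#   decode_objects /mnt/ost/O/0 > /tmp/ost-object-parents.txt
```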

> On 11/30/2010 4:17 PM, Andreas Dilger wrote:
>> On 2010-11-30, at 11:17, Bob Ball wrote:
>>> [root at umdist03 d0]# ls -l
>>> total 182976
>>> -rw-rw-rw- 1 daits users 45002956 Jul  5 20:52 1162976
>>> -rw-rw-rw- 1 daits users 44569036 Jul  7 02:53 1200608
>>> -rw-rw-rw- 1 daits users 49108913 Jun 28 04:43 1218976
>>> -rw-rw-rw- 1 daits users 48658429 Jul 16 13:29 1254176
>>> -rwSrwSrw- 1 root  root         0 Sep  2 15:11 128
>>> -rwSrwSrw- 1 root  root         0 Sep  2 15:11 9152
>>> -rwSrwSrw- 1 root  root         0 Sep  2 15:11 9216
>>> -rwSrwSrw- 1 root  root         0 Sep  2 15:11 9248
>>> 
>>> Some time back we had an MDT issue, and upon running e2fsck, saw a LOT
>>> of corrupted entries that were just deleted.  I suspect that these may
>>> have been entries pointing to these files?
>> Likely, yes.
>> 
>>> "lfs find" comes up empty handed for this OST, indeed, there are 6 OST
>>> here, each with about 10GB worth of files of this kind.  Are those 60GB
>>> just lost?  Short of pawing through these, by hand, to see what we can
>>> make of the content, is there a snowball's chance in Hades of identifying
>>> these files?
>> They can be mapped back to an MDS inode number, and the user/group information is intact, but that doesn't help if the MDS inodes were deleted by e2fsck since there will not be any file name available.
>> 
>>> Can I simply copy them out of this "ldiskfs" mount of the file system,
>>> back into some recovery directory in the real file system, so that users
>>> can pick through them?
>> Yes, just rsync the non-zero-length files from the ldiskfs-mounted OST filesystem into a new "lost+found" directory created in the lustre mountpoint on a client.  If you "chmod 1775 /path/to/lustre/lost+found" the owners of the files will be able to read/delete their files, but others will not (like /tmp).
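Here is a minimal sketch of that copy step. It assumes the destination directory is reachable from the host where the OST is mounted as ldiskfs. If it is not, use rsync as suggested, where rsync's --min-size=1 option similarly skips empty files. The sketch swaps in find plus cp -p for rsync. copy_orphans and all paths are hypothetical. Object IDs are only unique within a single OST, so giving each OST its own subdirectory avoids name clashes when several OSTs are recovered.

```shell
#!/bin/bash
# copy_orphans SRC_DIR DST_DIR  (hypothetical helper)
copy_orphans() {
    local src=$1 dst=$2
    mkdir -p "$dst"
    # Copy only non-empty regular files; -p keeps mode and mtime, and keeps
    # the owner too when run as root.
    find "$src" -type f -size +0 -exec cp -p {} "$dst"/ \;
    # Sticky bit: owners can delete their own files, others cannot (like /tmp).
    chmod 1775 "$dst"
}

# Example (hypothetical paths, one subdirectory per OST):
#   copy_orphans /mnt/ost/O/0 /lustre/umt3/lost+found/OST0021
```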
>> 
>>> After they are moved, the file system will be reformatted and returned to use.
>> The whole Lustre filesystem, or the OST?  If you are replacing the OST, then you should still do a backup of last_rcvd, CONFIGS/, and O/0/LAST_ID from the OST, and then restore them once the OST has been reformatted.  This process was very recently discussed on this list.
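Here is a minimal sketch of that backup and restore. It assumes the OST is mounted as ldiskfs (at a hypothetical /mnt/ost) both before and after the reformat, and that a plain tar archive of these three items is sufficient. It does not save extended attributes, so check the manual's OST replacement procedure for your version. backup_ost_state and restore_ost_state are hypothetical helper names.

```shell
#!/bin/bash
# backup_ost_state OST_MNT ARCHIVE  (hypothetical helper)
# Saves last_rcvd, CONFIGS/ and O/0/LAST_ID, relative to the ldiskfs mount root.
backup_ost_state() {
    local mnt=$1 archive=$2
    tar -cf "$archive" -C "$mnt" last_rcvd CONFIGS O/0/LAST_ID
}

# restore_ost_state ARCHIVE OST_MNT  (hypothetical helper)
# Run after the OST has been reformatted and mounted as ldiskfs again.
restore_ost_state() {
    local archive=$1 mnt=$2
    tar -xpf "$archive" -C "$mnt"
}

# Example (hypothetical paths):
#   backup_ost_state /mnt/ost /root/umt3-OST0021-state.tar
#   ... reformat, mount as ldiskfs ...
#   restore_ost_state /root/umt3-OST0021-state.tar /mnt/ost
```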
>> 
>>> On 11/30/2010 8:53 AM, Bob Ball wrote:
>>>> OK, thanks.  Scary, to see errors out of lfs find.
>>>> 
>>>> bob
>>>> 
>>>> On 11/30/2010 1:47 AM, Andreas Dilger wrote:
>>>>> On 2010-11-29, at 20:18, Bob Ball wrote:
>>>>>> I have an odd problem.  I am trying to empty all files from a set of OSTs
>>>>>> as indicated below, by making a list via lfs find and then sending that
>>>>>> list to lfs_migrate.  However, I have just gotten this message back from
>>>>>> the lfs find:
>>>>>> 
>>>>>> llapi_semantic_traverse: Failed to open
>>>>>> '/lustre/umt3/data13/daits/p15.6.3.10/prod/W1J_munu216465_simul': No
>>>>>> such file or directory (2)
>>>>>> error: find failed for umt3-OST0021.
>>>>> This may mean that the file was deleted while "lfs find" was running.
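Because the list is built before lfs_migrate runs, paths can disappear in between. Here is a minimal sketch of the list-then-migrate workflow that tolerates this. It assumes lfs find's --obd option with an OST UUID such as umt3-OST0021_UUID, and that lfs_migrate reads file names from standard input when given none. It passes -y to skip its confirmation prompt (check lfs_migrate -h on your version). keep_existing and drain_ost are hypothetical helpers that drop vanished paths before migrating.

```shell
#!/bin/bash
# keep_existing  (hypothetical helper)
# Reads one path per line on stdin and prints only those that still exist,
# so files deleted after the list was made are not handed to lfs_migrate.
keep_existing() {
    local p
    while IFS= read -r p; do
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# drain_ost FS_ROOT OST_UUID LIST_FILE  (hypothetical helper)
drain_ost() {
    local root=$1 uuid=$2 list=$3
    # A non-zero exit (e.g. ENOENT for a file removed mid-scan) is reported,
    # not treated as fatal; the list may then be incomplete, so re-run later.
    if ! lfs find --obd "$uuid" "$root" > "$list"; then
        echo "warning: lfs find reported errors; list may be incomplete" >&2
    fi
    keep_existing < "$list" | lfs_migrate -y
}

# Example (hypothetical list location):
#   drain_ost /lustre/umt3 umt3-OST0021_UUID /tmp/ost0021.list
```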
>>>>> 
>>>>>> On the OSS, I see this but not much else:
>>>>>> LustreError: 5226:0:(ldlm_resource.c:861:ldlm_resource_add()) lvbo_init
>>>>>> failed for resource 9101: rc -2
>>>>>> 
>>>>>> Can someone give me an idea of what is wrong here?  And what can be
>>>>>> done about it, if anything?
>>>>> This might mean that the file was deleted at the same time the MDS crashed, and the objects were removed but the MDS file was not.  You can simply delete this file with the "unlink" command; it does not contain any data in any case.
>>>>> 
>>>>> Cheers, Andreas
>>>>> --
>>>>> Andreas Dilger
>>>>> Lustre Technical Lead
>>>>> Oracle Corporation Canada Inc.
>>>>> 
>>>>> 
>>>>> 
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>> 
>>>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
>> 
>> 


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



