[Lustre-discuss] Problems with lfs find

Bob Ball ball at umich.edu
Tue Nov 30 10:17:39 PST 2010


OK, well, that file simply did not exist anywhere, and it was only one
file.  But now that the OST is "completely empty", I find that it is not
really empty.  For example:

[root@umdist03 d0]# pwd
/mnt/ost/O/0/d0

[root@umdist03 d0]# ls -l
total 182976
-rw-rw-rw- 1 daits users 45002956 Jul  5 20:52 1162976
-rw-rw-rw- 1 daits users 44569036 Jul  7 02:53 1200608
-rw-rw-rw- 1 daits users 49108913 Jun 28 04:43 1218976
-rw-rw-rw- 1 daits users 48658429 Jul 16 13:29 1254176
-rwSrwSrw- 1 root  root         0 Sep  2 15:11 128
-rwSrwSrw- 1 root  root         0 Sep  2 15:11 9152
-rwSrwSrw- 1 root  root         0 Sep  2 15:11 9216
-rwSrwSrw- 1 root  root         0 Sep  2 15:11 9248

Some time back we had an MDT issue, and upon running e2fsck, saw a LOT
of corrupted entries that were just deleted.  I suspect that these may
have been entries pointing to these files?  "lfs find" comes up
empty-handed for this OST.  Indeed, there are 6 OSTs here, each with
about 10GB worth of files of this kind.  Are those 60GB just lost?
Short of pawing through these by hand to see what we can make of the
content, is there a snowball's chance in Hades of identifying these
files?
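
As a starting point for the "pawing through" step, a minimal sketch that
inventories the non-empty objects under the ldiskfs mount, so owners and
dates can be matched against what users remember losing.  It assumes GNU
find; the function name list_ost_objects is hypothetical, and the default
path is simply the one shown in the listing above.  It only reads metadata
and does not recover the original file names.

```shell
# Sketch: list non-empty regular files under an ldiskfs-mounted OST object tree.
# Assumes GNU find (for -printf). list_ost_objects is a hypothetical helper name.
list_ost_objects() {
    ost_root=${1:-/mnt/ost/O/0}
    tab=$(printf '\t')
    # "! -empty" skips the zero-length root-owned placeholder objects.
    # Columns: object id, owner, size in bytes, modification date, full path.
    find "$ost_root" -type f ! -empty \
        -printf '%f\t%u\t%s\t%TY-%Tm-%Td\t%p\n' |
        sort -t "$tab" -k2,2 -k4,4    # group by owner, then by date
}
```

Something like "list_ost_objects /mnt/ost/O/0 > /tmp/ost_objects.tsv" would
then give one tab-separated line per surviving object.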

Can I simply copy them out of this "ldiskfs" mount of the file system, 
back into some recovery directory in the real file system, so that users 
can pick through them?  After they are moved, the file system will be 
reformatted and returned to use.
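
For concreteness, the copy I have in mind could be sketched roughly as
below.  This is only an illustration under assumptions: copy_ost_objects
is a hypothetical helper, the destination is any directory writable from
the OSS, and object names are taken to be unique within one OST's O/0
tree, so each OST should get its own destination directory to avoid
collisions between OSTs.

```shell
# Sketch: copy every non-empty object out of an ldiskfs-mounted OST tree
# into a flat recovery directory. copy_ost_objects is a hypothetical name.
copy_ost_objects() {
    ost_root=$1    # e.g. /mnt/ost/O/0
    dest=$2        # one recovery directory per OST (assumed layout)
    mkdir -p "$dest" || return 1
    # cp -p preserves mode, ownership (when run as root) and timestamps,
    # which are the only clues left for identifying the files.
    # "! -empty" leaves the zero-length placeholder objects behind.
    find "$ost_root" -type f ! -empty -exec cp -p {} "$dest"/ \;
}
```

It would be run once per OST, for example as
"copy_ost_objects /mnt/ost/O/0 /some/recovery/dir/OST0021", where the
destination path is made up for the example.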

bob

On 11/30/2010 8:53 AM, Bob Ball wrote:
> OK, thanks.  Scary, to see errors out of lfs find.
>
> bob
>
> On 11/30/2010 1:47 AM, Andreas Dilger wrote:
>> On 2010-11-29, at 20:18, Bob Ball wrote:
>>> I have an odd problem.  I am trying to empty all files from a set of OSTs
>>> as indicated below, by making a list via lfs find and then sending that
>>> list to lfs_migrate.  However, I have just gotten this message back from
>>> the lfs find:
>>>
>>> llapi_semantic_traverse: Failed to open
>>> '/lustre/umt3/data13/daits/p15.6.3.10/prod/W1J_munu216465_simul': No
>>> such file or directory (2)
>>> error: find failed for umt3-OST0021.
>> This may mean that the file was deleted while "lfs find" was running.
>>
>>> On the OSS, I see this but not much else:
>>> LustreError: 5226:0:(ldlm_resource.c:861:ldlm_resource_add()) lvbo_init
>>> failed for resource 9101: rc -2
>>>
>>> Can someone give me an idea of what is wrong  here?  And what can be
>>> done about it, if anything?
>> This might mean that the file was deleted at the same time the MDS crashed, and the objects were removed but the MDS file was not.  It is possible to just delete this file using the "unlink" command - it does not contain any data in any case.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>>
>>