[Lustre-discuss] Problems with lfs find
Bob Ball
ball at umich.edu
Tue Nov 30 10:17:39 PST 2010
OK, well, that file simply did not exist anywhere, and it was only one
file. But now that the OST is "completely empty", I find that it is not
really empty.
For example:
[root@umdist03 d0]# pwd
/mnt/ost/O/0/d0
[root@umdist03 d0]# ls -l
total 182976
-rw-rw-rw- 1 daits users 45002956 Jul 5 20:52 1162976
-rw-rw-rw- 1 daits users 44569036 Jul 7 02:53 1200608
-rw-rw-rw- 1 daits users 49108913 Jun 28 04:43 1218976
-rw-rw-rw- 1 daits users 48658429 Jul 16 13:29 1254176
-rwSrwSrw- 1 root root 0 Sep 2 15:11 128
-rwSrwSrw- 1 root root 0 Sep 2 15:11 9152
-rwSrwSrw- 1 root root 0 Sep 2 15:11 9216
-rwSrwSrw- 1 root root 0 Sep 2 15:11 9248
Some time back we had an MDT issue, and upon running e2fsck, saw a LOT
of corrupted entries that were simply deleted. I suspect that these may
have been the entries pointing to these files. "lfs find" comes up
empty-handed for this OST. Indeed, there are 6 OSTs here, each with about
10GB worth of files of this kind. Are those 60GB just lost? Short of pawing
through these by hand to see what we can make of the content, is there
a snowball's chance in Hades of identifying these files?
Can I simply copy them out of this "ldiskfs" mount of the file system,
back into some recovery directory in the real file system, so that users
can pick through them? After they are moved, the file system will be
reformatted and returned to use.
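
To make the question concrete, here is a minimal sketch of the copy I have
in mind, not a tested recovery procedure. It walks the object tree of the
"ldiskfs" mount, skips the zero-length entries like the root-owned ones in
the listing above, and copies everything else into a recovery directory,
grouped by the numeric owner so users can find their own data. The function
name and the "uid_<n>" layout are hypothetical, invented for illustration.
It also assumes each object is a complete file (stripe count of 1); an
object from a file striped over several OSTs would only be a fragment.

```python
import shutil
from pathlib import Path


def copy_nonempty_objects(object_root, recovery_dir):
    """Copy every non-empty regular file under object_root into recovery_dir.

    Files land in recovery_dir/uid_<owner uid>/<path relative to object_root>.
    Returns the list of destination paths that were written.
    Hypothetical helper: the name and directory layout are assumptions.
    """
    object_root = Path(object_root)
    recovery_dir = Path(recovery_dir)
    copied = []
    for src in sorted(object_root.rglob("*")):
        # Only plain files are of interest; skip directories and symlinks.
        if src.is_symlink() or not src.is_file():
            continue
        st = src.stat()
        if st.st_size == 0:
            continue  # zero-length objects hold no data to recover
        dest = recovery_dir / f"uid_{st.st_uid}" / src.relative_to(object_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 keeps the modification time and mode bits, which may help
        # users recognize their files; it does not preserve ownership.
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

It would be called as copy_nonempty_objects("/mnt/ost/O/0", RECOVERY), where
RECOVERY is a directory in the real file system, outside the "ldiskfs" mount;
no such directory has been chosen yet.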
bob
On 11/30/2010 8:53 AM, Bob Ball wrote:
> OK, thanks. Scary, to see errors out of lfs find.
>
> bob
>
> On 11/30/2010 1:47 AM, Andreas Dilger wrote:
>> On 2010-11-29, at 20:18, Bob Ball wrote:
>>> I have an odd problem. I am trying to empty all files from a set of OSTs
>>> as indicated below, by making a list via lfs find and then sending that
>>> list to lfs_migrate. However, I have just gotten this message back from
>>> the lfs find:
>>>
>>> llapi_semantic_traverse: Failed to open
>>> '/lustre/umt3/data13/daits/p15.6.3.10/prod/W1J_munu216465_simul': No
>>> such file or directory (2)
>>> error: find failed for umt3-OST0021.
>> This may mean that the file was deleted while "lfs find" was running.
>>
>>> On the OSS, I see this but not much else:
>>> LustreError: 5226:0:(ldlm_resource.c:861:ldlm_resource_add()) lvbo_init
>>> failed for resource 9101: rc -2
>>>
>>> Can someone give me an idea of what is wrong here? And what can be
>>> done about it, if anything?
>> This might mean that the file was deleted at the same time the MDS crashed, and the objects were removed but the MDS file was not. It is possible to just delete this file using the "unlink" command - it does not contain any data in any case.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>>
>>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>