[Lustre-discuss] Distributed Object storage lookup of small files

David Pratt fairwinds.dp at gmail.com
Tue Aug 11 11:16:32 PDT 2009


Hi Jim. Sure. Is the reason you are doing this on Lustre that you
already had a large clustered filesystem to work from, or is your
Lustre cluster dedicated to your image project? I have been
investigating Lustre for a smallish-scale filesystem to serve as a
storage pool for virtual machines, with scalable storage of 10TB+.
Your use of Lustre is interesting to me since I also use Lucene, and a
good part of the data in the virtual machine disk images I will be
storing is index data that I will be doing parallel searches across.
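
As a rough illustration of the sharded parallel search discussed in
this thread, here is a minimal Python sketch. It does not use Lucene:
search_shard is a hypothetical stand-in that scores documents by raw
term count, and the names multisearch, shards, and k are assumptions
made for the example. It shows only the fan-out and the merge of
per-shard top-k lists. With independently built real indexes, scores
may not be directly comparable across shards, which is part of why
combining results takes work.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def search_shard(shard, term, k):
    """Hypothetical per-shard search: score each doc by raw term count."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in shard.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    # nlargest returns the hits sorted from highest to lowest score.
    return heapq.nlargest(k, hits)


def multisearch(shards, term, k=3):
    """Query every shard in parallel, then merge the sorted top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, term, k), shards))
    # Each per-shard list is already in descending order, so a k-way
    # merge yields the global ranking without re-sorting everything.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, k))


if __name__ == "__main__":
    shards = [
        {"a": "lustre lustre fs", "b": "lucene"},
        {"c": "lustre"},
        {"d": "lustre lustre lustre"},
    ]
    print(multisearch(shards, "lustre", k=2))
```

Each shard only has to return its own k best hits, so the merge step
handles at most k entries per shard regardless of shard size.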

Regards,
David


On 11-Aug-09, at 2:40 PM, Jim McCusker wrote:

> On Tue, Aug 11, 2009 at 1:14 PM, David Pratt<fairwinds.dp at gmail.com> wrote:
>> Hi Jim. That is pretty cool. I see there are more than 300,000 records
>> at present. I am curious how this will work at much larger scale, given
>> that the RAM required to perform a search goes up substantially with
>> Lucene as the number of docs goes up. I have tended to look at sharding
>> and parallel multisearch as a means of horizontally scaling Lucene by
>> breaking the index into chunks. This approach is interesting, and I am
>> just interested in how you anticipate scale and performance with
>> document growth.
>
> We haven't had significant RAM requirements with the numbers of
> documents we have at the moment. Nutch is a more complete solution for
> search that has support for parallel search, and I imagine that there
> are other good ways of doing parallel search. Back when JXTA was still
> something, I used it to create parallel distributed search across
> people's desktops with pretty good results. Combining the search
> results can end up taking some work, though.
>
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker at yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu



