[Lustre-discuss] Distributed Object storage lookup of small files
David Pratt
fairwinds.dp at gmail.com
Tue Aug 11 11:16:32 PDT 2009
Hi Jim. Sure. Is the reason you are doing this on Lustre the fact that
you already had a large clustered filesystem to work from, or is your
Lustre cluster dedicated to your image project? I have been
investigating Lustre for a smallish-scale filesystem to serve as a
storage pool for virtual machines, scalable to 10TB+. Your use of
Lustre is interesting to me since I also use Lucene, and a good part of
the data in the virtual machine disk images I will be storing is index
data that I will be searching in parallel.
Regards,
David
On 11-Aug-09, at 2:40 PM, Jim McCusker wrote:
> On Tue, Aug 11, 2009 at 1:14 PM, David Pratt <fairwinds.dp at gmail.com> wrote:
>> Hi Jim. That is pretty cool. I see there are more than 300,000
>> records at present. I am curious how this will work when you get to
>> a much larger scale, given the RAM required to perform searches,
>> since this goes up substantially with Lucene as the number of
>> documents grows. I have tended to look at sharding and parallel
>> multisearch as a means of horizontally scaling Lucene by breaking the
>> index into chunks. This approach is interesting; I'm just curious how
>> you anticipate scale and performance with document growth.
>
> We haven't had significant RAM requirements with the number of
> documents we have at the moment. Nutch is a more complete search
> solution that supports parallel search, and I imagine there are other
> good ways of doing parallel search. Back when JXTA was still in active
> use, I used it to create parallel distributed search across people's
> desktops with pretty good results. Combining the search results can
> end up taking some work, though.
>
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker at yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu
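For readers following the sharding discussion, the "combining the search results" step can be sketched as a k-way merge of per-shard ranked hit lists. This is a minimal, hypothetical illustration, not code from either poster and not Lucene's API: it assumes each shard returns (score, doc_id) pairs already sorted by descending score, and that scores are comparable across shards. That last assumption often fails in practice, because scores computed from per-shard term statistics (e.g. document frequencies) can diverge between shards. The function name merge_shard_results is made up for this example.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# A hit is (score, doc_id). Each shard's list is assumed (hypothetically)
# to be sorted by descending score, as a per-shard top-k search returns it.
Hit = Tuple[float, str]


def merge_shard_results(shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine per-shard ranked hit lists into a single global top-k.

    Assumes scores are directly comparable across shards; per-shard
    term statistics can make raw scores diverge in practice.
    """
    # heapq.merge performs a lazy k-way merge; reverse=True because every
    # input list is sorted in descending order of score.
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    # Consume only the first k items of the merged stream.
    return list(islice(merged, k))


if __name__ == "__main__":
    shard_a = [(0.92, "a-17"), (0.55, "a-03")]
    shard_b = [(0.88, "b-42"), (0.71, "b-08"), (0.10, "b-99")]
    print(merge_shard_results([shard_a, shard_b], k=3))
    # prints [(0.92, 'a-17'), (0.88, 'b-42'), (0.71, 'b-08')]
```

The merge itself is cheap; the harder part of combining results is making scores comparable, for example by normalizing them or by gathering global term statistics before scoring, which this sketch deliberately leaves out.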