[Lustre-discuss] Distributed Object storage lookup of small files

Jim McCusker mccusker at gmail.com
Tue Aug 11 10:40:52 PDT 2009


On Tue, Aug 11, 2009 at 1:14 PM, David Pratt<fairwinds.dp at gmail.com> wrote:
> Hi Jim. That is pretty cool. See there are more than 300,000 records at
> present. Curious about how this will work when you get into much larger
> scale with RAM requirement to perform search since this goes up
> substantially with Lucene as number of docs goes up. I have tended to
> look at sharding and parallel multisearch as means of horizontally scaling
> Lucene by breaking into chunks. This approach is interesting and just
> interested how you anticipate scale and performance with document growth.

We haven't had significant RAM requirements with the numbers of
documents we have at the moment. Nutch is a more complete solution for
search that has support for parallel search, and I imagine that there
are other good ways of doing parallel search. Back when JXTA was still
around, I used it to create parallel distributed search across
people's desktops with pretty good results. Combining the search
results can end up taking some work, though.
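
To illustrate the combining step, here is a minimal sketch in Python (not
the JXTA or Nutch code). It assumes each shard returns its hits already
sorted by descending score as (score, doc_id) pairs; the function name and
the sample data are hypothetical. It also assumes scores are comparable
across shards, which is often not true when each Lucene index computes its
own term statistics, and that is a large part of the work mentioned above.

```python
import heapq
from itertools import islice


def merge_shard_hits(shard_hits, k):
    """Merge per-shard hit lists into one global top-k list.

    shard_hits: iterable of lists of (score, doc_id) tuples, each list
                already sorted by descending score (assumed).
    k:          number of merged hits to return.
    """
    # heapq.merge performs a lazy k-way merge; reverse=True tells it the
    # inputs are sorted from largest to smallest.
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    # Pull only the first k hits instead of materializing every result.
    return list(islice(merged, k))


# Hypothetical hits from two shards, for illustration only.
shard_a = [(0.92, "a-17"), (0.40, "a-03")]
shard_b = [(0.85, "b-08"), (0.61, "b-21"), (0.10, "b-02")]

top3 = merge_shard_hits([shard_a, shard_b], k=3)
print(top3)
```

Because the merge is lazy, only as many hits as are needed get pulled from
each shard's list, so the cost stays small even with many shards.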

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu


