[Lustre-discuss] Distributed Object storage lookup of small files

Tue Aug 11 10:14:54 PDT 2009

Hi Jim. That is pretty cool. I see there are more than 300,000 records
at present. I'm curious how this will work at a much larger scale,
given the RAM required to perform searches, which goes up substantially
with Lucene as the number of documents grows. I have tended to look at
sharding and parallel multisearch as a means of scaling Lucene
horizontally by breaking the index into chunks. Your approach is
interesting, and I'd like to know how you anticipate scale and
performance holding up as the document count grows. Many thanks.

Regards
David
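David's alternative, sharding plus parallel multisearch, can be sketched in plain Java without Lucene. This is a minimal illustration under stated assumptions, not Lucene's API: the `Shard` class, the toy inverted index, and the score (number of query terms a document contains) are hypothetical simplifications. Each document is routed to one shard by hashing its ID, each query fans out to every shard in parallel, and the per-shard hits are merged into one ranked list, so each shard only has to hold its own slice of the index in RAM.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class ShardedSearch {
    /** A hit: document id plus a score. */
    record Hit(String docId, int score) {}

    /** One shard: a toy inverted index (term -> doc ids) held in memory. */
    static final class Shard {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String text) {
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
                }
            }
        }

        /** Hypothetical scoring: counts how many query terms each document contains. */
        List<Hit> search(List<String> terms) {
            Map<String, Integer> scores = new HashMap<>();
            for (String term : terms) {
                for (String doc : postings.getOrDefault(term, Set.of())) {
                    scores.merge(doc, 1, Integer::sum);
                }
            }
            return scores.entrySet().stream()
                    .map(e -> new Hit(e.getKey(), e.getValue()))
                    .collect(Collectors.toList());
        }
    }

    private final List<Shard> shards = new ArrayList<>();
    private final ExecutorService pool;

    ShardedSearch(int shardCount) {
        for (int i = 0; i < shardCount; i++) shards.add(new Shard());
        pool = Executors.newFixedThreadPool(shardCount);
    }

    /** Routes a document to exactly one shard by hashing its id. */
    void index(String docId, String text) {
        shards.get(Math.floorMod(docId.hashCode(), shards.size())).add(docId, text);
    }

    /** Sends the query to every shard in parallel, then merges and keeps the top k hits. */
    List<Hit> search(String query, int k) throws InterruptedException, ExecutionException {
        List<String> terms = Arrays.asList(query.toLowerCase().split("\\W+"));
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard s : shards) futures.add(pool.submit(() -> s.search(terms)));
        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> f : futures) merged.addAll(f.get());
        // Highest score first; ties broken by doc id so results are deterministic.
        merged.sort(Comparator.comparingInt(Hit::score).reversed()
                .thenComparing(Hit::docId));
        return merged.subList(0, Math.min(k, merged.size()));
    }

    void close() { pool.shutdown(); }

    public static void main(String[] args) throws Exception {
        ShardedSearch s = new ShardedSearch(4);
        s.index("img1", "chest x-ray pneumonia");
        s.index("img2", "brain MRI tumor");
        s.index("img3", "chest CT tumor");
        System.out.println(s.search("chest tumor", 2));
        s.close();
    }
}
```

Running `main` prints `[Hit[docId=img3, score=2], Hit[docId=img1, score=1]]`: `img3` contains both terms, and the tie between `img1` and `img2` is broken by id. The trade-off of this design is the merge step: every query visits every shard, so sharding spreads the index across machines' RAM but does not reduce the number of shards each query must touch.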

On 11-Aug-09, at 12:05 PM, Jim McCusker wrote:

> We have had good performance using Lucene as a search engine in Java
> backed by Lustre (mentioned in a previous email):
>
> http://krauthammerlab.med.yale.edu/imagefinder
>
> The images are in a hashed directory structure that provides O(1)
> access to the image file contents, and the search engine in turn
> serves as a flexible hash table that provides O(1) per search term
> access to keywords, metadata, and full text.
>
> Lucene is available at http://lucene.apache.org and is a joy to work  
> with.
>
> Jim
>
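Jim's "hashed directory structure" is not spelled out in the message; the Java sketch below shows one common way such a layout works, as an assumption rather than the Yale lab's actual scheme. The file name is hashed with MD5 and the first two digest bytes become two directory levels, so the path is computed directly from the name: no directory scan or separate index lookup is needed, and no single directory accumulates millions of entries. The mount point `/lustre/images` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedPath {
    /**
     * Maps a file name to root/ab/cd/name, where "ab" and "cd" are the first two
     * bytes of the MD5 digest of the name, in hex. With 256 x 256 = 65,536 leaf
     * directories, files spread evenly and each directory stays small.
     */
    static Path pathFor(Path root, String name) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(name.getBytes(StandardCharsets.UTF_8));
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);
            return root.resolve(level1).resolve(level2).resolve(name);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get("/lustre/images"); // hypothetical mount point
        System.out.println(pathFor(root, "img0001.png"));
    }
}
```

Because the same name always yields the same path, a reader needs only the file name to open the file, which is what gives the O(1) access described above.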
> On Mon, Aug 10, 2009 at 12:11 AM, Pranas Baliuka <pranas at orangecap.net> wrote:
>> Dear Lustre experts/users,
>>
>> I am looking for an optimal solution to the following task:
>>
>> Internet-scale applications must be designed to process high volumes of
>> transactions.
>>
>> Describe a design for a system that must process on average 30,000 HTTP
>> requests per second.
>>
>> For each request, the system must perform a lookup into a dictionary of
>> 50 million words, using a key word passed in via the URL query string.
>>
>> Each response will consist of a string containing the definition of the
>> word (10 KB or less).
>>
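For scale, a back-of-envelope sizing of this task, taking 10 KB ($10^4$ bytes) as the size of every definition, which is an upper bound that overstates the real average:

```latex
\begin{aligned}
\text{storage (upper bound)} &= 5\times10^{7}\ \text{words} \times 10^{4}\ \text{B}
  = 5\times10^{11}\ \text{B} \approx 500\ \text{GB} \\
\text{egress (upper bound)} &= 3\times10^{4}\ \text{req/s} \times 10^{4}\ \text{B}
  = 3\times10^{8}\ \text{B/s} = 300\ \text{MB/s} \approx 2.4\ \text{Gbit/s}
\end{aligned}
```

In the worst case, then, a single gigabit link could not carry the responses, and if the data does not fit in RAM, the random-read rate needed for 30,000 lookups per second is likely to matter more than raw capacity.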
>>
>>
>> My initial thought was to use MySQL or Berkeley DB backed by a SAN, but a
>> lower-level solution would probably be more affordable.
>>
>> Could I instead use, e.g., QFS storage from Java without a database server?
>> Can the SAN be avoided by joining local HDDs into a Lustre system?
>>
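For an exact-match lookup like this one, one common way to avoid a shared SAN is to hash-partition the 50 million words across N servers, each keeping its slice on local disks or in RAM, and have the front end route every request to the single server that owns the word. Unlike the full-text search case, there is no fan-out and no merge. The Java sketch below is hypothetical and independent of Lustre or QFS: the node names are made up, in-memory maps stand in for each node's local store, and plain modulo hashing stands in for a production scheme such as consistent hashing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PartitionedDictionary {
    private final List<String> nodes; // hypothetical node names
    // Stand-in for each node's local store (local disk or RAM on that server).
    private final Map<String, Map<String, String>> stores = new HashMap<>();

    PartitionedDictionary(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
        for (String n : this.nodes) stores.put(n, new HashMap<>());
    }

    /**
     * Picks the owning node from the word alone. The String.hashCode() formula is
     * fixed by the String API documentation, so every front end computes the same
     * owner. Plain modulo remaps most words when a node is added; consistent
     * hashing avoids that.
     */
    String nodeFor(String word) {
        return nodes.get(Math.floorMod(word.hashCode(), nodes.size()));
    }

    void put(String word, String definition) {
        stores.get(nodeFor(word)).put(word, definition);
    }

    /** Each request touches exactly one node: route, then one key lookup there. */
    Optional<String> lookup(String word) {
        return Optional.ofNullable(stores.get(nodeFor(word)).get(word));
    }

    public static void main(String[] args) {
        PartitionedDictionary dict =
                new PartitionedDictionary(List.of("node-a", "node-b", "node-c"));
        dict.put("lustre", "a parallel distributed file system");
        dict.put("lucene", "a Java full-text search library");
        System.out.println(dict.nodeFor("lustre") + ": "
                + dict.lookup("lustre").orElse("(not found)"));
        System.out.println(dict.lookup("qfs").orElse("(not found)")); // prints "(not found)"
    }
}
```

Because each word lives on exactly one node, adding nodes divides both the storage and the request load, which is the property the 30,000 requests per second target depends on.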
>>
>>
>> The task is hypothetical, but it would be nice to get feedback from
>> experts in the specific technologies.
>>
>> Any ideas? ;)
>>
>>
>>
>> I've sent a similar request to the QFS forum and am really not sure
>> which product would fit better. Both work as distributed file systems,
>> and both sound like convenient storage for this particular task.
>>
>>
>>
>> Thanks,
>>
>> Pranas
>>
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>
>
>
> -- 
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker at yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu