At Zemanta we love Apache Lucene/Solr. Our recommendation engine is built on it, and in our experience Lucene/Solr is a very fast and solid piece of software. While our index is not that big (several tens of millions of documents), our queries are quite complex. Since our users expect their recommendations to be returned within a couple of hundred milliseconds, our greatest scalability issue is response time. So far we have avoided the need for sharding by implementing a custom extension to Lucene/Solr that lets us search the index using multiple CPU cores, where each core processes a different part of the index simultaneously. This solution gave us shorter response times without having to deal with index partitioning. But our index is growing faster than the number of cores available in our servers. We therefore plan to start using SolrCloud this year, so that we can continue to provide fast response times while greatly increasing the pool of news articles and blog posts that we can recommend to our users.
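The multi-core approach described above can be sketched in a few lines. This is not Zemanta's extension, which is not shown in this post; it is only a toy illustration of the general idea, assuming an in-memory list of `(doc_id, text)` pairs and a naive term-count score. The names `search_slice` and `parallel_search` are hypothetical. Each worker scores its own disjoint slice of the index, and the per-slice top-k lists are then merged into a global top-k. Note that CPython threads do not run CPU-bound work truly in parallel, so the sketch shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_slice(docs, query_terms, k):
    """Score every document in one slice; return its local top-k as (score, doc_id)."""
    scored = []
    for doc_id, text in docs:
        tokens = text.lower().split()
        # Toy relevance score: how often the query terms occur in the document.
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def parallel_search(index, query, k=3, workers=4):
    """Search disjoint slices of the index concurrently and merge the results."""
    query_terms = query.lower().split()
    # Split the index into disjoint slices, one per worker.
    slices = [index[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda s: search_slice(s, query_terms, k), slices)
        # Merge the per-slice top-k lists into one global top-k.
        return heapq.nlargest(k, (hit for part in partials for hit in part))


if __name__ == "__main__":
    index = [
        (1, "solr search server"),
        (2, "lucene search search"),
        (3, "cooking recipes"),
        (4, "distributed search with solr cloud search search"),
    ]
    print(parallel_search(index, "search", k=2, workers=2))
```

Because every slice returns its own top-k, the merge step only has to look at `workers * k` candidates, which is why this pattern can shorten response times without changing the results. Sharding with SolrCloud applies the same split-and-merge idea across machines rather than across cores in one server.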
Just the other day we wrote about Sensei, the new distributed, real-time, full-text search database built on top of Lucene, and here we are again, writing about another "new" distributed, real-time, full-text search server also built on top of Lucene: SolrCloud.