Posted by – April 3, 2008

After a few days reading the Lucene and Solr mailing list archives, I discovered a wide range of frameworks and tools related to information retrieval and natural language processing. At first glance, the most important search engine components can be found on the Lucene site as Lucene subprojects, but there are several other initiatives related to this topic.

I really liked Solr. It has a lot of cool features that remind me of some of the features found in FAST ESP, but one feature I really missed was a document processing pipeline, where I can run several small programs that each perform some operation on the document before sending it to the indexer. This can be solved by using OpenPipe integrated with Solr. The main idea behind OpenPipe is to provide a document pipeline very similar to the FAST ESP one but much easier to configure and maintain. The project is quite new, but promising from my perspective.
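
To make the pipeline idea concrete, here is a minimal sketch in Python. It is not OpenPipe's or FAST ESP's actual API; the stage functions, the field names and the `run_pipeline` helper are all hypothetical and only illustrate the concept of chaining small steps before indexing.

```python
from typing import Callable, Dict, List

# A document is modeled as a plain mapping of field name to text (an assumption).
Document = Dict[str, str]
# A stage is any function that takes a document and returns a new one.
Stage = Callable[[Document], Document]


def strip_whitespace(doc: Document) -> Document:
    """Trim leading and trailing whitespace from every field value."""
    return {name: value.strip() for name, value in doc.items()}


def lowercase_title(doc: Document) -> Document:
    """Add a lowercased copy of the title, leaving the original field untouched."""
    result = dict(doc)
    result["title_lower"] = doc.get("title", "").lower()
    return result


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order and return the result."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Hypothetical input document; the result is what would be handed to the indexer.
processed = run_pipeline(
    {"title": "  Open Source Search  ", "body": " Lucene and Solr "},
    [strip_whitespace, lowercase_title],
)
```

Each stage only knows about the document it receives, so stages can be added, removed or reordered without touching the indexer, which is what makes this kind of pipeline easy to configure.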

If you need linguistic features such as sentiment analysis, POS tagging, or entity extraction in your search solution, then LingPipe can solve your problem. LingPipe is free for applications that will themselves be available for free, but a license must be purchased if you use it for commercial purposes.

But in the end we are left with a big problem: packaging all these components and frameworks together to offer a complete open source search solution. I hope to see something like this become a reality in the near future.


