Sunday, December 25, 2016

In Memory Full Text Search in Ruby

My site rubyplus.com uses Elasticsearch. For the development environment I want something lightweight, something like SQLite but for full-text search. I wanted it to be:
  1. Fast
  2. In Memory
  3. No Server Process
The main motivation was to replace Elasticsearch in the development and test environments. I also wanted it to be:
  • Zero-configuration: no setup or administration needed.
  • Small code footprint.
  • Simple, easy to use API.
  • Well-commented source code with 100% branch test coverage.
  • Self-contained: no external dependencies.
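To make the "in memory, no server process, no external dependencies" requirements concrete, here is a minimal sketch of the data structure at the core of most full-text search engines: an inverted index that maps each term to the documents containing it. It uses only the Ruby standard library. The class name TinyIndex and its methods are hypothetical, and the tokenizer (lowercase, split on anything that is not a letter or digit) is a simplifying assumption.

```ruby
require 'set'

# Hypothetical minimal in-memory inverted index: term => Set of document ids.
# Everything lives in Ruby Hashes, so there is no server and no configuration.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
    @documents = {}
    @next_id = 0
  end

  # Stores the document, indexes its terms and returns the new document id.
  def add(text)
    id = (@next_id += 1)
    @documents[id] = text
    tokenize(text).each { |term| @postings[term] << id }
    id
  end

  # Returns the sorted ids of documents that contain every query term (AND).
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    # fetch avoids creating empty postings for unknown terms
    terms.map { |term| @postings.fetch(term, Set.new) }.reduce(:&).to_a.sort
  end

  # Returns the original text of a stored document.
  def [](id)
    @documents[id]
  end

  private

  # Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/)
  end
end

index = TinyIndex.new
index.add("Hi there")              # => 1
index.add("Hi Bob, how are you?")  # => 2
index.search("hi")                 # => [1, 2]
index.search("hi there")           # => [1]
```

Because the index is just a Hash owned by the Ruby process, creating a fresh one per test gives every test an isolated index with no setup. It does no ranking and keeps nothing on disk.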
Picky is a simple in-memory full-text search library for Ruby. I probably need only a subset of Solr's features, such as:
  • ranked searching -- best results returned first
  • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  • fielded searching (e.g. title, author, contents)
  • sorting by any field
  • multiple-index searching with merged results
  • allows simultaneous update and searching
  • flexible faceting, highlighting, joins and result grouping
  • fast, memory-efficient and typo-tolerant suggesters
  • pluggable ranking models, including the Vector Space Model and Okapi BM25
  • configurable storage engine (codecs)
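Two items from that list, ranked searching and fielded searching, can be sketched in plain Ruby. The following hypothetical RankedIndex class (not the API of Solr, Lucene or Picky) keeps separate postings per field and scores matches with Okapi BM25, assuming the commonly used defaults k1 = 1.2 and b = 0.75, a Lucene-style IDF, and the same simplified tokenizer as before.

```ruby
# Hypothetical sketch: fielded, ranked search scored with Okapi BM25.
class RankedIndex
  K1 = 1.2  # term-frequency saturation
  B = 0.75  # strength of document-length normalization

  def initialize
    # field => term => { doc_id => term frequency }
    @postings = Hash.new do |fields, field|
      fields[field] = Hash.new { |terms, term| terms[term] = Hash.new(0) }
    end
    # field => { doc_id => number of tokens in that field }
    @lengths = Hash.new { |fields, field| fields[field] = {} }
  end

  # fields is a Hash such as { "title" => "...", "body" => "..." }.
  # Assumes each id is added only once.
  def add(id, fields)
    fields.each do |field, text|
      tokens = tokenize(text)
      @lengths[field][id] = tokens.length
      tokens.each { |term| @postings[field][term][id] += 1 }
    end
  end

  # Returns [doc_id, score] pairs for one field, best match first.
  def search(field, query)
    lengths = @lengths.fetch(field, {})
    return [] if lengths.empty?

    doc_count = lengths.size
    avg_length = lengths.values.sum.to_f / doc_count
    field_postings = @postings.fetch(field, {})
    scores = Hash.new(0.0)

    tokenize(query).uniq.each do |term|
      docs = field_postings.fetch(term, {})
      next if docs.empty?

      # Terms found in fewer documents get a higher inverse document frequency.
      idf = Math.log(1 + (doc_count - docs.size + 0.5) / (docs.size + 0.5))
      docs.each do |id, tf|
        norm = K1 * (1 - B + B * lengths[id] / avg_length)
        scores[id] += idf * tf * (K1 + 1) / (tf + norm)
      end
    end

    scores.sort_by { |id, score| [-score, id] }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/)
  end
end

index = RankedIndex.new
index.add(1, { "title" => "Ruby search", "body" => "Elasticsearch is a search server." })
index.add(2, { "title" => "Cooking", "body" => "Search, search, search: a recipe search." })
index.add(3, { "title" => "Gardening", "body" => "Nothing relevant here." })

index.search("body", "search").map(&:first)   # => [2, 1]
index.search("title", "search").map(&:first)  # => [1]
```

Document 2 mentions "search" four times in its body, so it outranks document 1, while the title search only matches document 1 because the fields are indexed separately.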
Peter Keen evaluated some of the available options. On his blog he writes:

sqlite3 has built-in FTS but I would have to build a bunch of stuff around it.
xapian is a FTS engine but like with sqlite3 I'd have to build stuff.
elasticsearch would work but it's an external process that I'd have to run and it's an awful lot of overhead.
Some kind of hosted elasticsearch or solr provider would work, but again lots of overhead and not free and I'm then dependent on their uptime.

Whistlepig is a small text search index written in C. Small as in not very many features and not much code, but the features it has are perfect for my needs:
  • Full query language
  • In-memory, in-process
  • Arbitrary number of indexes for the same document
Here's a full example of how to index and query a document:

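require 'rubygems'
require 'whistlepig'

document = "Hi there"

# Set up an index called "index"
index = Whistlepig::Index.new "index"

# An entry holds the fields of one document
entry = Whistlepig::Entry.new
entry.add_string "body", document

# add_entry returns the id the index assigned to this document
docid = index.add_entry entry

# Search the "body" field for "hi"; search returns an array of document ids
query = Whistlepig::Query.new("body", "hi")
result = index.search(query)
raise "expected #{docid}, got #{result.inspect}" unless result[0] == docid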
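Phrase queries are a typical part of a full query language. The following is not Whistlepig's code or API; it is a minimal pure-Ruby sketch, using a hypothetical PhraseIndex class and the same simplified tokenizer assumption, of how a positional index (which records where each term occurs) can answer a phrase query such as "hi there".

```ruby
# Hypothetical sketch: a positional index that answers phrase queries.
class PhraseIndex
  def initialize
    # term => { doc_id => [token positions] }
    @positions = Hash.new { |terms, term| terms[term] = Hash.new { |docs, id| docs[id] = [] } }
    @next_id = 0
  end

  # Records the position of every token and returns the new document id.
  def add(text)
    id = (@next_id += 1)
    tokenize(text).each_with_index { |term, pos| @positions[term][id] << pos }
    id
  end

  # Returns sorted ids of documents where the terms appear consecutively, in order.
  def phrase_search(phrase)
    terms = tokenize(phrase)
    return [] if terms.empty?

    postings = terms.map { |term| @positions.fetch(term, {}) }
    # A candidate document must contain every term of the phrase.
    candidates = postings.map(&:keys).reduce(:&)

    candidates.select do |id|
      # Try each position of the first term as the start of the phrase.
      postings.first[id].any? do |start|
        postings.each_with_index.all? { |docs, offset| docs[id].include?(start + offset) }
      end
    end.sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/)
  end
end

index = PhraseIndex.new
index.add("Hi there")             # => 1
index.add("There, hi!")           # => 2
index.add("Well hi there again")  # => 3
index.phrase_search("hi there")   # => [1, 3]
index.phrase_search("there hi")   # => [2]
```

All three documents contain both words, but only the stored positions distinguish "hi there" from "there hi".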


For comparison, here is a Rails developer's guide to full-text search with Solr:

http://valve.github.io/blog/2014/02/22/rails-developer-guide-to-full-text-search-with-solr/

I also came across lunr.js, a simple JavaScript library for full-text search in the browser.

SQLite also comes with a standalone command-line interface (CLI) client that can be used to administer SQLite databases, and its sources are in the public domain, free to use for any purpose.
