Diana Thayer

Replicating CouchDB to Orchestrate

06.18.2014

I love CouchDB. It’s a NoSQL database that, like Orchestrate, lets you work with your data over HTTP. Many data sets, like the entirety of NPM’s metadata and every law in Massachusetts, are stored in CouchDB. With the release of orchestrate-couchdb, you can now replicate CouchDB databases into Orchestrate collections. Why would you do that? Maybe you want to take advantage of Orchestrate’s Full-Text Search and Graph queries. Orchestrate indexes your data for querying automatically, so there’s no need to write indexes by hand. Let’s see it in action.

Massachusetts Law

Adding comprehensive full-text search to bodies of law lets law firms and advocacy groups alike uncover information critical to their campaigns and clients. Language nerds like me can also map the evolution of legal terms over time, watching new concepts appear as they’re incorporated into law. Today, we’ll just import the data itself. (Tomorrow, the world!)

To start, use NPM to install orchestrate-couchdb:

sudo npm install -g orchestrate-couchdb

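Next, set the environment variables the importer reads: your Orchestrate API key, plus the CouchDB server and database to replicate: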

export ORCHESTRATE_API_KEY=YOUR_ORCHESTRATE_API_KEY
export COUCHDB_URL="http://macode.org"
export COUCHDB_DATABASE=api
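
For orientation, here is a hypothetical sketch of how a worker might turn these three variables into the URL of the CouchDB changes feed it reads from. The `loadConfig` and `changesUrl` helpers are illustrative assumptions, not orchestrate-couchdb’s actual code; only the `_changes` endpoint and its `feed`, `include_docs`, and `since` parameters come from CouchDB itself.

```typescript
// Hypothetical sketch, NOT the actual orchestrate-couchdb source.
interface ImporterConfig {
  orchestrateApiKey: string;
  couchdbUrl: string;
  couchdbDatabase: string;
}

// Any string-valued map works here, e.g. Node's process.env.
type Env = { [name: string]: string | undefined };

const REQUIRED = ["ORCHESTRATE_API_KEY", "COUCHDB_URL", "COUCHDB_DATABASE"];

// Read the three settings, failing fast with a clear message if any is unset.
function loadConfig(env: Env): ImporterConfig {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    orchestrateApiKey: env["ORCHESTRATE_API_KEY"] as string,
    couchdbUrl: env["COUCHDB_URL"] as string,
    couchdbDatabase: env["COUCHDB_DATABASE"] as string,
  };
}

// Build the CouchDB changes-feed URL for the configured database.
// feed=continuous keeps the connection open, include_docs=true returns each
// changed document's body, and since=<seq> starts after a given sequence.
function changesUrl(config: ImporterConfig, since: string = "0"): string {
  const base = config.couchdbUrl.replace(/\/+$/, ""); // drop trailing slashes
  const db = encodeURIComponent(config.couchdbDatabase);
  return `${base}/${db}/_changes?feed=continuous&include_docs=true&since=${encodeURIComponent(since)}`;
}
```

With the settings above, `changesUrl(loadConfig(process.env))` would produce `http://macode.org/api/_changes?feed=continuous&include_docs=true&since=0`. Failing fast on a missing variable is a deliberate choice in this sketch: a misconfigured worker stops immediately instead of silently importing nothing.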

Then, start the daemon!

orchestrate-couchdb

…That’s it! You’re now importing all of Massachusetts’ laws into Orchestrate :D

Since it’ll take a while to import all of Massachusetts’ laws, let’s talk about how the importer works.

How It Works

CouchDB exposes a feed of the changes made to each database. Using the follow library, the importer replays those changes onto Orchestrate, so we get both the current state of every document and its version history as refs. Because follow works hard never to die, the daemon can keep following changes indefinitely: when it runs out of changes to process, it doesn’t exit; it just waits for more. This lets you stay up to date with a CouchDB dataset that’s still accepting reads and writes.
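
To make the replay idea concrete, here is a minimal in-memory sketch. It assumes each change carries a numeric sequence number (as in CouchDB 1.x; later versions use opaque string sequences), a document ID, and the document body as returned with `include_docs=true`. `Change`, `RefStore`, and `replay` are hypothetical names, and recording a deletion as a `null` version is an assumption of this sketch; a real worker would send HTTP requests to Orchestrate instead of appending to a `Map`.

```typescript
// One entry from a CouchDB-style changes feed (simplified, hypothetical shape).
interface Change {
  seq: number;
  id: string;
  deleted?: boolean;
  doc?: Record<string, unknown>;
}

type Version = Record<string, unknown> | null; // null marks a deletion (assumption)

// Stand-in for an Orchestrate collection: each key keeps every version it has
// been written with, oldest first, much like Orchestrate's refs.
class RefStore {
  private versions = new Map<string, Version[]>();
  lastSeq = 0; // last sequence applied, so a restarted worker can resume

  // Latest value for a key, or undefined if it was deleted or never written.
  get(key: string): Record<string, unknown> | undefined {
    const history = this.versions.get(key);
    if (!history || history.length === 0) return undefined;
    return history[history.length - 1] ?? undefined;
  }

  // Copy of every version recorded for a key.
  history(key: string): Version[] {
    return [...(this.versions.get(key) ?? [])];
  }

  // Record one change as a new version; skip anything already replayed.
  apply(change: Change): void {
    if (change.seq <= this.lastSeq) return;
    const history = this.versions.get(change.id) ?? [];
    history.push(change.deleted ? null : change.doc ?? null);
    this.versions.set(change.id, history);
    this.lastSeq = change.seq;
  }
}

// Replay a batch of changes in feed order.
function replay(store: RefStore, changes: readonly Change[]): void {
  for (const change of changes) store.apply(change);
}
```

Calling `replay` again with a later batch picks up where the last one stopped, because `lastSeq` records the most recent sequence applied. That mirrors how a long-running follower keeps waiting for new changes rather than exiting.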

Heroku Integration

The worker I demonstrated above runs on your local machine, but say you want to offload it to another machine so you don’t have to babysit it. Let’s deploy it to Heroku. First, clone the source code and create a Heroku app:

git clone git@github.com:orchestrate-io/orchestrate-couchdb.git
cd orchestrate-couchdb
heroku create

Then, let’s set our environment variables on Heroku:

heroku config:set ORCHESTRATE_API_KEY=YOUR_ORCHESTRATE_API_KEY
heroku config:set COUCHDB_URL="http://macode.org"
heroku config:set COUCHDB_DATABASE=api

Now, let’s push the project to Heroku, and spin up a Dyno for it:

git push heroku master
heroku ps:scale worker=1 web=0

Bam! Your worker is now importing Massachusetts law from Heroku. Any time our source CouchDB adds more documents, our worker will sync them to Orchestrate. Easy, huh?

Next Time

This dataset was, in many ways, small fry. The documents we’re importing from CouchDB have a consistent schema, and there are only 25k of them. Next time, we’ll import the entirety of NPM’s metadata and look at how to handle data that doesn’t have a consistent schema.

Lastly, much love to Calvin Metcalf for making Massachusetts law available in a CouchDB database. CouchDB is an amazing piece of tech, and its developer community is even lovelier.

Happy coding!