August 17, 2017

Using Riak with Elixir

Now that Basho is no more, people have been asking what will happen to Riak. I don’t presume to know, but I will say that it is an OSS project and that several engineers continue to work with it, some of whom are willing to be paid to support it. (Disclaimer: I am one.)

The Basho Docs may (or may not) still be alive when you read this. Although the community has suffered some blows over the years, it still exists, and at least one person has decided to host the docs at an alternative location. With anything OSS, life is uncertain. This is why it is important to make sure that the people who work on something get paid if their work is of value to you. This article isn’t about economics or markets, however. Today I’d like to write about using Riak with Elixir.

Using Riak within an Elixir project is super easy, but occasionally I bump into dependencies which are not updated or have warnings I’d rather not live with. I recently ran into a few of those, and until someone decides to come along and fund Riak work, I expect things may continue to break. Although Riak has built-in anti-entropy, it wasn’t built to maintain itself against code rot as OTP releases march forward.

I recently took a few hours, updated the Elixir Riak Client and Pooler, and pushed the changes up. This adds HyperLogLog support to the Elixir client and fixes some issues with pooler which were preventing compilation under OTP 20.

While I am here, I will quickly discuss setting up a Mix project to work with Riak. Having worked on many consulting projects using Riak, I have found there are several things I want when working with Riak on a long-term basis.

Integrating Riak into Mix

I prefer to create an umbrella OTP app called db. I do this because I want to group together all the data models which will be stored in Riak. I also want a dedicated place to add the dependency information which is only directly relevant to KV storage.

In my mix file, I add this to get started:

def application do
  [applications: [:logger, :riak],  # plus any other runtime apps you need
   mod: {DB, []}]
end

defp deps do
  [
    {:tzdata, "~> 0.5.12"},
    {:geocalc, "~> 0.5.4"},
    {:riak, "~> 1.1.6"},
    {:timex, "~> 3.1"},
    {:pbkdf2, "~> 2.0.0"},
    {:csv, "~> 2.0.0"},
    {:uuid, github: "okeuday/uuid"}
  ]
end

These libraries are fairly common across several use cases I have bumped into in the wild. This also starts the DB application.

Environment Namespacing

Separating configuration by environment is a really good thing to do from the get-go. It makes sure that any development, test, or production data are logically and cleanly separated from each other within Riak. You will see later that I use the Mix.env value as my Riak bucket prefix.

Each OTP app has its own configuration file under config/config.exs. I like to add an import for a Mix-environment-specific file by adding this to that file:

import_config "#{Mix.env}.exs"

The overall (top level) umbrella application config file contains generic top level application configuration for ssl, logging, timezone data storage, uuid caching, etc. I also add two lines at the bottom of the file which ensure that local configuration will override any configuration settings defined elsewhere. It is important that these go at the end of the file like so:

config :ssl, protocol_version: :"tlsv1.2"

config :logger,
  backends: [
    {FileLoggerBackend, :error_log},
    {FileLoggerBackend, :access_log}],
  utc_log: true,
  compile_time_purge_level: :debug,
  truncate: 4096

config :logger, :access_log,
  path: System.cwd <> "/log/access.log",
  metadata: [:function, :module],
  level: :info

config :logger, :error_log,
  path: System.cwd <> "/log/error.log",
  metadata: [:function, :module],
  level: :error

# if a process decides to have a uuid cache
config :quickrand,
  cache_size: 65536

# prevent exometer from creating spurious directories
config :setup,
  verify_directories: false

# configure tzdata to autoupdate and use a data dir
config :tzdata, [
  autoupdate: :enabled,
  data_dir: "./data"]

import_config "../apps/*/config/config.exs"
import_config "*local.exs"

I use a file logger backend which I found via Google. It is based on GenEvent and seems to work OK. I could probably dig into it to make it better, or I could try to integrate lager some day if I can figure out how to avoid dragging in dependencies I may not want. That could be something to write about later.

Next up, let’s configure the Riak connection pool.

Inside of apps/db/config/dev.exs we add:

config :db,
  unit_separator: "_"

config :pooler,
  pools: [
    [name: :riaklocal1,
     group: :riak,
     max_count: 5,
     init_count: 2,
     start_mfa: {Riak.Connection, :start_link, ['', 8087]}]
  ]

Pooler can have more than one pool should you want to build in some redundancy. For production you will probably have each pool point at a load balancer, so only one may be needed. Take care with your HAProxy config; I’ve seen configurations disconnect clients at the wrong intervals, and getting the timeouts wrong will cause intermittent disconnects.

From that point on, if network connectivity to the protocol buffers port in Riak isn’t a problem (all the ports are open), then you should be able to fire up iex to validate the connection to Riak:
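A minimal check might look like the following sketch, assuming riak-elixir-client’s Riak.Connection.start_link/2 and Riak.ping/1; the host and port are assumptions for a local node, so adjust as needed:

```elixir
# In iex -S mix, open a direct connection and ping the node.
# The underlying Erlang riakc client answers :pong on success.
{:ok, pid} = Riak.Connection.start_link('127.0.0.1', 8087)
Riak.ping(pid)
```

If the pool started cleanly, pooler’s log output should also show its members connecting without complaint.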

For fun, go ahead and stop Riak and watch the logs. Pooler will freak out and start to watch for reconnection. When you start Riak back up, you will see it calm down and return to normal operation. This is what I love about Erlang and Elixir: this stuff is usually built in so failures are handled without crashing everything.

From here, we just need to create a data model.

KV Data Models

Most of the database world thinks of data models as chunks of tabular data that can be joined and manipulated with SQL. Riak is a distributed key-value store, so: no SQL for you. Here is what we are going to do instead.

Somewhere in a common module create a few attributes that look like this.

@unit_separator Application.get_env(:db, :unit_separator, "\x1f")
@prefix "#{Mix.env}"

These make sure the prefix and unit separator are accessible to the codebase in one spot. I use underscores for dev and test, but the unit separator character (\x1f) for prod. I don’t like it when someone tosses an underscore into their keyspace and I have to deal with it in prod. Since no one uses the separator character, this prevents a bunch of headaches down the road. (I actually had a difficult migration once because of this.)

Next up, let’s define what any code that touches a KV data model is going to pass up the stack. I want something from Riak (r) which can be turned into JSON. Here is the type spec I like:

@type r_json_t :: %{
  required(:model) => map(),
  optional(:vtag) => String.t,
  optional(:last_modify_time) => DateTime.t,
  optional(:vclock) => String.t
}

A model is a flat structure that can be stored as a value under a key. It is a map (actually a struct) which can be turned into a predefined structure later.
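For illustration, such a model might be a struct like this; DB.User and its fields are hypothetical, not something from the codebase described here:

```elixir
# Hypothetical model struct. Poison can decode Riak object data
# straight into it via Poison.decode!(json, as: %DB.User{}).
defmodule DB.User do
  @derive [Poison.Encoder]
  defstruct key: nil, name: nil, creation_time: nil
end
```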

@spec unit_separator() :: String.t
def unit_separator, do: @unit_separator

@spec namespace(String.t) :: String.t
def namespace(bucket), do: @prefix <> @unit_separator <> bucket

This is where I let other modules know about and understand both the separator and the prefix namespace strategy. Sometimes it is handy for data migrations to get at this, so the Riak libraries can be called directly without all the convenience functions found within the application. If every data model uses this common code, then all data models will be appropriately namespaced by environment.
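As a sketch of that migration use case: the pooled Riak.Bucket.keys!/1 call is my assumption about the client’s API, and migrate_key/1 is a hypothetical helper, so treat this as illustrative only:

```elixir
# Walk every key in an environment-namespaced bucket and hand each
# one to a (hypothetical) migration function, bypassing the model
# layer's convenience functions entirely.
DB.Common.namespace("users")
|> Riak.Bucket.keys!()
|> Enum.each(&migrate_key/1)
```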

I like my data models to always include some basic things. This function makes sure to translate the last modified time into the r_json_t map. For kicks it also adds the vclock and vtag.

Again, type is a struct, and Poison is used to decode the Riak object data into the r_json_t model.

@spec add_db_attrs(%Riak.Object{}, map()) :: DB.Common.r_json_t
def add_db_attrs(%Riak.Object{} = r_object, type) do
  meta = r_object.metadata
  {_, vtag} = :dict.find("X-Riak-VTag", meta)
  {_, {mega, seconds, micro}} = :dict.find("X-Riak-Last-Modified", meta)
  unix = (mega * 1_000_000 + seconds) * 1_000_000 + micro
  {:ok, time} = DateTime.from_unix(unix, :microseconds)
  %{model: Poison.decode!(r_object.data, as: type),
    vtag: to_string(vtag),
    last_modify_time: time,
    vclock: Base.encode64(r_object.vclock)}
end

This next function should usually be called from a model’s save function. The save function of a KV data model is also a good place to add to the overall count of things that model represents using an HLL data type. Operations such as that and indexing can typically be done asynchronously once the primary key has been saved.
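A save function might bump that HLL count asynchronously along these lines. The module and function names (Riak.CRDT.HyperLogLog, Riak.update/4) are my assumptions modeled on the client’s other CRDT helpers, so check them against your installed version:

```elixir
# Assumed API: HLL helpers mirroring the client's other CRDT modules.
# `key` is the just-saved primary key; "hll" is an assumed bucket type.
Task.start(fn ->
  Riak.CRDT.HyperLogLog.new()
  |> Riak.CRDT.HyperLogLog.add_element(key)
  |> Riak.update("hll", hll_bucket(), "all_keys")
end)
```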

@spec creation_time(Model.t) :: integer
def creation_time(data_model) do
  case is_nil(data_model.creation_time) do
    false ->
      case is_binary(data_model.creation_time) do
        true ->
          {_, dt, _} = DateTime.from_iso8601(data_model.creation_time)
          DateTime.to_unix(dt, :microseconds)
        false -> data_model.creation_time
      end
    true -> DateTime.to_unix(DateTime.utc_now(), :microseconds)
  end
end

Finally, I like to have each model implement a bunch of CRUD functions. Callbacks for these are handy, and much of the boilerplate can be shoved into a common module that they all share. Maybe if I liked macros I would use those in Elixir, but I am not really a fan of the metaprogramming concept. Specific model concerns such as reverse term indexing and constraint checking can be kept in the specific KV data model modules.

Each individual KV data model should use @behaviour DB.Model, which looks like this:

defmodule DB.Model do
  alias DB.Model
  alias DB.Common

  @type t :: module()

  # KV Mapping
  @callback bucket() :: String.t
  @callback hll_bucket() :: String.t

  # CRUD
  @callback new() :: Model.t
  @callback save(Model.t) :: Model.t | :error
  @callback delete(String.t) :: :ok | :error
  @callback find(String.t | list(String.t), boolean()) ::
              Common.r_json_t | list(Common.r_json_t) | :not_found | :error

  @spec is_implemented?(module()) :: true | false
  def is_implemented?(module) do
    module.module_info(:attributes)
    |> Keyword.get(:behaviour, [])
    |> Enum.member?(__MODULE__)
  end
end
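A concrete model adopting the behaviour might look roughly like this; DB.User and every function body here are hypothetical placeholders, not real implementations:

```elixir
defmodule DB.User do
  @behaviour DB.Model

  defstruct key: nil, name: nil, creation_time: nil

  # KV mapping: bucket names go through the environment namespacing
  def bucket, do: DB.Common.namespace("users")
  def hll_bucket, do: DB.Common.namespace("users_counts")

  # CRUD stubs -- real bodies would call the Riak client
  def new, do: %DB.User{}
  def save(model), do: model
  def delete(_key), do: :ok
  def find(_key, _include_siblings), do: :not_found
end
```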

That is pretty much it. I am sure there are missing parts here that will prevent something from working flawlessly. If so, you can leave me some feedback and I’ll try to correct this post.

A few things I could cover in the next post are:

  1. HAProxy configuration when using Riak
  2. Runtime maintenance of the Riak Connection Pool.
  3. Building Data Models & Namespacing Buckets
  4. Handling Errors Right
  5. Integration to Lager
  6. Cowboy 2, Gun, and Websocket Integration / Testing

Please provide feedback and let me know what you think and what you would like to hear more on next!

February 15, 2014

Fun With Data & Maps

This actually happened in July of 2013, and I finally got around to learning more of the parts: storing the actual map data in Riak somewhere, and hosting the map bits on real servers so I could see how to properly deal with CORS and CSRF related issues.

The hackathon was focused on using Chicago city data. During some downtime (my job was to help the attendees with technical questions), another coach and I grabbed a bunch of the data people were working with and started seeing what we could do with it. After watching my partner convert the data from tabular format into JSON, and noticing geospatial coordinates within, we decided to just make it into GeoJSON.

I then grabbed AngularJS, Leaflet.js, its marker cluster plugin, and OpenStreetMap.

A few tens of minutes later, I had mashed these together with the data the city gave us and ended up with the simple map control below. Afterward, we had a few conversations with people about the Internet of Things that left me feeling like I should know the client-side application stuff better.

I particularly like the marker cluster combination effect at max zoom levels. This particular data set shows all the green spaces in the city. With a bit more effort, it may be possible to reduce the amount of data pulled across the wire if I can map it to access patterns that make sense for the changing zoom levels.

Will have to think & learn more on this.

February 14, 2014

Happy Valentines Day!

Next week I am presenting a talk on some of the new features within the upcoming Riak 2.0 release as well as something currently called Riak JSON.

Wait, What is Riak?

Riak 2.0

Riak is an open source distributed database made by the company I work for. Riak has been around for a while, and there are a number of great introductions found all over the internet.

If you think you know me, you had best learn what Riak is and how to use it if you want to continue being friends.

I kid.

Ok, so what is new in Riak?

Lots of things! Riak 2.0 comes with better search capability, new built in distributed data types, options for stronger consistency, and improved configuration management for deployments in the cloud.

Of all the features, I am most excited about both Search 2.0 and the new built-in conflict-free replicated data types, otherwise known as CRDTs. These are already documented, and I may write about them later.

Great, so what is Riak JSON?

Riak JSON is an open source document query interface built on top of Riak 2.0 that uses Solr to index document data.

Riak JSON focuses on JSON documents. Why JSON? Because (let’s be honest here) it is currently gaining in popularity and people want to use it. JSON is now popping up all the time for me, and I want better tooling to help my work be more efficient (while still being correct) when working with distributed databases.

Not like me, and have a specific serialized format you actually care about? Riak JSON will show you how to build out such a specific interface on top of Riak, or you can directly use the lower-level Solr / Yokozuna API which sits underneath Riak JSON. This is also something we (at Basho) can help you with.

Diving In

OK, so to get started with Riak JSON, we start off with the branch of Riak that has everything put together. If you need a pre-built package, please yell at me (nicely) on Twitter and I may accommodate you. Riak JSON will work on any Riak 2.0 install.

Note: the content below is the same material as in the slides prepared for the talk.

Building from source:

git clone
git checkout ack-riak-json
make rel
make devrel

Enable both search and riak json within the etc/riak.conf file.

search = on
riak_json_http = on

Fire up Riak

ulimit -n 4096
bin/riak start
bin/riak ping

The Java client has the following dependencies to deal with the HTTP, serialization to JSON, and logging concerns:

  • Apache HTTP Client
  • Jackson
  • SLF4J

Using the Java / Scala library:

import com.basho.riak.json._
import com.basho.riak.json.Field.Type._

import scala.beans.BeanProperty
import com.fasterxml.jackson.annotation.JsonIgnore;

val client = new Client("localhost", 10018)
val collection = client.createCollection("squares")

Define a document class:

class MySquare (l:Int, w:Int) extends Document {
  def this() = this(0, 0)
  @BeanProperty var key: String = _
  @BeanProperty var length: Int = l
  @BeanProperty var width: Int = w

  /* don't serialize this tuple */
  @JsonIgnore def getSize = Tuple2(length, width)
}

Define a schema:

val schema = new Schema.Builder()
  .addField(new Field("length", INTEGER).setRequired(true))
  .addField(new Field("width", INTEGER).setRequired(true))
  .addField(new Field("owner", STRING))
  .build()

Add some data; Riak will generate the keys for you if you wish, or you may assign your own.

val large_square = new MySquare(1024, 768)
val normal_square = new MySquare(640, 480)

Query by key:

val result = collection.findByKey("9tT49FHJoQImObmPYgVPRcB56T2", classOf[MySquare])
result.getLength() => 1024

Search! Find One Thing:

val q_string = "{\"length\": {\"$gt\": 300}}"
val query = new Query(q_string, classOf[MySquare]);
val single_result = collection.findOne(query)

Find ALL the things!

val many_results = collection.findAll(query)

// documents are a java Collection
many_results.getDocuments().size() => 2

// info about the result set
many_results.numPages(); => 1 -- total pages in result set
many_results.getPage(); => 0 -- current page (zero-indexed)
many_results.perPage(); => results per page, defaults to 100

// extract into a more malleable Scala array if needed:
val docs = many_results.getDocuments.toArray(
  new Array[MySquare](many_results.getDocuments.size))

Requery with: "$per_page":10 and "$page":1 to control the pagination set.

Lastly, you can query Riak using a few different approaches:

KV Style:


Riak Search 2.0 (Solr / Yokozuna) Style:


And finally cleanup:



A key interest of mine is making Riak more developer-friendly, and while there will continue to be gaps as we iterate, I feel that 2.0 is a step in the right direction.

Basho Technologies

February 13, 2014

I recently re-discovered my GitHub user page and looked upon it in horror. If you somehow saw this atrocity, please try to unsee it.

Long ago, I ran my own web server at home and spent a lot of time doing a mix of varied consulting work that involved both the front and back end. After college, I focused more on the backend aspects of software development and forgot there was such a thing as HTML or CSS. In a big company that is perfectly fine; there is always someone besides me who does the UI job, usually using a nice editor that I would simply not shell out money for unless I used it every day.

As this is the first time in a while that I have written anything public that is not on Twitter or Facebook, what you can expect to find here are my thoughts on things that interest me. I have long been a believer in doing projects like this on the side; it helps me keep a whole host of skills fresh.

Interests I have are:

  1. Written Language (English or 한국어)
  2. Making useful things using …
  3. Erlang
  4. Java / Scala
  5. C / Objective C
  6. Ruby
  7. Arduino
  8. Raspberry Pi

If I add more functionality, I may write more random thoughts on the latest Peter Bailis papers, but only once there is a mechanism on the site to interact with all of you. You can always find me on the twitterz.

Until then, I hope you find what you are looking for.

Copyright © 2017