Purely Functional Data Structures in Clojure: RedBlack Trees
This post is part of a series about Chris Okasaki’s Purely Functional Data Structures. You can see all posts in the series by visiting the functionaldatastructures category in this blog.
Recently I had some free time to come back to Purely Functional Data Structures and implement a new data structure: Redblack trees.
Redblack trees
Redblack trees are a type of selfbalancing binary search tree. Back when I first learned the balancing algorithm used in operations such as insert and delete, I remember it being a particularly tricky one.
Traditionally, redblack trees are implemented destructively  meaning insertions and deletions happen in place. So in imperative languages like C or Java there is a lot of node pointer manipulation floating around, making this algorithm error prone.
This post, as its title implies, will deal with the functional implementation which, besides simpler, is also persistent.
Invariants
A big disadvantage of binary search trees is that they operate poorly on sorted data with O(N) worst case, pretty much becoming nothing more than a linked list.
This is where Redblack trees come in: when inserting/removing new nodes, the tree balances itself thus guaranteeing search times of O(logN). It does so by satisfying two invariants:
 No red nodes can have red children
 Every path from the root to an empty node contains the same number of black nodes
The image below  taken from the book itself  concisely summarises the 4 cases involved in these invariants, starting at the top and then moving counterclockwise.
With this in mind I set out to write the balance function in Clojure.
The code
In the functional version of a Redblack tree all pointer manipulation required in its destructive counterpart simply disappears, rendering the algorithm a lot simpler.
Nevertheless, we’re left with some complexity in the balance function around testing a node’s colour as well as its children’s and grandchildren’s.
My first attempt at writing it started like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

I was extremely unhappy with this. A lot of boilerplate around bindings and tests. And this is only the first case.
But if you read Okasaki’s implementation of the algorithm in ML  or Haskell  you’ll quickly realise how concise and elegant it is. The main reason for that is pattern matching, something we don’t have builtin in Clojure.
However, Clojure is a Lisp and has a powerful macros system at its disposal. That has given us core.match, a pattern matching library for Clojure.
core.match
Using core.match, I rewrote my balance function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 

If you look closely at the patterns being matched, you’ll see they cater for all 4 cases and allow for both matching and binding in the same expressions. With only a little over double the size of a single case using the previous function, we now have a fully functioning balance implementation. This is better, but I wanted more.
What I really wanted to be able to write is this:
1 2 3 4 5 6 7 8 9 

Unfortunately core.match doesn’t support records/protocols yet. However, while reading core.match’s source code, I found a test that implemented the balance algorithm using a different way of representing a tree.
You see, I went with the map approach because I like how you can ask for the specific keys that represent the parts of the tree, such as :color, :left, :value and :right. But if you’re happy with using positional elements in a vector to represent your tree, our balance function in Clojure becomes this:
1 2 3 4 5 6 7 8 9 

Which is amazingly similar to the protocols/records version above.
I guess one of the lessons here is that by carefully choosing the data structure to represent your problem, your implementation can become substantially simpler.
As usual, all code is on github, with the Redblack tree specific section in this direct link. The project also includes tests.
Until next time.