Purely Functional Data Structures in Clojure: Red-Black Trees
Red-black trees are a type of self-balancing binary search tree. Back when I first learned the balancing algorithm used in operations such as insert and delete, I remember it being a particularly tricky one.
Traditionally, red-black trees are implemented destructively - meaning insertions and deletions happen in place. So in imperative languages like C or Java there is a lot of node pointer manipulation floating around, making this algorithm error prone.
This post, as its title implies, will deal with the functional implementation which, besides simpler, is also persistent.
A big disadvantage of binary search trees is that they operate poorly on sorted data with O(N) worst case, pretty much becoming nothing more than a linked list.
This is where Red-black trees come in: when inserting/removing new nodes, the tree balances itself thus guaranteeing search times of O(logN). It does so by satisfying two invariants:
- No red nodes can have red children
- Every path from the root to an empty node contains the same number of black nodes
The image below - taken from the book itself - concisely summarises the 4 cases involved in these invariants, starting at the top and then moving counter-clockwise.
With this in mind I set out to write the balance function in Clojure.
In the functional version of a Red-black tree all pointer manipulation required in its destructive counterpart simply disappears, rendering the algorithm a lot simpler.
Nevertheless, we’re left with some complexity in the balance function around testing a node’s colour as well as its children’s and grandchildren’s.
My first attempt at writing it started like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
I was extremely unhappy with this. A lot of boilerplate around bindings and tests. And this is only the first case.
But if you read Okasaki’s implementation of the algorithm in ML - or Haskell - you’ll quickly realise how concise and elegant it is. The main reason for that is pattern matching, something we don’t have built-in in Clojure.
However, Clojure is a Lisp and has a powerful macros system at its disposal. That has given us core.match, a pattern matching library for Clojure.
Using core.match, I rewrote my balance function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
If you look closely at the patterns being matched, you’ll see they cater for all 4 cases and allow for both matching and binding in the same expressions. With only a little over double the size of a single case using the previous function, we now have a fully functioning balance implementation. This is better, but I wanted more.
What I really wanted to be able to write is this:
1 2 3 4 5 6 7 8 9
Unfortunately core.match doesn’t support records/protocols yet. However, while reading core.match’s source code, I found a test that implemented the balance algorithm using a different way of representing a tree.
You see, I went with the map approach because I like how you can ask for the specific keys that represent the parts of the tree, such as :color, :left, :value and :right. But if you’re happy with using positional elements in a vector to represent your tree, our balance function in Clojure becomes this:
1 2 3 4 5 6 7 8 9
Which is amazingly similar to the protocols/records version above.
I guess one of the lessons here is that by carefully choosing the data structure to represent your problem, your implementation can become substantially simpler.
Until next time.