Validation and Internationalization in Clojure With Bouncer & Tower

I released bouncer in April last year and since then it has seen small but steady growth in usage.

So much so that I’ve received community feedback in the form of emails and pull requests, which is simply great!

The latest and most substantial pull request, submitted by Vadim Platonov, added the ability to customise validation messages in any way you like, as you can see in the section Internationalization and customised error messages of the README.

However, the documentation only gives a simple example. Since this feature opens up several different usage patterns, I thought I’d expand on it in this post by integrating bouncer with tower, the excellent internationalization library, to show how validation and I18n can work together.

If you’ve never used bouncer before, I’d recommend you have a quick look at the Basic Validations section of the README before continuing.

Setup

The first thing you’ll need to do is create a new leiningen project and add both bouncer and tower as dependencies in your project.clj:

[bouncer "0.3.1-beta1"]
[com.taoensso/tower "2.0.1"]

Next, in your core namespace - or wherever, really - require both libs and set up a dictionary to be used in the examples:

(ns bouncer+tower.core
  (:require [taoensso.tower :as tower
             :refer (with-locale with-tscope t *locale*)]
            [bouncer.core :refer [validate]]
            [bouncer.validators :as v]))

(def my-tconfig
  {:dev-mode? true
   :fallback-locale :en
   :dictionary
   {:en
    {:person  {:name {:required "A person must have a name"}
               :age  {:number   "A person's age must be a number. You provided '%s'"}
               :address {:postcode {:required "Missing postcode in address"}}}
     :missing  "<Missing translation: [%1$s %2$s %3$s]>"}
    :pt-BR
    {:person  {:name {:required "Atributo Nome para Pessoa é obrigatório."}
               :age  {:number   "Atributo Idade para Pessoa deve ser um número. Valor recebido foi '%s'"}
               :address {:postcode {:required "Endereço deve ter um código postal"}}}
     :missing  "<Tradução ausente: [%1$s %2$s %3$s]>"}}})

Also, we need something to validate, so go ahead and create a map representing a person:

(def person {:age "NaN"})

Customising bouncer

Since 0.3.1-beta1, bouncer’s validate function receives an optional first argument called message-fn. As the name implies, it is a function from error metadata to, most likely, a string. This is our opportunity to use tower’s features to translate our error messages.

In order to accomplish that, we’ll be using this message function:

(defn message-fn
  "Receives a locale, tscope and a map containing error metadata.

  Uses this information to return an I18n'ed string"
  [locale tscope {:keys [path value]
                  {validator :validator} :metadata}]
  (let [tr-key (->> (flatten [path validator])
                    (map name)
                    (clojure.string/join "/")
                    keyword)]
    (with-tscope tscope
      (t locale my-tconfig tr-key value))))

In order to understand the function above, let’s validate the person map using identity as the message function so we can inspect the error metadata map that ends up as the third argument to message-fn:

(validate identity
          person
          :name v/required)

;; [{:name
;;   ({:path [:name],
;;     :value nil,
;;     :args nil,
;;     :metadata
;;     {:optional false,
;;      :default-message-format "%s must be present",
;;      :validator :bouncer.validators/required},
;;     :message nil})}
;;  {:age "NaN",
;;   :bouncer.core/errors
;;   {:name
;;    ({:path [:name],
;;      :value nil,
;;      :args nil,
;;      :metadata
;;      {:optional false,
;;       :default-message-format "%s must be present",
;;       :validator :bouncer.validators/required},
;;      :message nil})}}]

As we’re only validating the name in this case, that’s all we get in the return value of validate. However you can see how we have all sorts of useful information now - hopefully this makes the message-fn code above easier to understand.
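
To make the key-building step concrete, here’s a small REPL sketch walking through the same steps message-fn performs, using the nested [:address :postcode] path failing the required validator (the path and validator values mirror the metadata shown above):

```clojure
;; How message-fn derives the translation key from error metadata.
(let [path      [:address :postcode]
      validator :bouncer.validators/required]
  (->> (flatten [path validator])
       (map name)                 ;; => ("address" "postcode" "required")
       (clojure.string/join "/")  ;; => "address/postcode/required"
       keyword))
;; => :address/postcode/required
```

Scoped under :person via with-tscope, that keyword resolves to the nested dictionary entry for the postcode’s required message.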

Take it for a spin

We’re now ready for a couple of examples, using two different locales. Let’s get on with it:

(validate (partial message-fn :en :person)
          person
          :name v/required
          :age  v/number
          [:address :postcode] v/required)


;; [{:address {:postcode ("Missing postcode in address")},
;;   :age ("A person's age must be a number. You provided 'NaN'"),
;;   :name ("A person must have a name")}
;;  {:age "NaN",
;;   :bouncer.core/errors
;;   {:address {:postcode ("Missing postcode in address")},
;;    :age ("A person's age must be a number. You provided 'NaN'"),
;;    :name ("A person must have a name")}}]

Now let’s get some messages in Portuguese, shall we?

(validate (partial message-fn :pt-BR :person)
          person
          :name v/required
          :age  v/number
          [:address :postcode] v/required)


;; [{:address {:postcode ("Endereço deve ter um código postal")},
;;   :age
;;   ("Atributo Idade para Pessoa deve ser um número. Valor recebido foi 'NaN'"),
;;   :name ("Atributo Nome para Pessoa é obrigatório.")}
;;  {:age "NaN",
;;   :bouncer.core/errors
;;   {:address {:postcode ("Endereço deve ter um código postal")},
;;    :age
;;    ("Atributo Idade para Pessoa deve ser um número. Valor recebido foi 'NaN'"),
;;    :name ("Atributo Nome para Pessoa é obrigatório.")}}]

In case you’re too lazy to do all this from scratch, I created a GitHub repo containing this example, ready for you to play with. Go get it.

Conclusion

Hopefully this gives you a little taste of what you can do with the latest version of bouncer. Remember this is but one of many ways you could integrate the library so get creative :)

Once again, thanks to Vadim Platonov for submitting this pull request.

So Long 2013: Year Highlights

I finally got some free time to sit down and write this post. I was travelling with my girlfriend around the state of Victoria, Australia - mostly along the Great Ocean Road. I had an amazing time and it’s a trip I highly recommend.

As for this year’s highlights, let’s get started:

Clojure

clj-syd - the Sydney Clojure User Group - continues going strong and this is in great part thanks to the amazing community behind it. You can have a look at our wiki page for 2013 to see what we’ve been up to.

A major highlight is the fact that we brought Ambrose Bonnaire-Sergeant - creator of core.typed - to Sydney to give a talk on his creation. You can read a summary of the talk on our mailing list.

On a more personal note I’ve been basically living in Melbourne for the past ~6 months working on a full-time Clojure project. I have learned a lot and can’t wait to share a whole lot with the group once I’m back in Sydney.

Speaking

I was fortunate enough to speak at a couple of really cool events this year as well as a few user group talks. Here they are:

I also attended Clojure/West in Portland back in March. Great conference, great content and amazing people. I put together some slides with my notes on SlideShare.

Books

2013 was a great year for Clojure as we saw quite a few new books released during the year. I’ll briefly mention a couple:

Clojure Cookbook

As the authors describe it: “Clojure Cookbook marks Clojure’s entry in O’Reilly’s prestigious Cookbook Series. The book will contain hundreds of real-world problems and solutions, ranging from basic utilities to rich web services to heavy data processing.”

I have contributed three recipes to this book so I’m quite happy to see them make it in. You can buy the early digital edition from O’Reilly’s website.

The book is the main effort of authors Luke VanderHart and Ryan Neufeld, with contributions from the Clojure community around the globe. They’ve put together quite the selection.

Clojure Recipes

Shameless plug of one of #cljsyd’s regulars, Julian Gamble, who’s been working on this for a while. The book should be released at the end of January 2014 and I’m looking forward to it.

Clojure High Performance Programming

I’ve only just started reading it but I already like what I see enough to mention it here.

Content

Here are the top 3 posts from this blog in 2013:

Non-tech news

For those who know me, my passion for sports - martial arts, rock climbing and strength training - is no secret, and in 2013 I became a certified personal trainer in Australia. I did this mostly for personal development and to improve my own training, and the certification is already paying off. It was a great course.

Happy new year! Here’s to an even better one ahead of us!

CUFP/ICFP 2013

I’m sitting in the lobby of the Hilton in Boston, and since my flight back to Australia isn’t for a few hours, I thought I’d write my experience report while everything is still fresh in my mind.

CUFP - Commercial Users of Functional Programming - is a workshop-like conference targeting the practically-minded functional programming community.

As it’s stated on their website, “The CUFP workshop is a place where people can see how others are using functional programming to solve real world problems […]”.

One of the things that makes the event special is that it runs together with ICFP - International Conference on Functional Programming - an event at the opposite end of the spectrum, with language designers, professors and compiler implementors getting together to think about the future of their languages and fields. The diversity of the event is astonishing.

CUFP itself runs for three days: a traditional conference day with several talks, followed by two tutorial days.

I was there for all three days and on the last one I delivered my own tutorial about writing macros in Clojure - more on that later.

As far as the talks of day 1 go, someone already did a great job of summarising them. I highly recommend you go read it.

Day 2

In total there were 9 tutorials on offer - two of which made up a two-day Haskell tutorial. That’s the one I decided to attend.

In the instructions, we were told we could use an online Haskell IDE to follow the course should we choose not to install the Haskell platform on our laptops.

I decided to give it a go. The tool is called FP Haskell Center and has been developed by the awesome guys at FP Complete.

It’s important to note this is an online IDE - but the editor isn’t the only thing on offer. FP Haskell Center provides a complete deployment solution as well - though I didn’t have the chance to play with it.

Back to the tutorial: I used FP Haskell Center for day one and it worked great as far as online IDEs go. Compilation and type inspection do suffer from the round trip over the web, and by the end of the day I was feeling a little frustrated with all the waiting. The tool is great, and if they offered an offline version, I’m sure the experience would improve tenfold.

Day one was taught by Andres Löh from Well-Typed, a Haskell consultancy.

It was full of exercises in the various basics of Haskell such as expressions, functions, IO, pattern matching and even Monads. I had a lot of fun working through them and it reinforced my opinion about Haskell being as practical a language as any other - but with several advantages.

Day 3

Given my frustration with FP Haskell Center being a bit slow I decided to install the Haskell platform on my new laptop and configure emacs with haskell-mode. I was much happier with this setup. haskell-mode has a lot of nifty features that were extremely useful during the tutorial.

The second day of the Haskell tutorial was led by Simon Marlow, a software engineer at Facebook UK and author of Parallel and Concurrent Programming in Haskell - also available freely online.

Not surprisingly, his half of the tutorial was about concurrency. We started with several IO exercises involving monadic functions we hadn’t covered the previous day. We then moved on to study the basic Haskell concurrency constructs and primitives. All very interesting stuff.

If you have the chance to attend a tutorial by these guys, do yourself a favour and go for it.

Day 3 - 2pm onwards

Unfortunately I missed the second half of the second day as I, too, had to deliver my own tutorial/workshop.

Titled “Bending Clojure to your will: Macros and Domain Specific Languages”, the tutorial had participants work their way through several different exercises aimed at teaching the various nuances of writing macros.

The tutorial has failing tests for all exercises, so it’s dead easy to know when you’ve arrived at a solution. All participants seemed to have a great time learning this stuff, and I even saw a couple of positive tweets about it. I’m really happy with how everything went.

The best tweet though was the one saying that someone from the Haskell tutorial decided to switch to mine half-way through. So I see that as a win :) Here’s said tweet.

I’ve pushed the exercises to GitHub should you want to try them yourself. The slides are also available, though they likely won’t make sense out of context.

Enjoy! :)

Purely Functional Data Structures in Clojure: Red-Black Trees

This post is part of a series about Chris Okasaki’s Purely Functional Data Structures. You can see all posts in the series by visiting the functional-data-structures category in this blog.


Recently I had some free time to come back to Purely Functional Data Structures and implement a new data structure: Red-black trees.

Red-black trees

Red-black trees are a type of self-balancing binary search tree. Back when I first learned the balancing algorithm used in operations such as insert and delete, I remember it being a particularly tricky one.

Traditionally, red-black trees are implemented destructively - meaning insertions and deletions happen in place. So in imperative languages like C or Java there is a lot of node pointer manipulation floating around, making this algorithm error prone.

This post, as its title implies, deals with the functional implementation which, besides being simpler, is also persistent.

Invariants

A big disadvantage of plain binary search trees is that they perform poorly on sorted data - O(N) in the worst case - degenerating into little more than a linked list.

This is where Red-black trees come in: when inserting or removing nodes, the tree rebalances itself, guaranteeing search times of O(log N). It does so by satisfying two invariants:

  • No red nodes can have red children
  • Every path from the root to an empty node contains the same number of black nodes
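
As a concrete illustration, here is a small valid tree in the map representation used later in this post (a sketch of my own, with nil standing in for empty leaves):

```clojure
;; A black root with two red children satisfies both invariants:
;; 1. neither red node has a red child;
;; 2. every path from the root to a nil leaf crosses exactly
;;    one black node (the root itself).
(def example-tree
  {:color :black
   :left  {:color :red :left nil :value 1 :right nil}
   :value 2
   :right {:color :red :left nil :value 3 :right nil}})
```

Inserting a red node directly under either red child would violate the first invariant - which is exactly what the balance function repairs.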

The image below - taken from the book itself - concisely summarises the 4 cases involved in these invariants, starting at the top and then moving counter-clockwise.

With this in mind I set out to write the balance function in Clojure.

The code

In the functional version of a Red-black tree all pointer manipulation required in its destructive counterpart simply disappears, rendering the algorithm a lot simpler.

Nevertheless, we’re left with some complexity in the balance function around testing a node’s colour as well as its children’s and grandchildren’s.

My first attempt at writing it started like this:

(defn mk-tree [color left value right]
  {:color color :left left :value value :right right})

(defn balance [tree]
  ;; case 1
  (let [{z :value d :right} tree
        {x-color :color x :value a :left} (-> tree :left)
        {y-color :color y :value b :left c :right} (-> tree :left :right)]
    (if (and (= x-color :red) (= y-color :red))
      (mk-tree :red
               (mk-tree :black a x b)
               y
               (mk-tree :black c z d))
      tree)
    ...))

I was extremely unhappy with this. A lot of boilerplate around bindings and tests. And this is only the first case.

But if you read Okasaki’s implementation of the algorithm in ML - or Haskell - you’ll quickly realise how concise and elegant it is. The main reason for that is pattern matching, something we don’t have built-in in Clojure.

However, Clojure is a Lisp and has a powerful macro system at its disposal. That has given us core.match, a pattern matching library for Clojure.

core.match

Using core.match, I rewrote my balance function:

(defn balance [tree]
  (match [tree]
         [(:or {:left {:color :red
                       :left {:color :red :left a :value x :right b}
                       :value y :right c}
                :value z :right d}

               {:left {:color :red
                       :left  a :value x
                       :right {:color :red :value y :left b :right c}}
                :value z :right d}

               {:left a :value x
                :right {:color :red
                        :left {:color :red
                               :left b :value y :right c}
                        :value z :right d}}

               {:left a :value x
                :right {:color :red
                        :left b :value y
                        :right {:color :red
                                :left c :value z :right d}}})]
         (mk-tree :red
                     (mk-tree :black a x b)
                     y
                     (mk-tree :black c z d))

         :else tree))

If you look closely at the patterns being matched, you’ll see they cater for all 4 cases and allow both matching and binding in the same expressions. At only a little over twice the size of a single case in the previous version, we now have a fully functioning balance implementation. This is better, but I wanted more.

What I really wanted to be able to write is this:

(defn balance [tree]
    (match [tree]
           [(:or (Black. (Red. (Red. a x b) y c) z d)
                 (Black. (Red. a x (Red. b y c)) z d)
                 (Black. a x (Red. (Red. b y c) z d))
                 (Black. a x (Red. b y (Red. c z d))))] (Red. (Black. a x b)
                                                              y
                                                              (Black. c z d))
                 :else tree))

Unfortunately core.match doesn’t support records/protocols yet. However, while reading core.match’s source code, I found a test that implemented the balance algorithm using a different way of representing a tree.

You see, I went with the map approach because I like how you can ask for the specific keys that represent the parts of the tree, such as :color, :left, :value and :right. But if you’re happy with using positional elements in a vector to represent your tree, our balance function in Clojure becomes this:

(defn balance [tree]
  (match [tree]
         [(:or [:black [:red [:red a x b] y c] z d]
               [:black [:red a x [:red b y c]] z d]
               [:black a x [:red [:red b y c] z d]]
               [:black a x [:red b y [:red c z d]]])] [:red [:black a x b]
                                                            y
                                                            [:black c z d]]
               :else tree))

This is amazingly similar to the protocols/records version above.

I guess one of the lessons here is that by carefully choosing the data structure to represent your problem, your implementation can become substantially simpler.

As usual, all code is on GitHub, with the Red-black tree specific section in this direct link. The project also includes tests.

Until next time.

Clojure, core.async and the Lisp Advantage

Long gone are the days when systems needed to do only one thing at a time.

Concurrency is the word, but it often leads to complex code dealing with locks, mutexes and so on.

There are several different abstractions that allow us to both model and reason about asynchronous code in a saner fashion: futures, promises and events/callbacks are but a few of them.

I won’t get into the merits - or lack thereof - of these alternatives in this post but rather focus on a different alternative: Communicating Sequential Processes - CSP.

CSP and Go

CSP isn’t new. It was first described in 1978 by Tony Hoare and languages such as Occam and Erlang stem from it.

It has, however, gained momentum through native support in the Go programming language.

I haven’t read Hoare’s paper, so I’ll draw on what I know about Go’s implementation of CSP.

Go introduced the concept of a goroutine. It looks like a function call:

// doing some stuff...
go myFunction("argument") //does stuff in the background...
//continuing about my business...

This creates a lightweight process and immediately returns control to the caller.

It is lightweight because it doesn’t map 1-1 to native OS threads.

The reasoning is that creating too many native threads can bring your machine (or VM) down due to the amount of stack allocated to each one.

goroutines are cheap to create, so you can have hundreds of thousands of them; the runtime multiplexes them onto a thread pool.

The immediate advantage is that it is dead simple to achieve a higher degree of concurrency.

So far it sounds awfully similar to using futures with a pre-configured thread pool and a bit of syntactic sugar. But this is not the end of it.

Communication

goroutines really shine when several lightweight processes need to talk to each other. This is where a new abstraction comes into play: channels.

Channels are first-class citizens - meaning you can pass them as arguments to functions and return them from functions.

Using them is straightforward:

c := make(chan string)
go func() {
  time.Sleep(time.Duration(5000) * time.Millisecond)
  c <- "Leo"
}()
fmt.Printf("Hello: %s\n", <-c) //this will block until the channel has something to give us 

The code above creates a goroutine from an anonymous function that will, in the background, sleep for 5 seconds and then send the string Leo to the channel c. Since control returns immediately, execution reaches the next line, where it blocks trying to read a value from the channel with the <-c expression.

Lisp

“But What does all this have to do with Lisp?” - ah! I’m glad you asked.

It actually has more to do with Clojure - and by extension, Lisp.

Clojure is a modern Lisp for the JVM, built for concurrency. The core team recently released core.async, a new library that adds support for asynchronous programming much in the same way Go does with goroutines.
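
The Clojure snippets that follow assume a namespace that pulls in the core.async operations they use - something along these lines (the namespace name search.core is made up for this post):

```clojure
(ns search.core
  (:require [clojure.core.async
             :refer [chan go >! <! <!! alts!! timeout]]))
```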

To highlight the similarities, I’ll show and translate a couple of the examples from Rob Pike’s presentation, Go Concurrency patterns.

Setting the scene

Say you’re Google, and you need to write code that takes user input - a search string - hits 3 different search services - web, images and video - then aggregates the results and presents them to the user.

Since they are three different services, you wish to do this concurrently.

To simulate these services, Rob presented a function with unpredictable performance based on a random amount of sleep, shown below:

var (
    Web   = fakeSearch("web")
    Image = fakeSearch("image")
    Video = fakeSearch("video")
)

type Search func(query string) Result

func fakeSearch(kind string) Search {
        return func(query string) Result {
              time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
              return Result(fmt.Sprintf("%s result for %q\n", kind, query))
        }
}

This is one way we could write such a function in Clojure:

(defn fake-search [kind]
  (fn [query]
    (Thread/sleep (int (* (java.lang.Math/random) 1000)))
    (str kind " result for " query)))

(def web   (fake-search "Web"))
(def image (fake-search "Image"))
(def video (fake-search "Video"))

Google Search 2.0

The first example is the simple case: we hit the services concurrently, wait for them to respond and then return the results:

This example is from slide #46.

func Google(query string) (results []Result) {
    c := make(chan Result)
    go func() { c <- Web(query) } ()
    go func() { c <- Image(query) } ()
    go func() { c <- Video(query) } ()

    for i := 0; i < 3; i++ {
        result := <-c
        results = append(results, result)
    }
    return
}

And here’s the Clojure version:

(defn google2-0 [query]
  (let [c (chan)]
    (go (>! c (web query)))
    (go (>! c (image query)))
    (go (>! c (video query)))
    (reduce (fn [results _]
              (conj results (<!! (go (<! c)))))
            []
            (range 3))))

(google2-0 "Clojure")
;; prints ["Video result for Clojure" "Web result for Clojure" "Image result for Clojure"]

The >! operator puts a value onto a channel from inside a go form. The function then uses <!! to block until a value taken from c becomes available for us to use.
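
Stripped of the search logic, the interaction boils down to this minimal sketch: a go block parks on >! until its value is taken, while <!! blocks the calling thread until a value arrives.

```clojure
(require '[clojure.core.async :refer [chan go >! <!!]])

(let [c (chan)]
  (go (>! c "Leo"))           ;; the put happens in the background
  (println "Hello:" (<!! c))) ;; blocks the main thread until a value arrives
;; prints Hello: Leo
```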

Google Search 2.1

This is virtually the same example but this time we do not wish to wait on slow servers. So we’ll return whatever results we have after a pre-defined timeout.

This example is from slide #47.

c := make(chan Result)
go func() { c <- Web(query) } ()
go func() { c <- Image(query) } ()
go func() { c <- Video(query) } ()

timeout := time.After(80 * time.Millisecond)
for i := 0; i < 3; i++ {
    select {
    case result := <-c:
        results = append(results, result)
    case <-timeout:
        fmt.Println("timed out")
        return
    }
}
return

You’ll notice the use of select here.

select waits on multiple channels and returns as soon as any of them has something to say.

The trick in this example is the timeout channel: once it fires, we print “timed out” and return, effectively ignoring any slow servers’ responses.

We can express the same intent in Clojure as well:

(defn google2-1 [query]
  (let [c (chan)
        t (timeout 500)]
    (go (>! c (web query)))
    (go (>! c (image query)))
    (go (>! c (video query)))

    (reduce (fn [results _]
              (conj results (first (alts!! [c t]))))
            []
            (range 3))))


(google2-1 "Clojure")
;; prints ["Video result for Clojure" nil nil]

Everything looks much the same, but we’re using alts!! to wait on the channels c and t (the timeout channel). This is analogous to Go’s select in that it returns as soon as any channel yields a value or, in this case, the timeout fires.

Note the nil values. Those came from servers which did not respond in time and were simply ignored.

Effectively what this means is that each time you run this function you’ll likely get different results, depending on how long the fake-search function takes to run.

Amazing, huh?

The big deal

But here’s the big deal about this: although core.async looks like it’s deeply integrated into the language, it is just a library!

It’s not a separate compiler. It’s not a new language. And it’s not a special version of Clojure.

Since Clojure supports macros - like all Lisps - the core team was able to create the syntax required to easily use core.async. And that’s the beauty of it!

The Lisp advantage, once again.

Clojure’s advantage

Now one thing I haven’t mentioned is that Clojure is particularly well suited for this - and in a way even more so than Go: Clojure is opinionated and favours immutability.

That means that when using channels - and in fact any type of concurrent programming - you can safely share your data structures between concurrent units of work. Since they’re immutable, you can’t shoot yourself in the foot.

One last thing: core.async states ClojureScript compatibility as one of its goals, bringing channel-based concurrent programming to the browser. Exciting stuff.

More on core.async

core.async is still in alpha, but you are encouraged to take it for a spin. Documentation is still lacking, so I recommend you look at:

Also, the full Clojure code I used here can be seen in this gist.

Functional Programmers Unite! LambdaJam Down Under

I hinted at LambdaJam Australia back in my 2012 Highlights post, and with the dates fast approaching I thought it appropriate to blog about it once more.

YOW! LambdaJam is a conference targeted at intermediate and advanced functional programmers and is organised in talks, jams and workshops.

  • Talks are the traditional format and are 30 minutes long;
  • Workshops aim to introduce a specific subject to attendees in great detail and are up to two hours long;
  • Jams are hands-on sessions: participants are guided through pre-defined problems around a given subject/technology and encouraged to work through the solutions either by themselves or, ideally, in small groups. Jams are also two hours long.

Sounds pretty amazing, doesn’t it?

If you’re still not convinced, check out the program.

The conference will run for 2 days, May 16-17, in Brisbane. Tickets are on sale.

Clojure

Besides the conference itself, Clojure/core will be giving an Intro to Clojure workshop in Sydney, Melbourne and Brisbane. It’s a great opportunity to learn from the language maintainers.

If you’re already attending the conference, you’re in luck: there is a 20% discount via the link above if you attend both the conference and the workshop.

In case you’d like to attend the workshops only, you can buy individual tickets for Sydney, Melbourne and Brisbane.

Scala

Tony Morris will also be delivering a Functional Programming in Scala workshop in Brisbane and 20% off combo tickets can be bought from the EventBrite page for LambdaJam.

See you there

Got tickets yet? No? What are you waiting for?

Clojure and ‘Why Calculating Is Better Than Scheming’

Last week while attending Clojure/West in Portland I came across a paper called Why calculating is better than scheming. In a nutshell, the paper is a critique of Abelson and Sussman’s classic textbook SICP - Structure and Interpretation of Computer Programs - used by MIT for many years to teach its introductory programming course.

If you haven’t read SICP, you should. It’s an amazing book. It uses Scheme, a dialect of Lisp, as the vehicle to present fundamental programming concepts.

Philip Wadler - the author of the paper - contrasts teaching in Scheme with teaching in KRC and Miranda, pointing out four major features he considers important and lacking in Scheme. They are:

  • Pattern matching
  • A syntax close to traditional mathematical notation
  • A static type discipline and user-defined types
  • Lazy Evaluation

Note: KRC influenced Miranda which in turn influenced Haskell. Their syntax is similar, so where Wadler used Miranda code snippets in the paper, I’ll be using Haskell in this post.

As an aside, although the paper talks specifically of Scheme, the term Lisp is used quite loosely and could mislead the not-so-careful reader about a whole family of languages. Lisps have come a long way, and modern dialects - of which I’ll focus on Clojure - address many of the concerns raised by Wadler.

Let us begin.

Pattern Matching

Here Clojure, and most - all? - Lisps, are out of luck.

The example used in the paper is that of summing all integers in a list. First in Haskell:

sum []   = 0
sum x:xs = x + sum xs

Now in Clojure:

(defn sum [coll]
  (if (empty? coll)
      0
      (+ (first coll) (sum (rest coll)))))

The question here is this: which snippet is easier to read and reason about? The Haskell code!

I must confess that I, too, miss pattern matching sometimes. However we can still improve our Clojure version to read nicer on the eyes by using destructuring:

(defn sum [[first & rest :as coll]]
  (if (empty? coll)
      0
      (+ first (sum rest))))

And that’s pretty much it. Without proper pattern matching, we can’t get much better than that.

In addition to the Haskell snippet being easier to read, it’s also easier to prove correct by structural induction, as demonstrated in Wadler’s paper.

Note: core.match adds support for pattern matching in Clojure. At the time of this writing, it’s considered “alpha quality”.
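For completeness, here’s a sketch of what sum might look like with core.match’s seq patterns - assuming core.match is on your classpath (e.g. `[org.clojure/core.match "0.2.0"]`):

```clojure
(require '[clojure.core.match :refer [match]])

;; sum rewritten with core.match - much closer to the Haskell version above.
(defn sum [coll]
  (match [coll]
    [([] :seq)]       0
    [([x & xs] :seq)] (+ x (sum xs))))
```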

Data structures

The paper continues with exercise 2-27 from SICP, where the reader has to write code to represent a binary mobile, which consists of a left and a right branch, each branch being a rod of a certain length from which hangs either a weight or another binary mobile.

Translating the Scheme example to Clojure, such a structure is represented using lists, like so:

(defn make-mobile [left right]
  (list left right))
  
(defn make-branch [length structure]
  (list length structure))

Wadler then contrasts this with the equivalent Miranda code, translated below to Haskell, taking advantage of algebraic data types:

data Structure = Weight Int | Mobile Branch Branch
data Branch = Branch Int Structure

The first claim is that the Haskell/Miranda data type declaration makes it clearer what the data structure looks like, which is fair.

Also, the compiler can catch errors early on.

However, when writing idiomatic Clojure code, here’s how I’d actually create this structure:

(defn make-mobile [left right]
  {:left left :right right})
  
(defn make-branch [length structure]
  {:length length :structure structure})

Granted, it’s still not as clear and the compiler can’t validate the shape of our data structure.

This is however cleaner than the previous version and drives home the point that Clojure isn’t limited to lists, having literals for other data types such as the hash maps used in this example.

The second part of this claim is that, through custom data types and pattern matching, extracting values from those structures becomes simpler:

totalWeight (Weight w)   = w
totalWeight (Mobile l r) = totalWeightBranch l + totalWeightBranch r

totalWeightBranch (Branch d s) = totalWeight s

Once again Clojure can improve things by taking advantage of its builtin data structures and destructuring:

(defn total-weight [{:keys [left right] :as structure}]
  (if (number? structure)
      structure
      (+ (total-weight-branch left)
         (total-weight-branch right))))
      
(defn total-weight-branch [{structure :structure}]
  (total-weight structure))     

For a language with no pattern matching nor algebraic data types, this snippet is clear, concise and elegant - and a real improvement over the Scheme version discussed in the paper - which was essentially handicapped by the use of lists to simulate ‘structs’.

As far as Clojure goes, this claim ends here: the next point in the paper, about changing from using list to using cons, is rendered moot since we’re using hash maps to represent our mobiles.

Lisp lists are not self-escaping

Creating lists in Clojure goes like this:

(list (list 1 2) nil) ;; ((1 2) nil)

'((1 2) nil) ;; ((1 2) nil)

Both statements above are equivalent, with the second one being clearly more concise.

The claim here is that the fact that you need to either use the list function or quote the form is cumbersome and can be confusing to beginners.
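A quick illustration of the trap: quote suppresses evaluation of everything inside the form, while the list function evaluates its arguments:

```clojure
(list 1 (+ 1 2)) ;; (1 3)
'(1 (+ 1 2))     ;; (1 (+ 1 2)) - the (+ 1 2) form is left unevaluated
```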

Clojure solves this by providing literals to another data structure - vectors:

[[1 2] nil]

Simple and concise - in fact, in idiomatic Clojure code, you’ll rarely see quoted lists where a vector will do.

This is possible because both lists and vectors conform to a higher-level abstraction - the seq - in terms of which most list operations are defined.

This eliminates the two following points mentioned in the paper as it allows a beginner to defer his/her understanding of quoted forms to more advanced lessons/usages.
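For instance, the core sequence functions work uniformly over both:

```clojure
;; first/rest don't care whether they get a list or a vector -
;; both participate in the seq abstraction.
(first '(1 2 3)) ;; 1
(first [1 2 3])  ;; 1
(rest [1 2 3])   ;; (2 3)
```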

Programs that Manipulate Programs - the interpreter example

Here Wadler shows a simple grammar for an interpreter in both Miranda and Scheme.

He claims that since Haskell/Miranda have free data types, representing such grammar becomes simpler:

data Term = Var [Char]
          | Lambda Var Term
          | Apply Term Term
          | Closure Env Var Term
type Env = [(Var, Term)]
type Var = [Char]

This is true in that one can easily scan the snippet above and deduce quickly what Term looks like.

Then, by using pattern matching, eval could be implemented like so:

eval e (Var v) = lookup e v
eval e (Lambda v t) = Closure e v t
eval e (Apply t0 t1) = apply (eval e t0) (eval e t1)

...

I do believe this makes Haskell an excellent choice for writing interpreters and compilers.

However, the flip side is that entering such terms in Haskell is cumbersome. Consider the term below:

(λx.(x x)) (λx.(x x))

This is how to represent this term using the grammar defined above:

Apply (Lambda "x" (Apply (Var "x") (Var "x")))
      (Lambda "x" (Apply (Var "x") (Var "x")))

The strength in Lisp lies elsewhere. Since we have quoted forms, entering a similar term is a lot less verbose and closer to its intended representation:

'((lambda [x] (x x)) (lambda [x] (x x)))

This is, of course, at the expense of making eval a more complicated function:

(defn eval [e t]
  (cond (variable? t)
        (lookup e (variable-name t))
        (lambda? t)
        (make-closure e (lambda-var t) (lambda-body t))
        (apply? t)
        (apply (eval e (apply-operator t))
               (eval e (apply-operand t)))))

...

The paper leaves out an important advantage of Lisps though:

Because we can write code for our made-up language directly in its (almost) abstract syntax tree form, Lisps are the ideal choice for writing internal Domain Specific Languages.

Lazy Evaluation

Lists

Haskell and Miranda are lazy languages and that yields a lot of power. This claim is more specific to the use of lazy lists - or sequences, streams - and starts off with a snippet that calculates the sum of squares of all odd numbers from 1 up to 100:

sum [ i*i | i <- [1..100], odd i ]

What follows in the paper is a not-so-clear snippet of equivalent functionality using Scheme streams.

Clojure features lazy sequences and list comprehensions, making the above Haskell example trivial to write:

(sum (for [i (range 1 101) :when (odd? i)] (* i i)))

If you’re following along at home with the original paper, you’ll see this is more readable and elegant than the equivalent Scheme example.

Another - also idiomatic - way to write the same expression is by using a combination of map/filter:

(sum (map #(* % %) (filter odd? (range 1 101))))

Deciding which one is clearer is left as an exercise to the reader.
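If you’d rather not rely on the sum function defined earlier in this post, reduce from clojure.core gives a third equivalent spelling:

```clojure
;; Square the odd numbers from 1 to 100 and add them up,
;; using only clojure.core functions.
(reduce + (map #(* % %) (filter odd? (range 1 101))))
;; 166650
```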

Special forms and lazy evaluation

In this section, Wadler brings another example from SICP where the reader wishes to implement his/her own if form.

As we know, in order to implement our own version of if, we need to use macros. That is because in Lisps arguments to functions are eagerly evaluated.

One might implement it like this:

(defmacro my-if [pred then else]
  `(cond ~pred ~then
          :else ~else))
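We can check at the REPL that my-if, like the real if, only evaluates the branch it selects - cond, being a macro itself, never touches the other one:

```clojure
;; The else branch would throw if it were evaluated - it isn't.
(my-if true
       :then
       (throw (Exception. "never evaluated")))
;; :then
```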

In lazy languages such as Haskell and Miranda this problem doesn’t occur, allowing such functions to be defined without the need for special and/or quoted forms:

myIf True  t e = t
myIf False t e = e

However this completely dismisses the power of macros which allow you to extend the language in ways no other language allows - as is extensively demonstrated in books such as On Lisp and Let Over Lambda.

As Guy Steele once put it: ”[…] If you give someone Lisp, he has any language he pleases”

Conclusion

Hopefully this post doesn’t come off as trying to invalidate Wadler’s paper - that is not my intention.

While I do think a few of the points discussed are only applicable to the domain in which his paper was written - teaching - they are still valid and worth understanding.

I do however expect to have given you a different perspective on it, showing the strengths of modern Lisps such as Clojure and how it approaches these issues - such as by using its rich set of data structures, literals and techniques such as destructuring.

Purely Functional Data Structures in Clojure: Leftist Heaps

This post is part of a series about Chris Okasaki’s Purely Functional Data Structures. You can see all posts in the series by visiting the functional-data-structures category in this blog.


Last year I started reading a book called Purely Functional Data Structures. It’s a fascinating book and if you’ve ever wondered how Clojure’s persistent data structures work, it’s mandatory reading.

However, all code samples in the book are written in ML - with Haskell versions at the end of the book. This means I got stuck in Chapter 3, where the ML snippets start.

I had no clue about Haskell’s - much less ML’s! - syntax and I was finding it very difficult to follow along. What I did notice is that their syntaxes are not so different from each other.

So I put the book down and read Learn You a Haskell For Great Good! with the hope that learning more about Haskell’s syntax - in particular, learning how to read its type signatures - would help me get going with Purely Functional Data Structures.

Luckily, I was right - and I recommend you do the same if you’re not familiar with either of those languages. Learn You a Haskell For Great Good! is a great book and I got a lot out of it. My series on Monads is a product of reading it.

Enough background though.

The purpose of this post is two-fold: One is to share the github repository I created and that will contain the Clojure versions of the data structures in the book as well as most solutions to the exercises - or at least as many as my time-poor life allows me to implement.

The other is to walk you through some of the code and get a discussion going. Hopefully we will all learn something - as I certainly have when implementing these. Today, we’ll start with Leftist Heaps.

Leftist Heaps

Leftist Heaps - or trees - are a variant of binary heaps that can be used as priority queues. On top of the standard invariants of binary heaps, it obeys the leftist property:

  • Every node has a rank: the length of its right spine, i.e. the distance from that node to its nearest empty (leaf) node
  • A node’s left child has a rank at least as large as that of its right child

In a nutshell, these are the operations we need to be able to perform on a leftist heap:

  • insert a value into an existing heap
  • merge two heaps
  • find the minimum value in a heap
  • delete the minimum value, returning a new heap

Since the book uses ML/Haskell, it starts with a data type definition for Heaps that exposes these and a couple of other auxiliary functions. I decided to take a stab at writing the solution using Clojure’s protocols and records:

(defprotocol Heap
  (is-empty?   [this])
  (insert      [this v])
  (merge       [this other])
  (rank        [this])
  (find-min    [this])
  (delete-min  [this]))

(defrecord LeftistHeap [rank value left right])

When implementing the algorithms, the base cases of the recursive solutions involve dealing with nil values, which at first doesn’t seem like a problem. However, protocol functions dispatch on the type of their first argument - so what happens if I call the function is-empty? on nil?

Luckily, Clojure allows us to extend a protocol to core types:

(extend-protocol Heap
  nil
  (rank [_] 0)
  (merge [_ other] other)
  (is-empty? [_] true)

  LeftistHeap
  (is-empty? [this]
    (nil? this))

  (rank [this]
    (:rank this))

  (merge [{val-this :value left-this :left right-this :right :as this}
          {val-other :value left-other :left right-other :right :as other}]
    (cond
     (is-empty? other) this
     (<= val-this val-other) (ensure-leftist left-this
                                             (merge right-this other)
                                             val-this)
     :else (ensure-leftist left-other
                           (merge this right-other)
                           val-other)))

  (insert [this v]
    (merge (->LeftistHeap 1 v nil nil)
           this))

  (find-min [{v :value}] v)

  (delete-min [{left :left right :right}]
    (merge right left)))

Note how I extended a few of the protocol functions to the nil data type, allowing me to continue with this implementation with no nasty hacks.
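A quick REPL check confirms that nil now behaves like an empty heap:

```clojure
;; nil participates in the Heap protocol thanks to extend-protocol.
(is-empty? nil) ;; true
(rank nil)      ;; 0
(merge nil (->LeftistHeap 1 42 nil nil)) ;; returns the non-nil heap untouched
```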

There’s one last bit missing: a function that will ensure each heap retains the leftist property:

(defn ensure-leftist
  [this other v]
  (let [rank-this (rank this)
        rank-other (rank other)]
    (if (>= rank-this rank-other)
      (->LeftistHeap (inc rank-other) v this other)
      (->LeftistHeap (inc rank-this) v other this))))

The reason this function is isolated is that the Heap protocol defined above is fairly generic and could be used for defining other types of heaps - and I didn’t feel it warranted its own interface.

We can now play with it and create a new leftist heap:

(-> (->LeftistHeap 1 3 nil nil)
    (insert 2)
    (insert 7)
    (insert 4)
    (insert 10)
    (insert 1)
    (insert 20))
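Querying the heap works as you’d expect - the minimum always sits at the root:

```clojure
;; Continuing with the record-based implementation above.
(def h (-> (->LeftistHeap 1 3 nil nil)
           (insert 2)
           (insert 1)))

(find-min h)              ;; 1
(find-min (delete-min h)) ;; 2
```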

While I quite like this approach, I thought I’d also implement this solution using Clojure’s core data types - maps in this case - and no protocols. The code is shown below:

(defn mk-heap [rank value left right]
  {:rank rank :value value :left left :right right})

(defn heap-rank [heap]
  (if (nil? heap)
    0
    (:rank heap)))

(defn ensure-leftist-heap [value heap-a heap-b]
  (let [rank-a (heap-rank heap-a)
        rank-b (heap-rank heap-b)]
    (if (>= rank-a rank-b)
      (mk-heap (inc rank-b) value heap-a heap-b)
      (mk-heap (inc rank-a) value heap-b heap-a))))

(defn merge-heaps [{val-a :value left-a :left right-a :right :as heap-a}
                   {val-b :value left-b :left right-b :right :as heap-b}]
  (cond
   (nil? heap-a) heap-b
   (nil? heap-b) heap-a
   (<= val-a val-b) (ensure-leftist-heap val-a
                                         left-a
                                         (merge-heaps right-a heap-b))
   :else (ensure-leftist-heap val-b
                              left-b
                              (merge-heaps heap-a right-b))))

(defn heap-insert [value heap]
  (merge-heaps (mk-heap 1 value nil nil)
               heap))

(defn heap-find-min [{v :value}] v)

(defn heap-delete-min [{left :left right :right}]
  (merge-heaps right left))

Using it is equally simple:

(->> (mk-heap 1 3 nil nil)
     (heap-insert 2)
     (heap-insert 7)
     (heap-insert 4)
     (heap-insert 10)
     (heap-insert 1)
     (heap-insert 20))
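And, just like the protocol version, the minimum value is always at the root:

```clojure
;; Continuing with the map-based implementation above.
(def h (->> (mk-heap 1 3 nil nil)
            (heap-insert 2)
            (heap-insert 1)))

(heap-find-min h)                   ;; 1
(heap-find-min (heap-delete-min h)) ;; 2
```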

That’s it for now.

As I implement more of the book’s code and exercises I’ll add them to the github repo - it also includes tests for all implementations.

Enjoy :)

Announcing Bouncer, a Validation Library for Clojure Apps

Today I’m releasing bouncer, which was extracted from a project I’ve been working on.

It’s a validation library for Clojure apps and it lets you write code like this:

(def person {:name "Leo"})

(validate person
    :name required
    :age  [required number])

If you’d like to see more examples and a detailed guide check out the github repository. The README should get you started.

This post however isn’t only about announcing bouncer. It’s also about the motivation and implementation details behind it.

There are a couple of Clojure validation libraries already out there so why would I write a new one? Well…

  • Writing Clojure is fun! (who knew? :P)

  • Because I believe this problem can be solved more elegantly with the use of Monads

If you’ve been following me for a while, you’ll know that I spent most of 2012 deepening my knowledge about functional programming.

In that journey, the unavoidable subject of monads came about - and it was interesting and enlightening enough that I wrote a whole series of posts about it.

After learning what they are and then thinking about the validation problem for a while, I couldn’t help but notice that the problem had a lot in common with the State Monad.

In order to explain how the two relate, I’ll have to digress for a moment. It’ll all make sense in the end - or so I hope.

Purity

In pure functional languages, such as Haskell, functions can’t have side effects. These include performing IO, changing global variables or launching missiles.

Because of that, functions in Haskell are pure: if you repeatedly call a function f with the same argument x over time you will always get the same result back.

Pure functions are not exclusive to Haskell, though. We, too, can write pure functions if we wish:

(defn double [x]
    (+ x x))

The function double above is pure. If we call it with 10, we can be sure the result will always be 20.

This leaves us with a question though: If we were to write our programs with pure functions only, how would we perform computations that need to carry state - state that needs to change over time - around?

A good example of such computation is generating random numbers.

Most programming languages provide generators capable of creating random numbers on demand. Using them is usually trivial. Here’s an example in Java:

Random gen = new Random();

gen.nextDouble(); // 0.0037635726242281065
gen.nextDouble(); // 0.15821091918430885

Impurity alert!

The function nextDouble above is obviously not pure. Multiple invocations of it with the same argument - in this case, none - return different results.

nextDouble is keeping some sort of global state between function calls.

This is where the State Monad comes in. It allows such functions to remain pure.

The State Monad

The State Monad provides a way to abstract state from the function that needs to operate on it.

Sounds confusing? Hopefully an example will clear things up.

Let’s have a look at the Clojure function rand:

(rand)
;; 0.04388682005715605
(rand)
;; 0.43057496371080517

rand suffers from the same problem as the nextDouble we saw above: it keeps its own state, shared across calls, making it an impure function.

Now let’s write a pure version of rand. We’ll call it pure-rand:

(def gen (java.util.Random.))

(defn pure-rand [g]
    [(.nextDouble g) g])

(pure-rand gen)
;; [0.5783608063218478 #<Random java.util.Random@7f30ab6>]

(pure-rand gen)
;; [0.9251968987499839 #<Random java.util.Random@7f30ab6>]

This is interesting. pure-rand now takes a generator as an argument and returns a two-element vector containing the random number itself - the result we’re actually interested in - and the generator that was passed in.

Recall, however, that the generator returned, albeit the same object, is now in a new state.

By re-writing the function like this, we’ve regained purity, as demonstrated below:

(pure-rand (java.util.Random. 100))
;; [0.7220096548596434 #<Random java.util.Random@bb14fe1>]

(pure-rand (java.util.Random. 100))
;; [0.7220096548596434 #<Random java.util.Random@bb14fe1>]

As you can see, as long as we provide the same generator - the same argument - we get the same result back.

If you’re wondering why I’m returning a two-element vector from our little function, the answer lies in the State Monad implementation found in the algo.monads library.

From its docstring:

State Monad: Monad describing stateful computations. The monadic values have the structure:

(fn [old-state] [result new-state]).

It expects a function that receives the old state and returns the result together with the new state - such functions are the monadic values.

By designing the function to follow this contract, we can leverage the domonad macro - think of it as syntactic sugar for working with monads:

(require '[clojure.algo.monads :as m])

((m/domonad m/state-m
    [a pure-rand
     b pure-rand
     c pure-rand]
     [a b c])
     (java.util.Random. 100))

;; [[0.7220096548596434 0.19497605734770518 0.6671595726539502] 
;; #<Random java.util.Random@358ddfd6>]

In the example above, we’re using our pure-rand function in the context of the State Monad to generate 3 random numbers - based on some initial state - and returning them as a vector.

As we’ve seen before the result is itself in a vector alongside the new state.

This is where the State Monad and the validation problem meet:

In bouncer, each validation function is designed to be compatible with the State Monad, just like pure-rand above:

It receives an initial state - at first, the map to be validated - and returns a vector with the map of errors and the new state: the original map augmented with any errors from previous validators.

The end result, should one or more validations fail, is a map with all errors that might have happened, plus our new state.
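To make that contract concrete, here’s a hypothetical validator written in that shape. The name required-name and the error format are purely illustrative - they are not bouncer’s actual API:

```clojure
;; A hypothetical validator shaped as a state-monadic value: it takes the
;; current state (the map being validated) and returns [errors new-state].
(defn required-name [state]
  (if (nil? (:name state))
    (let [errors {:name ["name is required"]}]
      [errors (assoc state :errors errors)])
    [nil state]))

(required-name {:name "Leo"}) ;; [nil {:name "Leo"}]
(required-name {:age 29})     ;; [{:name ["name is required"]}
                              ;;  {:age 29 :errors {:name ["name is required"]}}]
```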

Now if you head to the github repository and read the examples by keeping the State Monad and the above explanation in mind, the similarities should be obvious.

Wrapping up

As I mentioned in the beginning of the article, there are other validation libraries for Clojure and at the time of this writing they have more features than bouncer - by all means have a look at them.

However I will keep maintaining bouncer for a couple of reasons:

  • That’s what I’m using in my current side project
  • It takes a fundamentally different implementation approach that is in itself worthy of exploration
  • If nothing else, this is yet another example of where Monads can be useful.

Acknowledgments

Thanks to Steve and Julian for reviewing early drafts of this post as well as Nick for being such a PITA :) - our discussions led to a considerably nicer design.

As usual, let me know your thoughts.

So Long 2012: Year Highlights

Keeping the tradition of end-of-year blog posts, here are my year highlights for 2012:

Clojure

I started clj-syd - the Sydney Clojure User Group - back in January and it has seen a major uptake. We consistently get an average of 25 people each month and there’s no shortage of talks and topics to discuss. It’s a great crowd and I’m proud to have started it and of being part of it.

Speaking

I spent most of the year studying functional programming and I tried to make my talks reflect that. Even the javascript talk I gave in April had a FP focus. Anyway, you can find some of the slides online:

Books

As usual I read a number of books this year and these are my favourites:

  • The Little Schemer

As they say, an oldie but a goodie. This is mandatory reading for anyone who considers themselves a serious programmer. The insights and techniques found here are invaluable. Do yourself a favor and buy it now.

For a Clojure take on it, check Julian Gamble’s series on The Little Schemer in Clojure

  • The Joy of Clojure

This is pretty much how I learned Clojure. I actually started reading it in November 2011 but felt I needed to put it down from time to time to go and code on my own to solidify what I learned. That’s why I only finished it in 2012.

This book packs a great deal of info and is a great way to get into both functional programming and Clojure.

  • Learn you a Haskell for Great Good

This is the best intro to Haskell ever. Hands down. ‘nuff said.

  • Clojure Programming

I like reading different books about the same subject to get to see it from various perspectives. Clojure Programming takes a different approach to The Joy of Clojure in that, as it progresses through topics and code examples, it contrasts them with languages likely to be familiar to the reader, such as Java, Ruby and Python. Another great alternative if you’ve been thinking of getting into Clojure but didn’t know where to start.

Content

Here’s just a few of the posts visitors found most interesting this year:

LambdaJam

LambdaJam is a conference for functional programmers and although it’s set to happen in May 2013 the wheels have already started turning - thus I had to list it as a highlight.

I’m fortunate enough to be involved with the organization alongside very bright people from the Australian functional programming community as well as Dave Thomas from the YOW! Conference who’s leading the effort. We’re all really excited about it.

Stay tuned, lots more info to come!

Goals

No resolutions. Goals. And I wanna achieve them. They’re simple, and I hope this post will keep me in check.

Better utilize my free time

This is a bit abstract but will help me achieve the goals below. The output of this goal is a calendar with time slots assigned to each one of the next goals - I’ve just done this now actually so this is a good start. No more procrastinating.

Books

I’ve got a big pile of books I wanted to have read in 2012, so I’ll be updating my Goodreads profile with them and make sure I just do it - with the help of my new calendar from the goal above.

Side projects

In 2012 I worked on two side projects which aren’t ready yet, so I can’t comment a whole lot on them, apart from saying that one is being developed in Clojure and the other in Ruby - in 2013, these two will see the light of day.

Here’s to a new year

I hope you all had a great 2012 and are as excited as I am for the new year.

Happy holidays, safe driving and awesome parties! :)