Announcing the Sydney Clojure User Group

Update: We now have a Meetup.com page. Head over there to learn about our next meetups as well as to RSVP to them. We’ll discontinue usage of the wiki for registering attendees, in favor of the new site. Everything else on this post however still holds.

If you’ve been following my blog, you’ll have noticed I started running internal Clojure meetups/hack nights at ThoughtWorks here in Sydney a while ago. While being closed, we’ve already had one international speaker - Ola Bini - come and share his experience as an attendee at Clojure/conj. I promise I’ll blog about it soon.

We’re a small but active group, eager to learn and as a natural result we couldn’t stay closed for any longer! :)

That’s why last Tuesday, the first time we had external guests, we decided to make the group public so anyone could join.

It was great having people from outside ThoughtWorks sharing their real world experience with Clojure. One of them, Harry Binnendijk from Whoto, even gave a presentation about how and why they’re using Clojure [1].

There was another talk, by yours truly, on Continuation Passing Style and Macros in Clojure. It was a great night.

Interested? Read on.

Our meetings will be held on the 3rd Tuesday of every month - at least to start with.

To help organize things we have a Google group where we’ll announce the next meetings, do the traditional “call for papers”, share ideas on how to improve our group, job posts and anything related to Clojure/FP.

We also have a wiki where you can see when and where the meetings are happening, sign up to talk/attend etc…

So if you’ve been thinking about learning Clojure but haven’t had the chance yet, come join us. Joining a group like this in its beginning is a perfect learning opportunity.

And if you’re a Clojure hacker already, we’d love to hear from you just as much!

We welcome all levels and encourage everyone to submit talks they’re passionate about.

I hope to see you in the next meet up. Feel free to shoot the group an email or to me directly at leonardoborges(dot)rj(at)gmail(dot)com

And don’t forget to add us to your calendar so you don’t miss out - next meetup will be on Feb 21st at ThoughtWorks Sydney!

0

[1] Link to the presentation will be available soon

So Long 2011: Year Highlights

It’s that time of the year to look back at what you’ve done and either be happy about it or… well not. :)

To me anyway 2011 was a great year but I won’t bore you with the details so I’ll just jump straight into a couple of highlights!

The biggest one of course was the project we developed to help the Queensland flood victims back in January - which brings me to the next topic:

Speaking

Said project gave me the chance to speak at a few events, sharing this and other great stories:

(1) - XConf is an internal technical conference here at ThoughtWorks

Content

Here in the blog, these are a few of the most popular articles for 2011:

Languages

I decided to learn Clojure and started an internal user group at ThoughtWorks Sydney - you can read a brief report on our first meetup here.

The group’s been a bit off lately since I was on leave in Brazil and then the whole holiday season happened but it’s definitely one of the things I want to make happen more often in the new year.

Clojure is a great language and I expect it to keep gaining a lot of traction in 2012.

And I guess that’s it…

A lot more happened in the last 12 months but these are definitely my favourite parts.

Happy new year everyone! I’m sure 2012 has a lot of new challenges and surprises for us :D

Help Keep RottingNames on the Appstore

A year ago I decided to try my hand on iPhone software development and created a very simple application to experience the whole lifecycle of putting a complete project on the Apple AppStore - I wrote about it a while back.

The result was RottingNames, a heavy metal band name generator.

Since then, I haven’t developed anything else for the iPhone - mostly because my interest lies elsewhere now - but I did notice a small yet steady amount of downloads for this cool little app.

You can download RottingNames for free from the App Store and I like keeping it that way.

However, to be able to put an application on the AppStore, Apple charges an annual fee of A$99 - even if the application itself is free.

I had decided not to renew my iOS Developer Program membership but it seems a pity to let RottingNames vanish like this. Es pecially knowing there are a few die-hard fans out there having a good laugh with it.

Here’s the deal: I’m asking for A$99 in order to renew the membership.

In return, RottingNames will stay on the AppStore for another year, for free.

So hit the button below and stay metal! \m/

Click here to lend your support to: Help keep RottingNames on the AppStore and make a donation at www.pledgie.com !

RubyConf Brazil 2011

I was in São Paulo last week - the 3rd and 4th of November - for RubyConf Brazil. For those who don’t know, RubyConf Brazil is the evolution of Rails Summit Latin America, where I had the privilege to speak in 2009.

This year though, all bets were off. Fabio Akita and Locaweb put together a great event, with over 700 attendees and about 30 speakers split in two streams and two awesome days.

Once more I had the chance to speak at the event and different from previous years, all talks were recorded and streamed live! You can check mine - in portuguese - here as well as my slides - in english - on slideshare.

All other talks can be watched from the eventials’ RubyConf Brazil page. Locaweb has also published the official Flickr album of the event.

It was great to meet up with great friends again as well as to meet amazing new people such as the guys from ThoughtWorks Brazil. I promise I’ll make the detour next time! :)

I hope to see you all again soon!

Report: Clojure Meetup #1

Last Tuesday we held ThoughtWorks’ Australia first Clojure meetup here in Sydney. It was a lot of fun so I thought I’d share a few words about it.

The format was rather simple. First, we had a brief introduction to the language syntax by breaking down a couple of snippets and understanding how each bit worked. For instance, this is one of those snippets:

1
2
3
4
5
6
;;word count
(reduce #(if (%1 %2)
              (assoc %1 %2 (inc (%1 %2)))
              (assoc %1 %2 1))
{}
(clojure.string/split "Clojure 101 - this is is gonna be be great great great" #"\s"))

The cool thing here is that a simple example like this can show quite a few things about Clojure’s syntax:

After this brief discussion about Clojure’s API, we split up in pairs to solve a simple problem from Project Euler:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

It’s the easiest on the site so it allowed us to focus entirely on the language. My first thought was to implement it in an imperative way, but that wouldn’t really teach me anything new so Kurman - the colleague I was pairing with - and I came up with this solution, which I really like:

1
2
3
4
(reduce #(if (or (= (rem %2 3) 0) (= (rem %2 5) 0)) (+ %1 %2)
           %1)
      0
      (take 1000 (iterate inc 0)))

A simple, concise solution that demanded a functional approach. It allowed us to explore Clojure’s APIs and concepts such as lazyness through the use of an infinite sequence. Fun :)

I created a github repository to host this and upcoming solutions from our group. Feel free to browse around.

Can’t wait for our next meetup, when we’ll hack on Overtone!

Hiring Good People Is Really Simple

It really is. There’s no magic involved.

Again and again I come across articles from people complaining about the current interview practices. They complain it doesn’t give you enough depth into the candidate’s ability, personality etc. That it measures the wrong qualities. Or that it just doesn’t work and bad people keep coming through the pipeline.

Well, instead of complaining how about you change it? I’m talking about hiring software engineers here so the first rule you should keep in mind is that you need to get them to write code.

Let me spell that out for you: get them to write code.

It’s as simple as that. If your hiring process doesn’t include this step, just give up and go do something else.

What kind of company are you hiring for?

This is important because it will let you know how to structure your interview process and where to place the “write some code” step. For instance, some companies such as Google, Facebook and ThoughtWorks can’t afford bringing in every candidate for on-site interviews. Not with that influx of applications anyway. Such companies put some sort of coding challenge as their first barrier to entry, before you’re ever invited into their offices. So keep that in mind.

What should I get them to code?

I’m glad you asked! But I’m afraid the answer is it depends.

Different companies - which in many cases means different projects, business models, cultures and what not - hire inherently different people. You need different skills.

Here is where I think ThoughtWorks gets things right most of the time. The coding challenge the candidate is given is, to a smaller extent, the exact kind of code he’s most likely to be writing once he joins. By no means does it mean he’s expected to write the same projects over and over. Far from it. It just gives us a fair go on their abilities in a close-to-real setting.

Google, on the other hand, will give a programming challenge over the phone. Pardon the pun but ‘google’ it and you’ll see it often involves puzzles which leads me to assume they solve puzzles everyday - don’t get me wrong, they helped solve many computing challenges of our time that I’m thankful for but they have a multitude of projects that need different skill sets and as such I’m not sure it’s the best process but, hey, they seem to have the best engineers in the world so it sure works wonders for them.

The bottom line is that this coding challenge should be geared towards finding what your company values, and what it considers to be good code. If puzzles work for you, do that. If you value well tested code, state so in the challenge and evaluate if you’re presented with a well written test suite. If you want someone who can learn anything fast, ask them to develop a simple app in a language they don’t know. You catch my drift.

What then?

The coding challenge is all well and good and it’s an extremely useful tool. However what happens next seems to vary a lot from company to company so I’ll share here, from personal experience, what I think really works: have candidates code whatever project you’re hiring them for.

Back when I moved from Brazil to Spain in 2008, this is exactly how I got hired.
I was interviewed over the phone by one of the company’s owner friends. He’s a genius. He had worked at this same company before but moved on to Google later on. That was my first barrier. Once past that, the company simply invited me, with all expenses paid, to spend a week in their offices, working on a real project together with my soon to be co-workers.

This was by far the most thorough interview process I’d ever been through. They actually got to see how I work. How I deal with other people. How I approach real problems. And on the other hand, I also had a great chance to evaluate what I was getting myself into.
After that week was up, the company’s CEO gathered the feedback from all my peers and to my delight made me an offer.

I still think this is the only way you can be sure to be hiring the right people. Is it time consuming? Yes. Is it expensive? Maybe. You can save a lot of money by only paying the wrong person to work for a week, than actually hiring them and having to pay for at least a month’s worth of work from them. Bad work at that!
I suppose such a process wouldn’t work for some companies, mainly Google and Facebook given that even after the first few barriers there are still a lot of applicants left but then again most companies out there aren’t like them.

But make no mistake - finding the right person is hard and it takes time. Knowing if someone is that right person, on the other hand, is simple, if you’re committed to it.

Conclusion

Obviously this isn’t the only way to hire good software engineers. But I do believe it’s the only one that is fair to both the company and the candidate and will definitely decrease, if not eliminate, your false positives.

Oh, and I apologize if I sound pedantic but getting them to write code, by whatever means, isn’t optional.

A Tale of Concurrency: Partitioning Data Between Processes

So I had just finished writing this feature. I was confident it’d work - after all I had a good level of tests around it. It sounded like yet another successful deployment.

One week in and something starts breaking. It was hard to track down but at the end I realised it was caused by having concurrent processes running in parallel.

That’s what happened in a recent production release at our current client. It was a very interesting problem to track down and solve and I’ll do my best to explain and walk you through it here.

What we were trying to achieve

I’ll try not to go into many details here as to how the feature works but here is the gist of it:

A user from the staff can select from a range of criteria to create a “List” that is used to match users against the database. These criteria are serialized to the database in this list object to allow for further processing.

When required, the staff can choose to mail the users resulting from that list. To get the set of users, the system de-serializes the criteria contained in the list, builds a SQL query and runs it against the database.

Constraints

These lists are contained in something called a “Push”. That’s just a concept that identifies that a set of lists and emails belong to a specific theme, say, ‘Kitty lovers’.

It’s a powerful concept though because its single most important rule is that a user should never receive more than one email within the same “Push”.

For example, take a look at the image below, which represents a Push in progress. Let U be the universe of all users in the database. L are users returned by a List within a given Push. R is the set of users that have already received an email from that Push.

Based on the rules I explained above, the green area then represents users we can still send emails to.

Content testing

With all the basic concepts in place, let’s pretend we are actually creating a Push for the ‘Kitty Lovers’. I am organizing a kitty expo at my house and I want to invite everyone that lives within 50km of Sydney, Australia - see the list specification going on here?

Let’s say this list gives us 25000 users. My house’s better be huge.

Simple enough, but I actually have 2 emails to send. They have different content, say 2 different pictures, and before I invite everyone, I want to test which email works best by sending them both to a subset of the list - make it a thousand random users each. After all, I want my event to be a huge success.

After I decide which email is best - the system tracks opens, clicks, etc… - I want to send it to the rest of the list, which is to say every user that has not received an email from this Push yet.

Offloading work, the problem begins

Due to certain architecture and performance requirements the unit of code that runs the list’s criteria against the database, gets the user ids and sends the email, is sent to a job queue to be later executed by a background process.

In development and staging we only ever had a single background process running so all was well. The code was basically executed sequentially.

In production however, we can - and usually have - several background workers picking up jobs from the queue. And that’s when the bug happened:

As you can see in the above image, given you have 2 jobs running in parallel, Job#1 selects the users from the list, starts sending some emails and in the meantime Job#2 comes along, selects roughly the same users and starts doing the same before Job#1 has had enough time to mark those users as having received the email already. Bam! Some users will receive duplicates!

The solution

I spent a lot of time literally staring at my laptop’s screen thinking about how to solve this issue. Locking the table? Nah… that would render having multiple workers useless.

What I really needed was a way to partition the data between jobs so as to avoid different jobs from dealing with the same set of users.

My next thought was to run the query once beforehand to sort the list of user ids, split it in groups equal to the number of jobs and work out the id ranges each job should deal with.

Although this solution would work, it would add the extra step of running the query once before actually running the jobs. That did not make me happy.

That’s when I started thinking about a metaphor that helped me come up with this insight: a Hash Table. If you ever implemented one you can see that the buckets in a hash table could be related to the jobs we have running.

I basically wanted different users to go to different buckets. Think hash functions. Think modulus.

To follow this approach I changed my code so that each job would be assigned an id, starting at 0, up to (number_of_jobs - 1). Then, when putting the jobs in the queue I would provide 2 extra arguments: the current job id and the total number of jobs.

Upon execution, the job would use these extra arguments to modify the query, adding an extra field to the select clause:

1
    SELECT ..., MOD(users.id, number_of_jobs) AS modulus...

And it would also append an extra filter to the where clause:

1
    WHERE ... AND modulus = current_job_id

That way, users would be spread as evenly as possible between jobs and no single user would be processed by more than one job.

For example: If we had 2 jobs, ids 0 and 1 and a list of 10 users with ids from 0 to 10, this would be the users distribution between jobs:

1
2
3
4
5
6
7
8
9
10
11
  user_id:0 - processed by job_id:0 (given by 0 mod 2)
  user_id:1 - processed by job_id:1 (given by 1 mod 2)
  user_id:2 - processed by job_id:0 (given by 2 mod 2)
  user_id:3 - processed by job_id:1 (given by 3 mod 2)
  user_id:4 - processed by job_id:0 (given by 4 mod 2)
  user_id:5 - processed by job_id:1 (given by 5 mod 2)
  user_id:6 - processed by job_id:0 (given by 6 mod 2)
  user_id:7 - processed by job_id:1 (given by 7 mod 2)
  user_id:8 - processed by job_id:0 (given by 8 mod 2)
  user_id:9 - processed by job_id:1 (given by 9 mod 2)
  user_id:10 - processed by job_id:0 (given by 10 mod 2)

A simple, elegant solution for a tricky problem. One that I’m finally happy about.

If you made it to here, I appreciate your patience and hope you enjoyed the read ;)

A Few Useful Git Commands

I’ve been working fulltime with Git for a while now and I’ve noticed a pattern of commands that emerge every so often, but not often enough that would make me remember them.

That’s why I decided to post them here so I know where to look in the future. And of course they can turn out to be useful to someone else too.

So without further ado, here they are:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#undo last commit
git reset HEAD^

#show files in a given commit
git show --pretty="format:" --name-only rev_number

#remove untracked files and directories
git clean -f -d

#track remote branch
git branch --track branch_name origin/master

# given you created a new local branch 'branch_name'
# pushes 'branch_name' to 'origin/branch_name', creating the remote branch for you
git push origin branch_name

#delete remote branch
git push origin :remote_branch_name

I’ll keep this list up to date and add more useful stuff as need arises. Enjoy.

git

Good Bye Wordpress, Hello Nanoc

I just finished migrating my blog from wordpress to nanoc using a customized version of nanoc3_blog and it feels nice.
I grew tired of keeping up with Wordpress’ updates and this migration also means that I can finally get rid of php on my vps.

However, do let me know if there is anything weird around here. The whole customization done on nanoc3_blog was related to the wordpress migration.


I wanted to make sure all url’s stayed the same so as to avoid broken links into the blog. Feel free to browse the repository and see if the changes can be helpful to you.
It also includes a modified version of a wordpress import script that was responsible for keeping the links correct, among other things.

I’ll try and document the changes I made so people can easily cherry pick what they want.

One Year of ThoughtWorks: A Retrospective

It seems like it was yesterday that I wrote about my next big challenge: Moving to Australia to join ThoughtWorks. Since then a year has passed and I thought I’d share a little bit about what it’s been like so far.

The first thing you notice when joining ThoughtWorks is that you’re surrounded by smart, opinionated people. It was once common for me to be the one fighting for automated testing, TDD, Ruby and overall finding new and better ways of doing what we do - but imagine how different it is to join a team that takes that for granted. This gives you a very high standard within the team that frees your mind to try and achieve bigger things. It’s a great feeling.

ThoughtWorks‘ culture is inspiring. We’re a small and very active community - there’s always something going on in our offices after hours: Tech nights, local groups’ meetups, training, people randomly hacking on stuff and even band rehearsals! The point is that you can get involved in several ways and the best part is that ThoughtWorks supports all that: be it providing pizzas and beers, flying us to international conferences, buying books and whatnot in order to help us do our best! I could go on but you get the point.

In return, we get to work on hard stuff. Not only technically challenging but personally challenging as well. As consultants we can be thought of as agents of change: When we go to a client, we’re not only concerned with building the coolest projects with the best tools out there - of course that’s a big part of what we do - but we’re expected to change the way they think, work and do business.

Since I joined I worked with 2 clients - and 2 different projects - that varied wildly between them. That is in part what makes working at ThoughtWorks so interesting to me. You get to change domains, technologies and teams more often than not, which nurtures a stimulating environment.

The first project for instance was a standard Java web app where we had the usual players such as Spring, Hibernate, Freemarker, etc…

My current project, on the other hand, is very much like a start-up: We have a EC2 box that plays our CI role, all code is hosted on Github, we’re using Ruby on Rails and deploying the whole thing to Heroku + RDS.

To top it off I also had the chance to make a real difference in the Queensland Flood Relief Appeal in 2010, when we built a donation app to help the flood victims - that was definitely the highlight of the year, and something I’m particularly proud of.

What next? I’m eager to find out what my second year here has to offer. Bring it on!