Google IO: ThoughtWorks on GAE

June 22, 2009

I've just watched a video from Google IO where Martin Fowler and Rebecca Parsons went through some of the aspects involved in developing an application for the cloud, focusing on the JVM.

On the Google App Engine you don't have access to a relational database, something I found out when I first tried it. Instead you get Bigtable.

Martin offered a good analogy: you can think of it as a nested hash map. It's certainly a shift from how we think these days, but layers of abstraction like Google's Datastore and the Java Persistence API will help in the transition.
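To make the analogy a bit more concrete, here is a minimal sketch of a "nested hash map" in plain Java. It is only an illustration of the mental model, not how Bigtable or the Datastore API actually works; the class name, the method names and the "post:1" style keys are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "nested hash map" mental model.
// This is NOT the Datastore API, just a plain in-memory structure.
public class NestedMapSketch {
    // Outer map: row key -> inner map of property name -> value.
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Store a property, creating the inner map for the row on first use.
    public void put(String rowKey, String property, Object value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(property, value);
    }

    // Look up a property; returns null when the row or property is missing.
    public Object get(String rowKey, String property) {
        Map<String, Object> row = rows.get(rowKey);
        return row == null ? null : row.get(property);
    }

    public static void main(String[] args) {
        NestedMapSketch store = new NestedMapSketch();
        store.put("post:1", "title", "Google IO notes");
        store.put("post:1", "year", 2009);
        System.out.println(store.get("post:1", "title")); // prints "Google IO notes"
        System.out.println(store.get("post:2", "title")); // prints "null"
    }
}
```

Note that there is no schema and no joins here: each row simply carries whatever properties were put into it, which is the shift in thinking mentioned above.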

Another interesting bit about the presentation was on how concurrency works on GAE.

Essentially, in a standard Java application you have a single memory space with at least one running thread. You can create threads on the fly, and they all share the same memory space, which makes it easy to share data.
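As a small sketch of what "sharing the same memory space" means in a standard JVM, the example below starts a few threads that all update one counter object. The class and method names are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: several threads updating one shared object
// that lives in the single memory space of a standard JVM.
public class SharedMemorySketch {
    public static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(); // visible to every thread
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic, so no updates are lost
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for each worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000)); // prints 4000
    }
}
```

The counter is shared simply because every thread holds a reference to the same object, which is exactly the convenience you give up when each request runs in its own memory space.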

On the App Engine, things work differently. What you have are separate memory spaces with a single thread in each one. Any attempt to create a new thread will result in an exception. The solution for sharing information in this case? Use the nested hash map (Bigtable).

Now, you might not be worried about this because your application doesn't spawn any threads but, as Martin Fowler rightly pointed out, it's the code you don't see that you need to be careful with. Any Java application uses a number of 3rd party libraries that might spawn threads of their own, which will result in your application blowing up.

That rang a bell. Back when I was trying out the App Engine, one of the configuration bits shared by Ola Bini looked like this:

   config.webxml.jruby.min.runtimes = 1
   config.webxml.jruby.max.runtimes = 1
   config.webxml.jruby.init.serial = true

I think the properties are pretty much self-explanatory, but I didn't quite understand the reason for setting them back then.

If you happen to have bigger values for the number of runtimes you want, you need to set the serial property to true, otherwise JRuby will spawn several threads to create the runtimes.
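The difference between serial and threaded initialization can be sketched in a few lines of Java. This is only my illustration of the idea, not JRuby's actual implementation: the class name, the createRuntimes method and its serial flag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of "serial" versus threaded pool initialization.
// It only illustrates the idea; it is not taken from JRuby.
public class RuntimePoolSketch {
    public static <T> List<T> createRuntimes(int count, boolean serial, Supplier<T> factory) {
        List<T> pool = Collections.synchronizedList(new ArrayList<>());
        if (serial) {
            // Serial mode: everything is created on the calling thread,
            // so no new thread is ever started.
            for (int i = 0; i < count; i++) {
                pool.add(factory.get());
            }
            return pool;
        }
        // Threaded mode: one thread per runtime. In an environment that
        // forbids thread creation, starting these is what would fail.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> pool.add(factory.get()));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        // The "runtime" here is just the name of the thread that created it.
        List<String> serialPool = createRuntimes(2, true, () -> Thread.currentThread().getName());
        System.out.println(serialPool.size()); // prints 2
    }
}
```

In serial mode startup takes longer, since the runtimes are created one after another, but nothing in this path ever asks the platform for a new thread.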

This is a really good example of the things that might fail, whether you're migrating an existing app or developing a new one to deploy on the App Engine. Luckily for us, JRuby has a smart and neat way to handle this: the configuration I've just shown. Most of the libraries out there that rely on threads, however, are not prepared.

Martin and Rebecca's opinion on this is that new releases of these same libraries will start to take it into account, since wider adoption of the cloud seems to be on the way.

Make sure you watch the video. I certainly left a lot of interesting stuff out.

Java
JRuby