Using the Rake Build Language

Rake is a build language, similar in purpose to make and ant. Like make and ant it's a Domain Specific Language; unlike those two it's an internal DSL programmed in the Ruby language. In this article I introduce rake and describe some interesting things that came out of my use of rake to build this web site: dependency models, synthesized tasks, custom build routines and debugging the build script.

29 December 2014

I've been using Ruby extensively now for many years. I like its terse but powerful syntax, and its generally well-written libraries. A couple of years ago I converted much of my web site generation from XSLT to Ruby and have been utterly happy with that change.

If you are a regular reader of mine, you'll not be surprised to know that my entire web site is built automatically. I originally used ant - the build environment popular in the Java world - to do this as that fitted well with Java XSL processors. As I've been using Ruby more I've made more use of Rake, a build language based on Ruby developed by Jim Weirich. Recently I completely replaced the build process, removing all the ant in favor of Rake.

In my early days with Rake, I used it in a similar way to how I'd used ant. In this push, however, I tried to do things differently to explore some of the interesting features of Rake. As a result I thought I'd write this article to delve into some of these areas. Rake is my third build language. I used make many years ago (and have forgotten much of it). I've used ant quite a lot in the last six years or so. Rake has many features that these languages have, and a few more that are (to me) new twists. Although Rake is written in Ruby and heavily uses the language, you can use it for any automated build processing. For simple build scripts you don't need to know Ruby, but as soon as things start getting more interesting then you need to know Ruby to get Rake to do its best things.

This is a somewhat skewed story. I'm not trying to write a tutorial on Rake - I'm going to concentrate on things I find interesting rather than give a complete coverage. I'm not going to assume you know Ruby, Rake, or indeed any other build language. I'll explain relevant bits of Ruby as I go along. Hopefully if you've done any messing with these, or are just interested in different computational models, you'll find this worth a diverting read.

Dependency Based Programming

Hang on - in the preceding paragraph I said “different computational models”. Isn't that rather a grand phrase for a build language? Well no, it isn't. All the build languages I've used (make, ant (Nant), and rake) use a dependency based style of computation rather than the usual imperative style. That leads us to think differently about how to program them. It doesn't strike most people that way as most build scripts are pretty short, but it's actually quite a profound difference.

It's time for an example. Let's imagine we want to write a program to build a project. We have several different steps to this build.

CodeGen: take data configuration files and use them to generate the database structure and the code to access the database.
Compile: compile the application code.
DataLoad: load test data into the database.
Test: run the tests.

We need to be able to run any of these tasks independently and ensure everything works. We can't test until we do all the previous steps. Compile and DataLoad need CodeGen run first. How do we express these rules?

If we do it in imperative style, making each task a ruby procedure, it looks like this.

# this is a comment in ruby
def codeGen  #def introduces a procedure or method
  # do code gen stuff
end

def compile
  codeGen
  # do compile stuff
end

def dataLoad
  codeGen
  # do data load stuff
end

def test
  compile
  dataLoad
  #run tests
end

Notice there's a problem with this. If I call test I execute the codeGen step twice. This won't cause an error because the codeGen step is (I assume) idempotent - that is, calling it multiple times is no different to calling it once. But it will take time, and builds are rarely things that have time to spare.

To fix this I could separate the steps into public and internal parts like this

def compile
  codeGen
  doCompile
end

def doCompile
  # do the compile
end

def dataLoad
  codeGen
  doDataLoad
end

def doDataLoad
  #do the data load stuff
end

def test
  codeGen
  doCompile
  doDataLoad
  #run the tests
end

This works, but it's a little messy. It's also a perfect example of how a dependency based system can help. With an imperative model, each routine calls the steps in the routine. In a dependency based system we have tasks and specify pre-requisites (their dependencies). When you call a task, it looks to see what pre-requisites there are and then arranges to call each pre-requisite task once. So our simple example would look like this.

task :codeGen do
  # do the code generation
end

task :compile => :codeGen do
  #do the compilation
end

task :dataLoad => :codeGen do
  # load the test data
end

task :test => [:compile, :dataLoad] do
  # run the tests
end

(Hopefully you can get a sense of what this says; I'll explain the syntax properly in a moment.)

Now if I call compile, the system looks at the compile task and sees it's dependent upon the codeGen task. It then looks at the codeGen task and sees no pre-requisites. As a result it runs codeGen followed by compile. This is the same as the imperative situation.

The interesting case, of course, is the test. Here the system sees that both compile and dataLoad are dependent on codeGen so it arranges the tasks so codeGen runs first, followed by compile and dataLoad (in any order) and finally test. Essentially the actual order of the tasks run is figured out at run time by the execution engine, not decided at design time by the programmer who writes the build script.

This dependency based computational model fits a build process really well, which is why all three use it. It's natural to think of a build in terms of tasks and dependencies, most steps in a build are idempotent, and we really don't want unnecessary work to slow down the build. I suspect that few people that knock up a build script realize they are programming in a funky computational model, but that's what it is.
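To make the once-only behaviour concrete, here is a minimal sketch in plain ruby of how such an execution engine might arrange things. It is not how rake is implemented; the names PREREQS, LOG and run_task are invented for this illustration, and the task bodies are replaced by a log entry.

```ruby
# A toy dependency engine (illustrative only, not rake's implementation).
# PREREQS, LOG and run_task are hypothetical names for this sketch.
PREREQS = {
  codeGen:  [],
  compile:  [:codeGen],
  dataLoad: [:codeGen],
  test:     [:compile, :dataLoad]
}

LOG = []  # records the order in which task bodies run

def run_task(name, done = {})
  return if done[name]          # already handled during this invocation
  done[name] = true
  PREREQS.fetch(name).each { |pre| run_task(pre, done) }
  LOG << name                   # stands in for the task's real work
end

run_task(:test)
puts LOG.inspect
```

Even though both compile and dataLoad name codeGen as a pre-requisite, it appears in the log only once, and always before anything that needs it.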

Domain Specific Language for Builds

All my three build languages share another characteristic - they are all examples of a Domain Specific Language (DSL). However they are different kinds of DSL. In the terminology I've used before:

make is an external DSL using a custom syntax
ant (and nant) is an external DSL using an XML based syntax
rake is an internal DSL using Ruby.

The fact that rake is an internal DSL for a general purpose language is a very important difference between it and the other two. It essentially allows me to use the full power of ruby any time I need it, at the cost of having to do a few odd looking things to ensure the rake scripts are valid ruby. Since ruby is an unobtrusive language, there's not much in the way of syntactic oddities. Furthermore since ruby is a full blown language, I don't need to drop out of the DSL to do interesting things - which has been a regular frustration using make and ant. Indeed I've come to view that a build language is really ideally suited to an internal DSL because you do need that full language power just often enough to make it worthwhile - and you don't get many non-programmers writing build scripts.

Rake Tasks

Rake defines two kinds of task. Regular tasks are similar to tasks in ant, and file tasks are similar to tasks in make. If either of those mean nothing to you, don't worry, I'm about to explain.

Regular tasks are the simplest to explain. Here's one from one of my build scripts for my testing environment.

task :build_refact => [:clean] do
  target = SITE_DIR + 'refact/'
  mkdir_p target
  require 'refactoringHome'
  OutputCapturer.new.run {run_refactoring}
end

The first line defines much of the task. In this language task is effectively a keyword that introduces a task definition. :build_refact is the name of the task. The syntax for naming it is a little funky in that we need to start it with a colon, one of the consequences of being an internal DSL.

After the name of the task we then move onto the pre-requisites. Here there's just the one, :clean. The syntax is => [:clean]. We can list multiple dependencies inside the square brackets separated by commas. From the much earlier examples you can see that we don't need the square brackets if there's only one task. We don't need the dependencies at all if there aren't any (or indeed for other reasons - there's a fascinating topic in there that I'll get to later).

To define the body of the task we write ruby code within do and end. Inside this block we can put any valid ruby we like - I won't bother explaining this code here, as you don't need to understand it to see how tasks work.

The nice thing about a rake script (or rakefile as rubyists call it) is you can read this pretty clearly as a build script. If we were to write the equivalent in ant it would look something like this:

<target name="build_refact" depends="clean">
  <!-- define the task -->
</target>

Now you can look at this as a DSL and follow it, but since it's an internal DSL you may be interested in how this works as valid ruby. In reality task isn't a keyword, it's a routine call. It takes two arguments.

The first argument is a hash (the equivalent of a map or dictionary). Ruby has a special syntax for hashes. In general the syntax is {key1 => value1, key2 => value2}. However the curly brackets are optional if there's only one hash, so you don't need them while defining the rake task, which helps to simplify the DSL. So what are the key and the value? The key here is a symbol - identified in ruby by the leading colon. You can use other literals, we'll see strings shortly, and you can use variables and constants too - which we'll discover to be rather handy. The value is an array - which is really the equivalent of a list in other languages. Here we list the names of the other tasks. If we don't use the square brackets we just have one value instead of a list - rake copes with an array or a single literal - very accommodating of it, I must say.

So where's the second argument? It's what lies between do and end - a block - ruby's word for a Closure. So as the rake file runs it builds up an object graph of these task objects, connected to each other through the dependency links and each having a block to execute when the right time comes. Once all the tasks are created the rake system can then use the dependency links to figure out which tasks need to be run in what order and then it does that, calling the blocks for each task in the appropriate order. A key property of closures is that they don't need to be executed when they are evaluated, they can be saved for later - even if they refer to variables that aren't in scope when the block is actually executed.
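The mechanics can be sketched in a few lines of plain ruby. This is only an illustration of the idea, not rake's own code: toy_task, ToyTask and TASKS are made-up names, and the sketch only records the tasks rather than ordering them.

```ruby
# A toy version of the `task` call (illustrative only, not rake's code).
# ToyTask, TASKS and toy_task are hypothetical names for this sketch.
ToyTask = Struct.new(:name, :prerequisites, :action)
TASKS = {}

# The hash argument needs no curly brackets, and the do...end block
# arrives as a closure that is stored without being run.
def toy_task(definition, &action)
  definition = { definition => [] } unless definition.is_a?(Hash)
  name, prereqs = definition.first        # the single key => value pair
  TASKS[name] = ToyTask.new(name, Array(prereqs), action)
end

def define_tasks
  greeting = 'built by a closure'         # local to this method
  toy_task :build_refact => [:clean] do
    greeting                              # captured for later use
  end
  toy_task :clean do
    'cleaned'
  end
end

define_tasks                              # no task body has executed yet
build = TASKS[:build_refact]
puts build.prerequisites.inspect
puts build.action.call                    # the saved block runs only now
```

By the time the block is called, define_tasks has returned and greeting is no longer in scope, yet the block still sees it. That is the property that lets a build tool collect every task first and decide the execution order afterwards.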

The thing here is that what we are seeing is legal ruby code, admittedly arranged in a very odd way. But this odd way allows us to have a pretty readable DSL. Ruby also helps by having a very minimal syntax - even little things like not needing parentheses for procedure arguments help this DSL stay compact. Closures are also vital - as they often are in writing internal DSLs - because they allow us to package code in alternative control structures.

File Tasks

The tasks I talked about above are similar to tasks in ant. Rake also supports a slightly different kind of task called a file task which is closer to the notion of tasks in make. Here's another example, slightly simplified, from my web site rakefile.

file 'build/dev/rake.html' => 'dev/rake.xml' do |t|
  require 'paper'
  maker = PaperMaker.new t.prerequisites[0], t.name
  maker.run
end

With a file you are referring to actual files rather than task names. So 'build/dev/rake.html' and 'dev/rake.xml' are actual files. The html file is the output of this task and the xml file is the input. You can think of a file task as telling the build system how to make the output file - indeed this is exactly the notion in make - you list the output files you want and tell make how to make them.

An important part of the file task is that it's not run unless you need to run it. The build system looks at the files and only runs the task if the output file does not exist or its modification date is earlier than the input file. File tasks therefore work extremely well when you're thinking of things on a file by file basis.

One thing that's different with this task is that we pass the task object itself as a parameter into the closure - that's what the |t| is doing. We can now refer to the task object in the closure and call methods on it. I do this to avoid duplicating the names of the files. I can get the name of the task (which is the output file) with t.name. Similarly I can get the list of prerequisites with t.prerequisites.

Ant has no equivalent to file tasks; instead each task does the same kind of necessity checking itself. The XSLT transform task takes an input file, style file and output file and only runs the transform if the output file doesn't exist or is older than any of the input files. This is just a question of where to place the responsibility of this checking - either in the build system or in the tasks. Ant mostly uses canned tasks written in java, whereas make and rake both rely on the build writer to write code for the task. So it makes more sense to relieve the writer of the task of the need to check if things are up to date.

However it's actually pretty easy to do the up to date checks in the rake tasks. This is what it would look like.

task :rakeArticle do
  src = 'dev/rake.xml'
  target = 'build/dev/rake.html'
  unless uptodate?(target, [src]) 
    require 'paper'
    maker = PaperMaker.new src, target 
    maker.run
  end
end

Rake provides (via the fileutils package) a bevy of simple unix-like commands for file operations such as cp, mv, rm, etc. It also provides uptodate? which is perfect for these kinds of checks.
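To see how uptodate? decides, here is a small sketch that calls it directly on ruby's FileUtils module, outside of any rakefile. The file names live in a throwaway temporary directory and the timestamps are set by hand, so none of this touches a real build.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical file names in a throwaway directory; only the timestamps matter.
results = []
Dir.mktmpdir do |dir|
  src    = File.join(dir, 'rake.xml')
  target = File.join(dir, 'rake.html')

  File.write(src, '<paper/>')
  results << FileUtils.uptodate?(target, [src])   # target does not exist yet

  File.write(target, '<html/>')
  now = Time.now
  File.utime(now, now - 60, src)                  # source modified a minute ago
  File.utime(now, now, target)                    # target modified just now
  results << FileUtils.uptodate?(target, [src])   # target is newer than source

  File.utime(now, now + 60, src)                  # pretend the source was edited later
  results << FileUtils.uptodate?(target, [src])   # target is stale again
end

puts results.inspect
```

The three results correspond to the three situations a build has to tell apart: a missing output, a current output, and an output that is older than its source.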

So here we see two ways of doing things. We can either use file tasks or regular tasks with uptodate? in order to decide whether to do things - which should we choose?

I must admit I don't have a good answer to this question. Both tactics seem to work pretty well. What I decided to do with my new rakefile was to push fine-grained file tasks as far as I could. I didn't do this because I knew it was the best thing to do, I did it mainly to see how it would turn out. Often when you come across something new it can be a good idea to overuse it in order to find out its boundaries. This is a quite reasonable learning strategy. It's also why people always tend to overuse new technologies or techniques in the early days. People often criticize this but it's a natural part of learning. If you don't push something beyond its boundary of usefulness how do you find where that boundary is? The important thing is to do so in a relatively controlled environment so you can fix things when you find the boundary. (After all, until we tried it I thought XML would be a good syntax for build files.)

I'll also say now that so far I've not found any problems with pushing file tasks and fine-grained tasks too far. I may think otherwise in a year or two's time, but so far I'm happy.

Defining Dependencies Backwards

So far I've mostly talked about how rake does similar things to what you find in ant and make. That's a nice combination - combine both capabilities with the full power of ruby on tap - but that alone wouldn't give me too much of a reason for this little article. The thing that got my interest was some particular things that rake does (and allows) that are a bit different. The first of these is allowing you to specify dependencies in multiple places.

In ant you define dependencies by stating them as part of the dependent task. I've done this with my rake examples so far as well, like this.

task :second => :first do
  #second's body
end

task :first do
  #first's body
end

Rake (like make) allows you to add dependencies to a task after you've initially declared it. Indeed it allows you to continue to talk about a task in multiple places. This way I can decide to add dependencies close to the pre-requisite task, like this.

task :second do
  #second's body
end

task :first do
  #first's body
end
task :second => :first

This doesn't make much difference when the tasks are right next to each other in the build file, but in longer build files it does add a new bit of flexibility. Essentially it allows you to think about dependencies either in the usual way, or add them when you add the pre-requisite tasks, or indeed put them in a third location independent of both.
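A quick way to convince yourself that the declarations accumulate is to load rake as a library and inspect the task object. This sketch assumes the rake gem is installed and that requiring it makes the task method available at the top level of a script; :zeroth and the order array are invented for the illustration.

```ruby
require 'rake'

# :first and :second echo the example above; :zeroth and the order array
# are made up for this sketch.
order = []

task :second do
  order << :second
end

task :first do
  order << :first
end
task :second => :first        # add a pre-requisite after the fact

task :zeroth do
  order << :zeroth
end
task :second => :zeroth       # and another, declared somewhere else

puts Rake::Task[:second].prerequisites.inspect
Rake::Task[:second].invoke
puts order.inspect
```

Each extra mention of :second adds to the same task rather than replacing it, so both pre-requisites run before the body of :second.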

As usual this flexibility raises a new question: where is it best to define dependencies? I don't have a certain answer yet, but in my build file I used two rules of thumb. When I was thinking about one task that needed to be done before I could do another, I defined the dependency when I was writing the dependent task, in the conventional way. However I often used dependencies to group together related tasks, such as the various errata pages. When using dependencies for grouping (a common part of structuring build files) it seemed to make sense to put the dependency by the pre-requisite task.

task :main => [:errata, :articles]

#many lines of build code

file 'build/eaaErrata.html' => 'eaaErrata.xml' do
  # build logic
end
task :errata => 'build/eaaErrata.html'

I don't actually have to define the :errata task with a task keyword, just putting it as a dependency for :main is enough to define the task. I can then add individual errata files later on and add each to the group task. For this kind of group behavior this seems a reasonable way to go (although I don't actually do it quite like this in my build file, as we'll see later.)

One question that this raises is 'how do we find all the dependencies when they are spread out all over the build file?' It's a good question but the answer is to get rake to tell you, which you can do with rake -P, which prints out every task with its pre-requisites.

Synthesizing Tasks

Allowing you to add dependencies after you've defined a task, together with having full ruby available to you, introduces some further tricks to the build.

Before I explain about synthesized tasks, however, I need to introduce some important principles about build processes. Build scripts tend to have to do two kinds of build - clean builds and incremental builds. A clean build occurs when your output area is empty; in this case you build everything from its (version controlled) sources. This is the most important thing the build file can do and the number one priority is to have a correct clean build.

Clean builds are important, but they do take time. So often it's useful to do incremental builds. Here you already have stuff in your output directories. An incremental build needs to figure out how to get your output directories up to date with the latest sources with the minimal amount of work. There are two errors that can occur here. First (and most serious) is a missing rebuild - meaning that some items that should have got built didn't. That's very bad because it results in output that doesn't really match the input (in particular the result of a clean build on the input). The lesser error is an unnecessary rebuild - this builds an output element that didn't need to be built. This is a less serious error as it's not a correctness error, but it is a problem because it adds time to the incremental build. As well as time it adds confusion - when I run my rake script I expect to see only things that have changed get built, otherwise I wonder “why did that change?”

Much of the point of arranging a good dependency structure is to ensure that incremental builds work well. I want to do an incremental build of my site just by going 'rake' - invoking the default task. I want that to build only what needs to be built.

So that's my need. An interesting problem is getting that to work for my bliki. The sources for my bliki are a whole bunch of xml files in my bliki directory. The output is one output file for each entry, plus several summary pages - of which the main bliki page is the most important. What I need is for any change to a source file to re-trigger the bliki build.

I could do this by naming all the files like this.

def build relative_path
 # allows me to avoid duplicating the build location in the build file
 # (defined first so it exists by the time BLIKI is assigned)
 return File.join('build', relative_path)
end

BLIKI = build('bliki/index.html')

file BLIKI => ['bliki/SoftwareDevelopmentAttitude.xml',
               'bliki/SpecificationByExample.xml',
               #etc etc
              ] do
  #logic to build the bliki
end

But clearly this would be dreadfully tedious, and just asking for me to forget to add a new file to the list when I want to add one. Fortunately I can do it this way.

BLIKI = build('bliki/index.html')

FileList['bliki/*.xml'].each do |src|
  file BLIKI => src
end

file BLIKI do 
  #code to build the bliki
end

FileList is part of rake; it generates lists of files based on the glob that's passed in - here it creates a list of all the files in the bliki source directory. The each method is an internal iterator that allows me to loop through them and add each one as a pre-requisite of the file task. (The each method is a collection closure method.)
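The effect of the loop can be checked by running it against a scratch directory and asking the resulting file task for its pre-requisites. This is a sketch under a few assumptions: the rake gem is installed and loaded as a library, and the directory layout and entry names are invented for the example.

```ruby
require 'rake'
require 'tmpdir'

# The directory layout and entry names below are invented for this sketch.
prereqs = nil
Dir.mktmpdir do |dir|
  src_dir = File.join(dir, 'bliki')
  Dir.mkdir(src_dir)
  %w[EntryOne EntryTwo].each do |entry|
    File.write(File.join(src_dir, "#{entry}.xml"), '<entry/>')
  end
  File.write(File.join(src_dir, 'notes.txt'), 'not matched by the glob')

  index = File.join(dir, 'build', 'bliki', 'index.html')

  # One dependency-only declaration per source file found by the glob.
  FileList[File.join(src_dir, '*.xml')].each do |src|
    file index => src
  end

  prereqs = Rake::Task[index].prerequisites.map { |p| File.basename(p) }.sort
end

puts prereqs.inspect
```

Only the files matching the glob end up as pre-requisites, and a new entry dropped into the directory would be picked up the next time the rakefile is loaded, with no list to maintain by hand.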

One other thing I do with the bliki task is add a symbolic task for it.

desc "build the bliki"
task :bliki => BLIKI

I do this so I can just build the bliki alone with rake bliki. I'm not sure I really need this any more. If all the dependencies are set up properly (as they are now) I can just do a default rake and there's no unnecessary rebuild. But I've kept it in for the moment. The desc method allows you to define a short description for the following task; this way when I run rake -T I get a list of the tasks that have a desc defined for them. This is a useful way of seeing what targets are available to me.

If you've used make before, you may be thinking that this is reminiscent of one of make's greatest features - the ability to specify pattern rules to automatically make certain kinds of file. The common example is that you want to build any foo.o file by running the C compiler on the corresponding foo.c file.

%.o : %.c
        gcc -c $< -o $@

The %.c will match every file that ends with '.c'. $< refers to the source (pre-requisite) and $@ to the target of the rule. This pattern rule means that you don't have to list every file in your project with the compile rule, instead the pattern rule tells make how to build any *.o file it needs. (And indeed you don't even need this in your make file as make comes packaged with many pattern rules like this.)

Rake actually has a similar mechanism. I'm not going to talk about it, other than to mention it exists, because I haven't yet found I needed it. Synthesizing tasks worked for all I needed.

Block Scoping Tasks

One of the problems I found with using file names and dependencies is that you have to repeat the file names. Take this example.

file 'build/articles/mocksArentStubs.html' => 'articles/mock/mocksArentStubs.xml' do |t|
 transform t.prerequisites[0], t.name
end
task :articles => 'build/articles/mocksArentStubs.html'

In the above example 'build/articles/mocksArentStubs.html' is mentioned twice in the code. I can avoid repeating in the action block by using the task object, but I have to repeat it to set up the dependency to the overall articles task. I don't like that repetition because it's asking for trouble if I change my file name. I need a way to define it once. I could just declare a constant, but then I'm declaring a constant that's visible everywhere in my rakefile when I'm only using it in this section. I like variable scopes to be as small as possible.

I can deal with this by using the FileList class that I mentioned above, but this time I use it with only a single file.

FileList['articles/mock/mocksArentStubs.xml'].each do |src|
  target = File.join(BUILD_DIR, 'articles', 'mocksArentStubs.html')
  file target => src do
    transform src, target
  end
  task :articles => target
end

This way I define src and target variables that are only scoped within this block of code. Notice that this only helps me if I define the dependency from the :articles task here. If I want to define the dependency in the definition of the :articles task, I would need a constant so I get the visibility across the whole rakefile.

When Jim Weirich read a draft of this he pointed out that if you find the FileList statement too wordy, you can easily define a method specifically to do this:

  def with(value)
    yield(value)
  end

and then do

  with('articles/mock/mocksArentStubs.xml') do |src|
    # whatever
  end

Build Methods

One of the really great things about having a build language be an internal DSL to a full programming language is that I can write routines to handle common cases. Sub-routines are one of the most elementary ways of structuring a program, and the lack of convenient sub-routine mechanisms is one of the great problems of ant and make - particularly as you get more complex builds.

Here's an example of such a common build routine I used - this is to use an XSLT processor to convert an XML file into HTML. All my newer writing uses ruby to do this translation, but I have a lot of older XSLT stuff around and I don't see any rush to change it. After writing various tasks to process XSLT I soon saw that there was some duplication, so I defined a routine for the job.

def xslTask src, relativeTargetDir, taskSymbol, style
  targetDir = build(relativeTargetDir)
  target = File.join(targetDir, File.basename(src, '.xml') + '.html')
  task taskSymbol => target
  file target => [src] do |t|
    mkdir_p targetDir
    XmlTool.new.transform(t.prerequisites[0], t.name, style)
  end
end

The first two lines figure out the target directory and the target file. Then I add the target file as a pre-requisite of the supplied task symbol. Then I create a new file task with instructions to create the target directory (if needed) and use my XmlTool to carry out the XSLT transform. Now when I want to create an XSLT task I just call this method.

xslTask 'eaaErrata.xml', '.', :errata, 'eaaErrata.xsl'

This method nicely encapsulates all the common code and parametrizes the variables for my needs at the moment. I found it really helpful to pass in the parent group task into the routine so that the routine would easily build the dependency for me - another advantage of the flexible way of specifying dependencies. I have a similar common task for copying files directly from source to the build directories, which I use for images, pdfs etc.

def copyTask srcGlob, targetDirSuffix, taskSymbol
  targetDir = File.join BUILD_DIR, targetDirSuffix
  mkdir_p targetDir
  FileList[srcGlob].each do |f|
    target = File.join targetDir, File.basename(f)
    file target => [f] do |t|
      cp f, target
    end
    task taskSymbol => target
  end
end

The copyTask is a bit more sophisticated because it allows me to specify a glob of files to copy, which lets me copy stuff like this:

copyTask 'articles/*.gif', 'articles', :articles

This copies all gif files in the articles sub-directory of my sources into the articles directory of my build directory. It makes a separate file task for each one and makes them all pre-requisites of the :articles task.

Platform Dependent XML Processing

When I used ant to build my site, I used java based XSLT processors. As I started to use rake I decided to switch to native XSLT processors. I use both Windows and Unix (Debian and MacOS) systems, both of which have XSLT processors easily available. Of course they are different processors and I need to invoke them differently - but of course I want this to be hidden to the rakefile and certainly to me when I invoke rake.

Here again is the nice thing about having a full blown language to work with directly. I can easily write an Xml processor that uses platform information to do the right thing.

I start with the interface part of my tool - the XmlTool class.

class XmlTool
  def self.new
    return XmlToolWindows.new if windows?
    return XmlToolUnix.new
  end
  def self.windows?
    return RUBY_PLATFORM =~ /win32/i 
  end
end

In ruby you create an object by calling the new method on the class. The great thing about this, as opposed to tyrannical constructors, is that you can override this new method - even to the point of returning an object of a different class. So in this case when I invoke XmlTool.new I don't get an instance of XmlTool - instead I get the right kind of tool for whatever platform I'm running the script on.
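The same trick can be tried in isolation with a couple of stand-in classes. Everything here is hypothetical: the class names are invented, the platform string is passed in as a parameter so the sketch can be exercised on any machine, and the pattern used to recognise a Windows platform string is an assumption rather than the check used in my build.

```ruby
# Hypothetical classes for illustration; the platform string is passed in
# so the sketch can be exercised on any machine.
class ToolForWindows
  def describe
    'windows tool'
  end
end

class ToolForUnix
  def describe
    'unix tool'
  end
end

class Tool
  # Overriding the class-level new turns it into a factory method.
  # The pattern below is an assumption about Windows platform strings.
  def self.new(platform = RUBY_PLATFORM)
    return ToolForWindows.new if platform =~ /mswin|mingw/i
    ToolForUnix.new
  end
end

windows_tool = Tool.new('i386-mswin32')
unix_tool    = Tool.new('x86_64-linux')
puts windows_tool.class
puts unix_tool.class
```

Neither object is an instance of Tool, and the caller never has to know which concrete class it was handed.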

The simplest of the two tools is the Unix version.

class XmlToolUnix
  def transform infile, outfile, stylefile
    cmd = "xsltproc #{stylefile} #{infile} > #{outfile}"
    puts 'xsl: ' + infile
    system cmd
  end
  def validate filename
    result = `xmllint -noout -valid #{filename}`
    puts result unless  '' == result
  end
end

You'll notice I have two methods here for XML, one for XSLT transform and one for XML validation. For unix each one invokes a command line call. If you're unfamiliar with ruby notice the nice ability to insert a variable into a string with the #{variable_name} construct. Indeed you can insert the result of any ruby expression in there - which is really handy. In the validate method I use back-quotes - which execute the command line and return the result. The puts command is ruby's way of printing to standard output.
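If you want to play with these two features on their own, a short sketch like the following will do. The file names are placeholders and nothing is read or written; the back-quote part assumes a shell with an echo command is available.

```ruby
# The file names here are placeholders; nothing is read or written.
stylefile = 'eaaErrata.xsl'
infile    = 'eaaErrata.xml'
outfile   = 'build/eaaErrata.html'

# Double-quoted strings interpolate; any expression can go inside #{}.
cmd = "xsltproc #{stylefile} #{infile} > #{outfile}"
summary = "transforming #{File.basename(infile, '.xml')} (#{cmd.split.size} words)"
puts cmd
puts summary

# Single-quoted strings leave the construct untouched.
literal = 'no #{interpolation} here'
puts literal

# Back-quotes run a command and return what it wrote to standard output.
# This assumes a shell with an echo command is available.
echoed = `echo hello`
puts echoed.strip
```

Note that the command string is only built here, never executed, so the sketch is safe to run without xsltproc installed.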

The windows version is a bit more complex as it needs to use COM rather than the command line.

class XmlToolWindows
  def initialize
    require 'win32ole'
  end
  def transform infile, outfile, stylefile
    #got idea from http://blog.crispen.org/archives/2003/10/24/lessons-in-xslt/
    input = make_dom infile
    style = make_dom stylefile
    result = input.transformNode style
    raise "empty html output for #{infile}" if result.empty?
    File.open(outfile, 'w') {|out| out << result}
  end
  def make_dom filename, validate = false
    result = WIN32OLE.new 'Microsoft.XMLDOM'
    result.async = false
    result.validateOnParse = validate
    result.load filename
    return result
  end
  def validate filename
    dom = make_dom filename, true
    error = dom.parseError
    unless error.errorCode == 0
      puts "INVALID: code #{error.errorCode} for  #{filename} " + 
        "(line #{error.line})\n#{error.reason}"
    end
  end
end

The statement require 'win32ole' pulls in ruby library code for working with windows COM. Notice that this is a regular part of the program; in ruby you can set things up so that libraries are only loaded if needed and present. I can then manipulate the COM objects just as with any other scripting language.

You'll notice there's no type relationship between these three XML processing classes. The xml manipulations work because both the windows and unix XmlTools implement the transform and validate methods. This is what rubyists refer to as duck typing - if it walks like a duck and quacks like a duck then it must be a duck. There's no compile time checking to ensure these methods are present. If a method is incorrect it will fail at run time - which should be flushed out by testing. I won't bother going into the whole dynamic vs static type checking debate, just point out that this is an example of the use of duck typing.
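A stripped-down sketch shows both sides of this. The classes are hypothetical stand-ins that just return strings instead of doing any XML work, and run_transform is an invented helper standing in for the calling code.

```ruby
# Hypothetical stand-ins for the two tools; they share no superclass
# (beyond Object) and no declared interface.
class FakeUnixTool
  def transform(infile, outfile, stylefile)
    "xsltproc #{stylefile} #{infile} > #{outfile}"
  end
end

class FakeWindowsTool
  def transform(infile, outfile, stylefile)
    "COM transform of #{infile} with #{stylefile} into #{outfile}"
  end
end

class NotATool
end

def run_transform(tool)
  # No type check: anything that responds to transform will do.
  tool.transform('in.xml', 'out.html', 'style.xsl')
end

puts run_transform(FakeUnixTool.new)
puts run_transform(FakeWindowsTool.new)

failure = nil
begin
  run_transform(NotATool.new)
rescue NoMethodError => e
  failure = e                     # only discovered when the call is made
  puts "failed at run time: #{e.class}"
end
```

The first two calls work purely because the method names line up; the third is accepted by the interpreter right up until the moment the missing method is called.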

If you are using a unix system, you may need to use whatever package management system you have to find and download the unix xml commands I'm using (on the Mac I used Fink). The XMLDOM DLL usually comes with windows, but again depending on your setup you may need to download it.

Going Pear-Shaped

The one thing you can guarantee about programming is that stuff always goes wrong. However much you try there's always some mismatch between what you think you said and what the computer hears. Take a look at this bit of rake code (simplified from something that actually did happen to me).

src = 'foo.xml'
target = build('foo.html')
task :default => target
copyTask 'foo.css', '.', target
file target => src do
  transform src, target
end

See the bug? Neither did I. What I did know is that the transformation that builds build/foo.html was always happening even when it didn't need to - an unnecessary rebuild. I couldn't figure out why. The timestamps on both files were correct; even when I made damn sure the target was later than the source, I'd still get a rebuild.

My first line of investigation was to use rake's trace capability (rake --trace). Usually it's all I need to identify strange invocations, but this time it didn't help at all. It just told me that the 'build/foo.html' task was being executed - but it didn't say why.

At this point one might be inclined to blame Jim for the lack of debug tools. Perhaps cursing might at least make me feel better: “your mother is a she wolf from Cleveland and your father is piece of wet carrot”.

But I have a better alternative. Rake is ruby and tasks are just objects. I can get a reference to these objects and interrogate them. Jim may not have put this debug code into rake, but I can just as easily add it myself.

class Task 
  def investigation
    result = "------------------------------\n"
    result << "Investigating #{name}\n" 
    result << "class: #{self.class}\n"
    result <<  "task needed: #{needed?}\n"
    result <<  "timestamp: #{timestamp}\n"
    result << "pre-requisites: \n"
    prereqs = @prerequisites.collect {|name| Task[name]}
    prereqs.sort! {|a,b| a.timestamp <=> b.timestamp}
    prereqs.each do |p|
      result << "--#{p.name} (#{p.timestamp})\n"
    end
    latest_prereq = @prerequisites.collect{|n| Task[n].timestamp}.max
    result <<  "latest-prerequisite time: #{latest_prereq}\n"
    result << "................................\n\n"
    return result
  end
end

The code above reports what rake knows about a task: its class, whether rake thinks it's needed, its timestamp, and the timestamps of its prerequisites. If you're not a rubyist you may find it odd to see I've actually added a method to the task class that's part of rake. This kind of thing, the same thing as an aspect-oriented introduction, is quite legal in ruby. Like many ruby things you can imagine chaos with this feature, but as long as you are careful it's really nice.

Now I can invoke it to see more about what's going on:

src = 'foo.xml'
target = build('foo.html')
task :default => target
copyTask 'foo.css', '.', target
file target => src do |t|
  puts t.investigation
  transform src, target
end

I get this printed out:

------------------------------
Investigating build/foo.html
class: Task
task needed: true
timestamp: Sat Jul 30 16:23:33 EDT 2005
pre-requisites:
--foo.xml (Sat Jul 30 15:35:59 EDT 2005)
--build/./foo.css (Sat Jul 30 16:23:33 EDT 2005)
latest-prerequisite time: Sat Jul 30 16:23:33 EDT 2005
................................

At first I wondered about the timestamp. The timestamp on the output file was 16:42, so why did the task say 16:23? Then I realized the class of the task was Task not FileTask. Tasks don't do date checking; if you invoke them they will always run. So I tried this.

src = 'foo.xml'
target = build('foo.html')
file target
task :default => target
copyTask 'foo.css', '.', target
file target => src do |t|
  puts t.investigation 
  transform src, target
end

The change is that I declared the task as a file task before I mention it in the context of other tasks later. That did the trick.
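The effect of declaration order can be seen in isolation with a small sketch. The task names here are made up, and I call Rake::Task.define_task and Rake::FileTask.define_task directly (these are what the rakefile keywords task and file delegate to) so that the snippet runs as a plain ruby script. It assumes a reasonably recent rake, where the classes live in the Rake module.

```ruby
require 'rake'

# Plain task declared first: the name is claimed by a Rake::Task...
Rake::Task.define_task('demo/plain.html')
# ...and the later file declaration only adds prerequisites to that object.
Rake::FileTask.define_task('demo/plain.html' => 'plain.xml')

# File task declared first: later plain-task mentions leave it a FileTask.
Rake::FileTask.define_task('demo/checked.html')
Rake::Task.define_task('demo/checked.html' => 'checked.css')

%w[demo/plain.html demo/checked.html].each do |name|
  puts "#{name}: #{Rake::Task[name].class}"
end
# prints:
#   demo/plain.html: Rake::Task
#   demo/checked.html: Rake::FileTask
```

Whichever declaration reaches a name first decides the class of the task object; later declarations under that name only enhance it.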

The lesson from this is that with this kind of internal DSL you have the ability to interrogate the object structure to figure out what's going on. This can be really handy when weird stuff like this happens. I used this approach in another case where I had unnecessary builds - it was really useful to pop the hood and see exactly what was happening.

(By the way my investigation method breaks if the output file doesn't exist yet, such as in a clean build. I haven't spent any effort to fix it because I only needed it when the file was already there.)

Since I wrote this Jim added an investigation method, very close to this one, to rake itself. So you no longer need to do what I did here. But the general principle still holds - if rake doesn't do something you want, you can go in and modify its behavior.
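As a rough sketch of using the built-in version, the script below creates two throwaway files in a temporary directory (the file names are just for illustration), declares a file task between them, and prints the report. As before, Rake::FileTask.define_task stands in for the file keyword so that this runs outside a rakefile.

```ruby
require 'rake'
require 'tmpdir'

# Build the report inside a scratch directory so nothing real is touched.
REPORT = Dir.mktmpdir do |dir|
  src    = File.join(dir, 'foo.xml')
  target = File.join(dir, 'foo.html')
  File.write(src, '<doc/>')
  File.write(target, '<html/>')

  # Equivalent to `file target => src` in a rakefile.
  Rake::FileTask.define_task(target => src)

  # Ask rake what it knows about the task.
  Rake::Task[target].investigation
end

puts REPORT
```

The report has the same shape as the one shown earlier, and because the task was declared as a file task its class line now names the file task class.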

Using Rake to build non-ruby applications

Although rake is written in ruby, there's no reason why you can't use it to build applications written in other languages. Any build language is a scripting language for building stuff, and you can happily build one environment using tools written in another. (A good example was when we used ant to build a Microsoft COM project; we just had to hide it from the Microsoft consultant.) The only thing with rake is that it's useful to know ruby in order to do more advanced things, but I've always felt that any professional programmer needs to know at least one scripting language to get all sorts of odd jobs done.

Running Tests

Rake's library allows you to run tests directly within the rake system with the TestTask class:

require 'rake/testtask'
Rake::TestTask.new do |t|
  t.libs << "lib/test"
  t.test_files = FileList['lib/test/*Tester.rb']
  t.verbose = false
  t.warning = true
end

By default this will create a :test task which will run the tests in the given files. You can use multiple task objects to create test suites for different circumstances.
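Here is a sketch of what multiple task objects might look like. The task names and directory layout are hypothetical; the point is only that passing a name to Rake::TestTask.new gives each suite its own task.

```ruby
require 'rake'
require 'rake/testtask'

# Hypothetical suite of fast tests, run with `rake quick_tests`.
Rake::TestTask.new(:quick_tests) do |t|
  t.libs << 'lib/test'
  t.test_files = Rake::FileList['lib/test/unit/*Tester.rb']
  t.warning = true
end

# Hypothetical suite of slower tests, run with `rake slow_tests`.
Rake::TestTask.new(:slow_tests) do |t|
  t.libs << 'lib/test'
  t.test_files = Rake::FileList['lib/test/integration/*Tester.rb']
  t.warning = true
end

# Both suites are now ordinary rake tasks.
puts Rake::Task.task_defined?(:quick_tests)
puts Rake::Task.task_defined?(:slow_tests)
```

Since these are ordinary tasks, they can be used as prerequisites of other tasks like anything else in the rakefile.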

By default the test task will run all the tests in all the given files. If you want to run just the tests in a single file, then you can do that with

        rake test TEST=path/to/tester.rb

If you want to run a single test called “test_something”, you need to use TESTOPTS to pass in options to the test runner.

         rake test TEST=path/to/tester.rb TESTOPTS=--name=test_something

I often find it helpful to create temporary rake tasks for running specific tests. To run one file, I can use:

Rake::TestTask.new do |t|
  t.test_files = FileList['./testTag.rb']
  t.verbose = true
  t.warning = true
  t.name = 'one'
end

To run one test method I add in the test options:

Rake::TestTask.new do |t|
  t.test_files = FileList['./testTag.rb']
  t.verbose = true
  t.warning = true
  t.name = 'one'
  t.options = "--name=test_should_rebuild_if_not_up_to_date"
end

File Path Manipulations

Rake extends the string class to do some useful file manipulation expressions. For example if you want to specify a target file by taking the source and changing the file extension you can do so like this

"/projects/worldDominationSecrets.xml".ext("html")
# => '/projects/worldDominationSecrets.html'

For more complicated manipulations there is a pathmap method that uses template markers in a similar style to printf. For example the template “%x” refers to the file extension of a path and “%X” refers to everything but the file extension, so I could write the above example like this.

"/projects/worldDominationSecrets.xml".pathmap("%X.html")
# => '/projects/worldDominationSecrets.html'

Another common case is having things from 'src' turn up in 'bin'. To do this we can do substitution on elements in templates by using “%{pattern,replacement}X”, for example

"src/org/onestepback/proj/foo.java".pathmap("%{^src,bin}X.class")
# => "bin/org/onestepback/proj/foo.class"

You can find the full list of path manipulation methods in Rake's String.pathmap documentation.

I find these methods so useful that I like to use them whenever I'm doing file path manipulations in my own code. To make them available you need:

require 'rake/ext/string'
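With that require in place the methods work in any ruby script, not just a rakefile. The sketch below runs a few of the pathmap templates over one sample path; SAMPLE_PATH is just a name I picked for the illustration.

```ruby
require 'rake/ext/string'

SAMPLE_PATH = 'src/org/onestepback/proj/foo.java'

puts SAMPLE_PATH.ext('class')                       # => src/org/onestepback/proj/foo.class
puts SAMPLE_PATH.pathmap('%f')                      # => foo.java   (file name with extension)
puts SAMPLE_PATH.pathmap('%n')                      # => foo        (file name without extension)
puts SAMPLE_PATH.pathmap('%d')                      # => src/org/onestepback/proj
puts SAMPLE_PATH.pathmap('%x')                      # => .java
puts SAMPLE_PATH.pathmap('%X')                      # => src/org/onestepback/proj/foo
puts SAMPLE_PATH.pathmap('%{^src,bin}d')            # => bin/org/onestepback/proj
puts SAMPLE_PATH.pathmap('%{^src,bin}X.class')      # => bin/org/onestepback/proj/foo.class
```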

Namespaces

As you build up a larger build script, it's easy to end up with lots of tasks with similar names. Rake has a concept of namespaces which helps you organize these. You create a namespace with

    namespace :articles do
      # put tasks inside the namespace here eg
      task :foo
    end

You can then invoke the namespaced task with rake articles:foo

If you need to refer to tasks outside the namespace you are currently in, then you use a fully qualified name for the task - which is usually easier using a string form of task name.

    namespace :other do
      task :special => 'articles:foo'
    end

Built in Cleaning

A common need with builds is to clean up the files you've generated. Rake provides a built in way to make this work. Rake has two levels of cleaning: clean and clobber. Clean is the gentler approach: it removes intermediate files, the temporary files used to derive the final product, but leaves the final product alone. Clobber uses stronger soap and removes all generated files, including the final products. Essentially clobber restores you to only the files that are checked into source control.

There is some terminological confusion here. I often hear people using “clean” to mean removing all generated files, equivalent to rake's clobber. So be wary of that confusion.

To use the built in cleaning, you need to import it with require 'rake/clean'. This introduces two tasks: clean and clobber. As they stand, however, the tasks don't know which files to clean. To tell them, you use a pair of file lists: CLEAN and CLOBBER. You can add items to the file lists with expressions like CLEAN.include('*.o'). Remember that the clean task removes everything in the clean list, and clobber removes everything in both the clean and clobber lists.
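A minimal sketch of the setup might look like this. The glob patterns are hypothetical; substitute whatever your own build generates.

```ruby
require 'rake/clean'

# Intermediate files: removed by both `rake clean` and `rake clobber`.
CLEAN.include('build/**/*.o', 'build/**/*.tmp')

# Final products: removed only by `rake clobber`.
CLOBBER.include('build/site/**/*.html', 'build/app')

# clobber lists clean as a prerequisite, which is why it removes both lists.
puts Rake::Task[:clobber].prerequisites.inspect
```

Running rake clean would then remove only the files matching the first pair of patterns, while rake clobber runs clean first and then removes the final products too.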

Odds and Ends

By default, rake does not print out the stack trace if you get an error in the code that rake calls. You can get the stack trace by running with the --trace flag, but usually I'd just rather see it anyway. You can do that by putting Rake.application.options.trace = true into the rakefile.

Similarly I find the file manipulation outputs from FileUtils to be distracting. You can turn them off from the command line with the -q option, or disable them in your rakefile with the call verbose(false).

It's often useful to turn warnings on when you run rake. You can do this by manipulating $VERBOSE; there are some good notes on using $VERBOSE from Mislav Marohnić.

To look up a rake task object in the rakefile itself, use Rake::Task[:aTask]. The task's name can be specified either as a symbol or as a string. This allows you to invoke one task from another without declaring a dependency, by calling Rake::Task[:aTask].invoke. You shouldn't need to do this often, but it's occasionally handy.
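Here is a small sketch of looking up and invoking a task by hand. The task names and the RUN_ORDER constant are made up for the illustration, and Rake::Task.define_task stands in for the task keyword so the snippet runs as a plain ruby script.

```ruby
require 'rake'

# Records which task bodies actually ran, in order.
RUN_ORDER = []

Rake::Task.define_task(:prepare) { RUN_ORDER << :prepare }

# No declared dependency on :prepare; it is invoked explicitly instead.
Rake::Task.define_task(:publish) do
  Rake::Task[:prepare].invoke      # lookup by symbol
  RUN_ORDER << :publish
end

Rake::Task['publish'].invoke       # lookup by string finds the same task
Rake::Task['publish'].invoke       # already invoked, so nothing runs again

puts RUN_ORDER.inspect             # => [:prepare, :publish]
```

The second invoke does nothing because rake remembers that a task has already been invoked during the run; calling reenable on the task resets that if you really do want it to run again.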

Final Thoughts

So far I've found rake to be a powerful and easy to use build language. Of course it helps that I'm comfortable in ruby, but rake has convinced me that a build system makes sense as an internal DSL in a full-blown language. Scripts are a natural fit for building stuff in many ways, and rake adds just enough features to provide a really good build system on top of a fine language. We also have the advantage that ruby is an open source language that runs on all the platforms that I need.

I was surprised by the consequences of flexible dependency specification. It allowed me to do a number of things that reduced duplication - which I think will make my build scripts easier to maintain in the future. I found several common functions that I pulled out into a separate file and shared between the build scripts for martinfowler.com and refactoring.com.

If you're automating builds you should take a look at rake. Remember that you can use it for any environment, not just ruby.

Acknowledgements

Thanks to Jason Yip, Julian Simpson, Jon Tirsen, and Jim Weirich for comments on the draft of this article. Thanks to Dave Smith, Ori Peleg, and Dave Stanton for some corrections once it was published.

But the biggest thank you has to go to Jim Weirich for writing rake in the first place. My website thanks you.

Significant Revisions

29 December 2014: Updated discussion of running tests

10 August 2005: Initial Publication