Flexible Antlr Generation
17 April 2007
I've been exploring various alternative languages and grammars for external DSLs. One of my main tools for this is Antlr. With this kind of exploration I have a project with multiple similar grammar files where I want to run essentially the same thing with different grammars. Although I only have a few grammar files at the moment, I could well end up with a couple of dozen.
Using these in my build is currently rather awkward. Up to now, I've had explicit calls to Antlr to build each grammar file. The file gets done whether or not it's changed recently, which slows the whole build down. What I'd like is a way to automatically figure out where the grammar files are to build, and build them if necessary.
I keep the grammar files in directories like src/parser1/Catalog.g,
src/parser2/Catalog.g
and I want to generate them to gen/parser1,
gen/parser2
. That way I can keep the generated
gen
directory out of source control (as it should
be). Some directories have a regular grammar file (always called
Catalog.g) only, others also have a tree walker grammar (called
CatalogWalker.g) if I do tree building and walking.
It may be possible to get ant to do this, but my ant is rusty and frankly I'm happy to keep it that way. My usual build process these days is to use Rake, but it has an issue here - calling Antlr multiple times would lead to multiple JVM invocations which can be slow due to the start-up time of the JVM. After toying with some alternatives I thought that it would be worth giving JRuby a spin.
Ruby makes it easy to find and select out the directories that match my naming conventions
Dir['src/parser*']. select{|f| f =~ %r[src/parser\d+]}. collect{|f| Antlr.new(f)}. each {|g| g.run}
The regular expressions used for File globs (as in
src/parser*
isn't quite enough for my naming
convention, so I have to filter the results with a more precise
regexp. Once I have my real directories I create a command object to
process them.
As I was working on this, I decided that I wanted to be able to run the script both with regular ruby (calling Antlr via the command line) and JRuby (calling the Antlr command facade directly). That way I could run the script on machines that didn't have JRuby installed. Doing so is pretty easy, I just have to keep the JRuby bits isolated.
The Antlr
class does all the figuring out of what
needs to be done and delegates to an internal engine to actually
call Antlr in the two different styles. I initialize the object with
the directory to process, and it figures out the right target
directory and whether it needs to generate a walker.
class Antlr... def initialize dir @dir = dir @grammarFile = File.join @dir, 'Catalog.g' raise “No Grammar file in #{dir}” unless File.exists? @grammarFile walker_name = File.join @dir, 'CatalogWalker.g' @walker = File.exists?(walker_name) ? walker_name : nil @dest = @dir.sub %r[src/], 'gen/' end
When I run the object it checks to see if it needs to run before invoking the engine.
class Antlr... def run return if current? puts "%s => %s " % [@grammarFile, @dest] mkdir_p @dest run_tool self end def current? return false unless File.exists? @dest output = File.join(@dest,'CatalogParser.java') sources = [@grammarFile] sources << @walker if @walker return uptodate?(output, sources) end
The run_tool
method takes the data out of fields and
puts it onto command line arguments for Antlr (I'll call the facade
with a string array of arguments too.)
class Antlr... def run_tool args = [] args << '-o' << @dest args << "-lib" << @dest if @walker args << @grammarFile args << @walker if @walker @@engine.run_tool args end
For the engine I have two implementations. The simplest just makes a command line call.
class AntlrCommandLine def run_tool args classpath = Dir['lib/*.jar'].join(File::PATH_SEPARATOR) system “java -cp #{classpath} org.antlr.Tool #{args.join ' '}” end end
The JRuby version is a bit more involved as it has to import the Antlr facade file and sort out classpaths.
class AntlrJruby def initialize require 'java' Dir['lib/*.jar'].each{|j| require j} include_class 'org.antlr.Tool' end def run_tool args Tool.new(args.to_java(:string)).process end end
With all the time I've spent tearing my hair out with classpaths
I just love the fact that I can just require a jar at runtime
here. Especially since the code Dir['lib/*.jar'].each{|j|
require j}
loads all the jars in a directory - which is
something that java makes horribly hard.
The last trick is ensuring that the right engine is used for the job. I do this with some inline code inside the Antlr command class.
class Antlr... tool_class = (RUBY_PLATFORM =~ /java/) ? AntlrJruby : AntlrCommandLine @@engine = tool_class.new
Pretty simple and sweet that it runs in regular ruby or JRuby.
But there's a punch line and joke's on me. I set all this up to use JRuby because I was afraid that the start up time of the JVMs would make running it from C ruby too slow. But the the C ruby actually does a clean build faster than the JRuby version. Maybe this will change once I get more grammar files to build, but for the moment it looks like I've fallen victim to premature optimization. (And it's not worthwhile for me to figure out why, both builds are fast enough for now.)