26 March 2007

In recent weeks I've been playing with, and looking at, compiler-compiler tools. A common feature of these tools is that they have a grammar file whose core is a description of the production rules of a grammar for a language. As well as describing the grammar, the file also provides information to the parser about how to process the language as it recognizes the language elements. In most compiler-compiler tools these instructions are represented as actions in the grammar - often these actions are encoded as as fragments of code in a high level language.

For example in my HelloAntlr example you see bits of embedded Java to create and populate a configuration from the source file. (Embedding Java isn't the only approach, tree walking is another.)

This approach of embedding a General Purpose Language (GPL) inside another Domain Specific Language (DSL) is quite common. Most readers here will have come across it when creating HTML pages using templating systems like Velocity, JSP, ERBs and the like. Again we have a different representation (HTML) where we can embed fragments of a GPL to provide dynamic data and more complex processing.

When I'm working in an environment like this, I like to minimize the amount of Java (or whatever GPL I'm using) in my templates. A common technique for this is to create a separate helper class in Java and ensure that all the embedded Java in the template does is make simple method calls to this helper.

The main reason I like to do this is because I believe that if you embed large amounts of a GPL in a DSL, you end up obscuring the flow of the DSL. The whole point of using a template language for HTML is to concentrate on the HTML, so every bit of Java you stick in there gets in the way. This is especially true for grammar files where lots of code in actions makes it hard to understand the productions.

A further benefit of using an embedment helper is that it makes it easier for tools to do their job. Whether it's just syntax highlighting, or the full power of a PostIntelliJ IDE, these tools often don't work well with mixed language files. AntlrWorks, for example, will highlight and offer completion on Antlr's grammar, but embedded Java is just plain text.

When using a helper like this, my normal style is to include code early on in the host (DSL) file to set up the helper. Usually this involves declaring a field in the host and either constructing a new helper in there, or making it so a caller can pass a helper in. (I confess I'm happy to use a public field in my Antlr grammar for this.) After that all the embedded Java in the host is a simple call on the helper. I name these calls from the perspective of the host file, to indicate what's wanted from the helper.

The helper and the host files are very tightly coupled together, usually with a bi-directional link between them and plenty of back and forth. The helper knows all sorts of grubby details about the host - I'm happy for an HTML helper to spit out HTML and grammar helpers will poke around the parse tree.

Usually I treat the word "helper" on a class as a red flag as it usually indicates a poorly thought out abstraction. Here I'm happy to use the word, since the helper is really only there as a support to the host file.