
Dsl bliki


DomainSpecificLanguage, DslBoundary, DslReadings, EmbedmentHelper, ExpressionBuilder, FlexibleAntlrGeneration, FluentInterface, HelloRacc, InstallingOpenArchitectureWare, InternalDslStyle, LanguageWorkbenchReadings, MetaProgrammingSystem, RubyAnnotations


InstallingOpenArchitectureWare dsl 27 July 2007

Update: the procedure and complaints here are no longer valid. OpenArchitectureWare has released a new version with Eclipse 3.3 that looks like it will install much more easily than what I just went through. There's also now a packaged distribution that includes eclipse and all the OAW stuff.

There are few things more frustrating than spending hours trying to install a piece of software and then having to delete everything and start again. Today at 9.30 I began installing openArchitectureWare, and I finally had it installed (I think) at 15.30. So I thought I'd write this to help someone else do it more quickly.

OpenArchitectureWare is a set of tools, based on Eclipse, to support Model Driven Development. I'm interested in exploring some of its tools that are oriented towards DomainSpecificLanguages. (Xtext - which helps you develop textual languages - is something that's specifically been pointed out to me as worth looking at.) I don't know how worthwhile these tools are yet, after all it took me most of the day just to install the dratted thing, but we'll see.

One of my problems with the installation was that I'm not an Eclipse user - my usual Java IDE is IntelliJ. To install openArchitectureWare you need to know how to deal with the plugin system in Eclipse - and I'd never done anything with Eclipse before so that was new to me.

The first step was the easiest one - install Eclipse. I installed it on my Ubuntu machine, so all I had to do was wajig install eclipse (wajig is a unified command-line for various debian packaging and sysadmin tools). Then all hell broke loose. Rather than go through my miserable morning, I'll explain what I would do now.

The trouble with OpenArchitectureWare is that it has dependencies, other eclipse plugins that need to be installed before it can work. As anyone with experience in these things knows, sorting out dependencies can be a right pain without a good tool. apt-get for Debian and gem for ruby are examples of a good tool that resolves dependencies. When I installed eclipse, apt-get knew it had to pull down a whole host of dependencies and installed them for me. The situation in Eclipse is not so good.

To install openArchitectureWare you need a bunch of plugins: EMF, UML2, ATL, and GMF. I couldn't see from the web pages exactly how to get these things, or if they had their own dependencies.

There are several ways of installing plugins in Eclipse, although I had to hunt a bit for instructions. The easiest way is a menu option in Eclipse itself. In the menus pick [Help -> Software Updates -> Find and Install] (no I don't know why it's on the help menu). With a bit of button pushing you can get it to download a list of packages - the relevant source is the Callisto Discovery Site. Once you have that list downloaded look in the Models and Model Development section and select Eclipse Modeling Framework (EMF) and Graphical Modeling Framework (GMF). You'll get an error message saying that these have an unresolved dependency. Take note of the button on the right that says 'select required'. Hit it and it will find the dependency on GEF and GEF's dependency on Batik. If you miss that button you'll have a frustrating time trying to find those dependencies yourself (believe me, I know).

That gets two of openArchitectureWare's dependencies. The others, and openArchitectureWare itself need to be done the harder way. Digging around the eclipse site I found the relevant web pages for UML2 and ATL. These need to be downloaded as zip files as does openArchitectureWare itself.

When you unzip the UML2 and openArchitectureWare folders they unzip into a folder called eclipse that contains subfolders for plugins and features. You can take the contents of these folders and put them into corresponding folders on your load environment (in my case /usr/local/lib/eclipse). As that didn't work for me when I tried it first, I found another way.

The way to tell if stuff has installed properly is to go to [Help -> Software Updates -> Manage Configuration]. When you open that you have the option of "Add an Extension Location". An extension location is (almost) any directory that contains an eclipse folder with subfolders for plugins and features. I say almost because the eclipse folder also needs a file called .eclipseextension. This is just an empty file so you can create it with touch .eclipseextension. What I did was create folders in /usr/local/lib for openArchitectureWare and uml2-eclipse, move the unzipped eclipse folders in there, run touch .eclipseextension inside each of them and then add them using "Add an Extension Location". ATL just produces a plugin directory so I copied the contents of it into the plugin directory for openArchitectureWare.

It's important that you do this after you use the Find and Install tool because if you do it first, the Find and Install tool will tell you that you have an unresolved dependency and refuse to do anything until you fix it. Now that everything is installed it tells me: UML2 End-User Features (2.1.1.v200707181556) requires plug-in "org.eclipse.emf.ecore.xmi (2.3.0)". I don't know how to fix this, and I have a bunch of emf.ecore jars present in EMF. However the rest of eclipse seems to work so far, so I'm carrying on regardless.


DslReadings dsl 13 July 2007

(See my note on DomainSpecificLanguage for a quick intro to this topic and my terminology on it.)

Update: David Laribee has written a post contrasting what he calls ordered and unordered fluent interfaces. The distinction is that ordered fluent interfaces force a particular flow on how you compose your DSL sentence. He provides an example where he uses multiple interfaces on a single ExpressionBuilder - the same technique that's used by JMock.

Anders Norås has written two interesting articles on writing internal DSLs in C#. The first article gives a sample of the DSL and a discussion against Chromatic's cynical check-list. The second article goes into details about its implementation.

Piers Cawley makes the point that a key characteristic of DSLs is their narrow focus on a domain.


HelloRacc dsl 30 May 2007

When I said HelloCup I was looking at a yacc based parser in a language that didn't require me to handle my dirty pointers. Another alternative to play with is Ruby, which now has a yaccish parser built into the standard library - inevitably called racc.

Racc has an interesting interplay between ruby and grammar syntax. You define the grammar with a racc file which will generate a parser class.

Again I'll do my simple hello world case. The input text is

item camera
item laser

I'll populate item objects inside a catalog, using the following model classes.

require 'forwardable'   # needed for the Forwardable module used by Catalog

class Item
  attr_reader :name
  def initialize name
    @name = name
  end
end

class Catalog 
  extend Forwardable
  def initialize
    @items = []
  end
  def_delegators :@items, :size, :<<, :[] 
end

Forwardable is a handy library that allows me to delegate methods to an instance variable. In this case I delegate a bunch of methods to the @items list.
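Here's a small usage sketch of that delegation. It repeats the two model classes so it can run on its own; the `require` line and the driver code at the bottom are my additions for illustration.

```ruby
require 'forwardable'

class Item
  attr_reader :name
  def initialize name
    @name = name
  end
end

class Catalog
  extend Forwardable
  def initialize
    @items = []
  end
  # size, << and [] are forwarded to the @items array
  def_delegators :@items, :size, :<<, :[]
end

catalog = Catalog.new
catalog << Item.new('camera')
catalog << Item.new('laser')
puts catalog.size      # how many items have been added
puts catalog[1].name   # name of the second item
```

Only the three named methods are exposed, so the catalog doesn't pick up the rest of the array interface.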

I test what I read with this.

class Tester < Test::Unit::TestCase
  def testReadTwo
    parser = ItemParser.new
    parser.parse "item camera\nitem laser\n"
    assert_equal 2, parser.result.size
    assert_equal 'camera', parser.result[0].name
    assert_equal 'laser', parser.result[1].name
  end
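  def testReadBad
    parser = ItemParser.new
    # a bare fail followed by rescue would rescue its own failure and always pass,
    # so assert on the parse error directly
    assert_raise(Racc::ParseError) { parser.parse "xitem camera" }
  end   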
end

To build the file and run the tests I use a simple rake file.

# rakefile...
task :default => :test

file 'item.tab.rb' => 'item.y.rb' do
  sh 'racc item.y.rb'
end

task :test => 'item.tab.rb' do 
  require 'rake/runtest'
  Rake.run_tests 'test.rb'
end

The racc command needs to be installed on your system. I did it the easy way on Ubuntu with apt-get. It takes the input file and creates one named inputFileName.tab.rb.

The parser grammar class is a special format, but one that's pretty familiar to yaccish people. For this simple example it looks like this:

#file item.y.rb...
class ItemParser
  token 'item'  WORD
  rule
    catalog: item | item catalog;
    item: 'item' WORD {@result << Item.new(val[1])};
end

The token clause declares the tokens we get from the lexer. I use the string 'item' and WORD as a symbol. The rule clause starts the production rules which are in the usual BNF form for yacc. As you might expect I can write actions inside curlies. To refer to the elements of the rule I use the val array, so val[1] is the equivalent to $2 in yacc (ruby uses 0 based array indexes, but I've forgiven it). Should I wish to return a value from the rule (equivalent to yacc's $$) I assign it to the variable result.

The most complicated part of using racc is to sort out the lexer. Racc expects to call a method that yields tokens, where each token is a two-element array with the first element being the type of token (matching the token declaration) and the second element the value (what shows up in val - usually the text). You mark the end of the token stream with [false, false]. The sample code with racc uses regular expression matching on a string. A better choice for most cases is to use StringScanner, which is in the standard ruby library.
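To make the scanner's behavior concrete, here's a minimal sketch of StringScanner on its own, outside racc. The variable names are mine; the point is that scan only matches at the current position, returns the matched text (or nil), and moves the pointer along.

```ruby
require 'strscan'

scanner = StringScanner.new "item camera"

first = scanner.scan(/item/)    # matches at the current position and returns the text
miss  = scanner.scan(/camera/)  # nil: the next character is a space, so nothing is consumed
space = scanner.scan(/\s+/)     # consume the whitespace
word  = scanner.scan(/\w+/)     # now the word matches
done  = scanner.eos?            # true once the whole string has been consumed

# the same data in the shape racc wants: [type, value] pairs, then the end marker
tokens = [['item', first], [:WORD, word], [false, false]]
p tokens
```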

I can use this scanner to convert a string into an array of tokens.

#file item.y.rb....
---- inner
def make_tokens str
  require 'strscan'
  result = []
  scanner = StringScanner.new str
  until scanner.eos?   # eos? replaces the obsolete empty?
    case
      when scanner.scan(/\s+/)
        #ignore whitespace
      when match = scanner.scan(/item/)
        result << ['item', nil]
      when match = scanner.scan(/\w+/)
        result << [:WORD, match]
      else
        raise "can't recognize  <#{scanner.peek(5)}>"
    end
  end
  result << [false, false]
  return result
end

To integrate the scanner into the parser, racc allows you to place code into the generated parser class. You do this by adding code to the grammar file. The declaration ---- inner marks the code to go inside the generated class (you can also put code at the head and foot of the generated file). I'm calling a parse method in my test, so I need to implement that.

#file item.y.rb....
---- inner
attr_accessor :result

def parse(str)
  @result = Catalog.new
  @tokens = make_tokens str
  do_parse
end

The do_parse method initiates the generated parser. This will call next_token to get at the next token, so we need to implement that method and include it in the inner section.

#file item.y.rb....
---- inner
def next_token
  @tokens.shift
end

This is enough to make racc work with the file. However as I play with it I find the scanner more messy than I would like. I really just want to tell the lexer what patterns to match and what to return with them. Something like this.

#file item.y.rb....
---- inner
def make_lexer aString
  result = Lexer.new
  result.ignore /\s+/
  result.keyword 'item'
  result.token /\w+/, :WORD
  result.start aString
  return result
end

To make this work I write my own lexer wrapper over the base functionality provided by StringScanner. Here's the code to set up the lexer and handle the above configuration.

class Lexer...
  require 'strscan'
  def initialize 
    @rules = []
  end
  def ignore pattern
    @rules << [pattern, :SKIP]
  end
  def token pattern, token
    @rules << [pattern, token]
  end
  def keyword aString
    @rules << [Regexp.new(aString), aString]
  end
  def start aString
    @base = StringScanner.new aString
  end

To perform the scan I need to use StringScanner to compare the rules against the input stream.

class Lexer...
  def next_token
    return [false, false] if @base.eos?
    t = get_token
    return (:SKIP == t[0]) ? next_token : t
  end
  def get_token
    @rules.each do |key, value|
      m = @base.scan(key)
      return [value, m] if m
    end 
    raise  "unexpected characters  <#{@base.peek(5)}>"
  end  

I can then alter the code in the parser to call this lexer instead.

#file item.y.rb....
---- inner
def parse(arg)
  @result = Catalog.new
  @lexer = make_lexer arg
  do_parse
end

def next_token
  @lexer.next_token
end

As well as giving me a better way to define the rules, this also allows the grammar to control the lexer because it's only grabbing one token at a time - this would give me a mechanism to implement lexical states later on.
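To see the one-token-at-a-time behavior without generating a parser, here's a standalone sketch that puts the two Lexer fragments together and pulls tokens by hand. The loop at the bottom is my stand-in for what do_parse does through next_token, and I've used eos? to test for the end of input.

```ruby
require 'strscan'

class Lexer
  def initialize
    @rules = []
  end
  def ignore pattern
    @rules << [pattern, :SKIP]
  end
  def token pattern, token
    @rules << [pattern, token]
  end
  def keyword aString
    @rules << [Regexp.new(aString), aString]
  end
  def start aString
    @base = StringScanner.new aString
  end
  # hand back the next meaningful token, skipping anything marked :SKIP
  def next_token
    return [false, false] if @base.eos?
    t = get_token
    (:SKIP == t[0]) ? next_token : t
  end
  # try each rule in the order it was declared; first match wins
  def get_token
    @rules.each do |pattern, type|
      m = @base.scan(pattern)
      return [type, m] if m
    end
    raise "unexpected characters  <#{@base.peek(5)}>"
  end
end

lexer = Lexer.new
lexer.ignore(/\s+/)
lexer.keyword 'item'
lexer.token(/\w+/, :WORD)
lexer.start "item camera\nitem laser\n"

# pull tokens one at a time until the end marker shows up
tokens = []
loop do
  t = lexer.next_token
  tokens << t
  break if t == [false, false]
end
p tokens
```

One difference from make_tokens worth noticing: a keyword token here carries its matched text as the value rather than nil. The grammar ignores that value, so it makes no difference to the parse.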

On the whole racc is pretty easy to set up and use - providing you know yacc. The documentation is on the minimal side of sketchy. There's a simple manual on the website and some sample code. There's also a very helpful presentation on racc. I also got a few tips from our Mingle team who've used it for a nifty customization language inside Mingle.


FlexibleAntlrGeneration dsl 17 April 2007

I've been exploring various alternative languages and grammars for external DSLs. One of my main tools for this is Antlr. With this kind of exploration I have a project with multiple similar grammar files where I want to run essentially the same thing with different grammars. Although I only have a few grammar files at the moment, I could well end up with a couple of dozen.

Using these in my build is currently rather awkward. Up to now, I've had explicit calls to Antlr to build each grammar file. The file gets done whether or not it's changed recently, which slows the whole build down. What I'd like is a way to automatically figure out where the grammar files are to build, and build them if necessary.

I keep the grammar files in directories like src/parser1/Catalog.g, src/parser2/Catalog.g and I want to generate them to gen/parser1, gen/parser2. That way I can keep the generated gen directory out of source control (as it should be). Some directories have a regular grammar file (always called Catalog.g) only, others also have a tree walker grammar (called CatalogWalker.g) if I do tree building and walking.

It may be possible to get ant to do this, but my ant is rusty and frankly I'm happy to keep it that way. My usual build process these days is to use Rake, but it has an issue here - calling Antlr multiple times would lead to multiple JVM invocations which can be slow due to the start-up time of the JVM. After toying with some alternatives I thought that it would be worth giving JRuby a spin.

Ruby makes it easy to find and select out the directories that match my naming conventions

Dir['src/parser*'].
  select{|f| f =~ %r[src/parser\d+]}.
  collect{|f| Antlr.new(f)}.
  each {|g| g.run}

The patterns used for file globs (as in src/parser*) aren't quite enough for my naming convention, so I have to filter the results with a more precise regexp. Once I have my real directories I create a command object to process them.
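Here's a small sketch of why the extra filter is needed. It builds a throwaway directory tree (the directory names are made up for the illustration) and compares what the glob finds with what survives the regexp.

```ruby
require 'tmpdir'
require 'fileutils'

globbed = nil
selected = nil
Dir.mktmpdir do |root|
  # hypothetical layout: two real parser directories and one that merely shares the prefix
  %w[parser1 parser2 parser_notes].each do |name|
    FileUtils.mkdir_p File.join(root, 'src', name)
  end
  Dir.chdir(root) do
    globbed  = Dir['src/parser*'].sort                       # the glob takes anything with the prefix
    selected = globbed.select { |f| f =~ %r[src/parser\d+] } # the regexp insists on a number
  end
end
p globbed
p selected
```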

As I was working on this, I decided that I wanted to be able to run the script both with regular ruby (calling Antlr via the command line) and JRuby (calling the Antlr command facade directly). That way I could run the script on machines that didn't have JRuby installed. Doing so is pretty easy, I just have to keep the JRuby bits isolated.

The Antlr class does all the figuring out of what needs to be done and delegates to an internal engine to actually call Antlr in the two different styles. I initialize the object with the directory to process, and it figures out the right target directory and whether it needs to generate a walker.

class Antlr...
  def initialize dir
    @dir = dir
    @grammarFile = File.join @dir, 'Catalog.g'
    raise "No Grammar file in #{dir}" unless File.exist? @grammarFile
    walker_name = File.join @dir, 'CatalogWalker.g'
    @walker = File.exist?(walker_name) ? walker_name : nil
    @dest = @dir.sub %r[src/], 'gen/'
  end

When I run the object it checks to see if it needs to run before invoking the engine.

class Antlr...
  def run
    return if current?
    puts "%s => %s " % [@grammarFile, @dest]
    mkdir_p @dest 
    run_tool    
    self
  end
  def current?
    return false unless File.exist? @dest
    output = File.join(@dest,'CatalogParser.java')
    sources = [@grammarFile]
    sources << @walker if @walker
    return uptodate?(output, sources)
  end
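The uptodate? call comes from FileUtils (rake makes those methods available in a rakefile). As a rough sketch of what the check does, here it is called directly on two temporary files whose timestamps I set by hand; the file names simply mirror the ones above.

```ruby
require 'fileutils'
require 'tmpdir'

fresh = nil
stale = nil
Dir.mktmpdir do |root|
  grammar = File.join(root, 'Catalog.g')
  output  = File.join(root, 'CatalogParser.java')
  FileUtils.touch [grammar, output]

  now = Time.now
  # generated file is newer than the grammar: nothing to do
  File.utime(now - 60, now - 60, grammar)
  File.utime(now, now, output)
  fresh = FileUtils.uptodate?(output, [grammar])

  # grammar edited after the last generation: time to rebuild
  File.utime(now + 60, now + 60, grammar)
  stale = FileUtils.uptodate?(output, [grammar])
end
puts "after generating: #{fresh}, after editing the grammar: #{stale}"
```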

The run_tool method takes the data out of fields and puts it onto command line arguments for Antlr (I'll call the facade with a string array of arguments too.)

class Antlr...
  def run_tool
    args = []
    args << '-o' << @dest 
    args << "-lib" << @dest if @walker
    args << @grammarFile
    args << @walker if @walker
    @@engine.run_tool args
  end

For the engine I have two implementations. The simplest just makes a command line call.

class AntlrCommandLine
  def run_tool args
    classpath = Dir['lib/*.jar'].join(File::PATH_SEPARATOR)
    system "java -cp #{classpath} org.antlr.Tool #{args.join ' '}"
  end
end

The JRuby version is a bit more involved as it has to import the Antlr facade file and sort out classpaths.

class AntlrJruby
  def initialize 
    require 'java'
    Dir['lib/*.jar'].each{|j| require j}
    include_class 'org.antlr.Tool'
  end
  def run_tool args
    Tool.new(args.to_java(:string)).process
  end
end

With all the time I've spent tearing my hair out with classpaths I just love the fact that I can just require a jar at runtime here. Especially since the code Dir['lib/*.jar'].each{|j| require j} loads all the jars in a directory - which is something that java makes horribly hard.

The last trick is ensuring that the right engine is used for the job. I do this with some inline code inside the Antlr command class.

class Antlr...
  tool_class = (RUBY_PLATFORM =~ /java/) ? AntlrJruby : AntlrCommandLine
  @@engine = tool_class.new

Pretty simple and sweet that it runs in regular ruby or JRuby.

But there's a punch line and the joke's on me. I set all this up to use JRuby because I was afraid that the start up time of the JVMs would make running it from C ruby too slow. But the C ruby actually does a clean build faster than the JRuby version. Maybe this will change once I get more grammar files to build, but for the moment it looks like I've fallen victim to premature optimization. (And it's not worthwhile for me to figure out why, both builds are fast enough for now.)


EmbedmentHelper dsl 26 March 2007

In recent weeks I've been playing with, and looking at, compiler-compiler tools. A common feature of these tools is that they have a grammar file whose core is a description of the production rules of a grammar for a language. As well as describing the grammar, the file also provides information to the parser about how to process the language as it recognizes the language elements. In most compiler-compiler tools these instructions are represented as actions in the grammar - often these actions are encoded as fragments of code in a high level language.

For example in my HelloAntlr example you see bits of embedded Java to create and populate a configuration from the source file. (Embedding Java isn't the only approach, tree walking is another.)

This approach of embedding a General Purpose Language (GPL) inside another Domain Specific Language (DSL) is quite common. Most readers here will have come across it when creating HTML pages using templating systems like Velocity, JSP, ERBs and the like. Again we have a different representation (HTML) where we can embed fragments of a GPL to provide dynamic data and more complex processing.

When I'm working in an environment like this, I like to minimize the amount of Java (or whatever GPL I'm using) in my templates. A common technique for this is to create a separate helper class in Java and ensure that all the embedded Java in the template does is make simple method calls to this helper.

The main reason I like to do this is because I believe that if you embed large amounts of a GPL in a DSL, you end up obscuring the flow of the DSL. The whole point of using a template language for HTML is to concentrate on the HTML, so every bit of Java you stick in there gets in the way. This is especially true for grammar files where lots of code in actions makes it hard to understand the productions.

A further benefit of using an embedment helper is that it makes it easier for tools to do their job. Whether it's just syntax highlighting, or the full power of a PostIntelliJ IDE, these tools often don't work well with mixed language files. AntlrWorks, for example, will highlight and offer completion on Antlr's grammar, but embedded Java is just plain text.

When using a helper like this, my normal style is to include code early on in the host (DSL) file to set up the helper. Usually this involves declaring a field in the host and either constructing a new helper in there, or making it so a caller can pass a helper in. (I confess I'm happy to use a public field in my Antlr grammar for this.) After that all the embedded Java in the host is a simple call on the helper. I name these calls from the perspective of the host file, to indicate what's wanted from the helper.
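As a rough illustration of this style, here's a sketch using an ERB template as the host, with Ruby rather than Java as the embedded language. The helper class, its method names and the template are all hypothetical; what matters is that the template contains nothing but simple calls on the helper, named for what the page wants.

```ruby
require 'erb'

# Hypothetical helper: all the Ruby logic lives here, not in the template.
class CatalogPageHelper
  def initialize item_names
    @item_names = item_names
  end
  # named for what the template wants, not for how it is computed
  def page_title
    "Catalog (#{@item_names.size} items)"
  end
  # the helper is happy to know about HTML
  def item_rows
    @item_names.sort.map { |n| "<li>#{ERB::Util.html_escape(n)}</li>" }.join("\n")
  end
end

# The host file: mostly HTML, with only simple calls on the helper.
template = <<~HTML
  <h1><%= helper.page_title %></h1>
  <ul>
  <%= helper.item_rows %>
  </ul>
HTML

# set up the helper early, then let the template call it
helper = CatalogPageHelper.new ['laser', 'camera']
html = ERB.new(template).result(binding)
puts html
```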

The helper and the host files are very tightly coupled together, usually with a bi-directional link between them and plenty of back and forth. The helper knows all sorts of grubby details about the host - I'm happy for an HTML helper to spit out HTML and grammar helpers will poke around the parse tree.

Usually I treat the word "helper" on a class as a red flag as it usually indicates a poorly thought out abstraction. Here I'm happy to use the word, since the helper is really only there as a support to the host file.


ExpressionBuilder dsl 4 January 2007

One of the problems with a FluentInterface is that it results in some odd looking methods. Consider this example:

customer.newOrder()
  .with(6, "TAL")
  .with(5, "HPK").skippable()
  .with(3, "LGV")
  .priorityRush();

Methods like with, skippable, and priorityRush don't sit well on the Order class. The naming works well in the context of the little DomainSpecificLanguage that the fluent interface provides, but we usually expect methods to make sense in their own right. The methods violate the CommandQuerySeparation which in Java means that methods that change the ObservableState of an object shouldn't have a return value. If we supply methods that make more individual sense, like addLine, we also go against the notion of a MinimalInterface.

At the heart of all this is a mismatch between what a fluent interface needs and our usual guidelines for API design. What works well for a regular API doesn't work for a fluent one and vice versa.

An Expression Builder is a solution to this problem. An expression builder is a separate object on which we define the fluent interface that then translates the fluent calls to the underlying regular API calls. So an expression builder for the order case would look something like this.

public class OrderBuilder {
  private Order subject = new Order();
  private OrderLine currentLine;

  public OrderBuilder with(int quantity, String productCode) {
    currentLine = new OrderLine(quantity, Product.find(productCode));
    subject.addLine(currentLine);
    return this;
  }

  public OrderBuilder skippable() {
    currentLine.setSkippable(true);
    return this;
  }

  public OrderBuilder priorityRush() {
    subject.setRush(true);
    return this;
  }

  public Order getSubject() {
    return subject;
  }
}

In this case I have a single expression builder class, but you can also have a small structure of builders, something like a customer builder, order builder, and line builder. Using a single object means you need a variable to keep track of what line you are working on for the skippable method. Using a structure can avoid this, but is a bit more complicated and you need to ensure lower level builders can handle methods that are intended for higher order builders. In this case an OrderLineBuilder would need to have delegating methods for all the methods of the OrderBuilder.
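To give a feel for the structured alternative, here's a minimal sketch with two builders. It's in Ruby rather than Java to keep it short, and everything in it is an assumption made for illustration: the Struct stand-ins for Order and OrderLine, the snake_case method names, and passing the product code straight through instead of looking up a Product.

```ruby
# Hypothetical plain domain classes standing in for Order and OrderLine.
Order = Struct.new(:lines, :rush)
OrderLine = Struct.new(:quantity, :product_code, :skippable)

class OrderBuilder
  attr_reader :subject
  def initialize
    @subject = Order.new([], false)
  end
  # starting a line hands the chain over to a lower level builder
  def with quantity, product_code
    line = OrderLine.new(quantity, product_code, false)
    @subject.lines << line
    OrderLineBuilder.new(self, line)
  end
  def priority_rush
    @subject.rush = true
    self
  end
end

class OrderLineBuilder
  def initialize parent, line
    @parent = parent
    @line = line
  end
  # the only method that really belongs to the line
  def skippable
    @line.skippable = true
    self
  end
  # delegating methods so the chain can carry on with order level calls
  def with quantity, product_code
    @parent.with quantity, product_code
  end
  def priority_rush
    @parent.priority_rush
  end
  def subject
    @parent.subject
  end
end

order = OrderBuilder.new
  .with(6, 'TAL')
  .with(5, 'HPK').skippable
  .with(3, 'LGV')
  .priority_rush
  .subject
```

The line builder holds its own line, so there's no current line variable to track. The price is the trio of delegating methods at the bottom of OrderLineBuilder.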

For an interesting open example of expression builder, take a look at the JMock library. They use an interesting variant of this approach to handle their little DSL for expectations. There is a single expression builder object (InvocationMockerBuilder). As usual with a single builder object, every call to a builder method ends with a return this to continue the method chaining. The interesting twist is that the return type varies, depending on which part of the expression we are in. The returning interface provides only the methods that make sense for that part of the expression. This supports better error checking, and also means that the method completion you find on IDEs works better by only suggesting methods that are legal at that point of the expression.

You can find out more about the design of the DSL handling in JMock, and how it has evolved from a regular API, by reading Steve and Nat's OOPSLA paper.


© Copyright Martin Fowler, all rights reserved