Oslo

28 October 2008

Oslo is a project at Microsoft about which various things have been heard, but with few details until this week's PDC. What we have known is that it has something to do with ModelDrivenSoftwareDevelopment and DomainSpecificLanguages.

A couple of weeks ago I got an early peek behind the curtain as I, and my language-geek colleague Rebecca Parsons, went through a preview of the PDC coming-out talks with Don Box, Gio Della-Libera and Vijaye Raji. It was a very interesting presentation, enough to convince me that Oslo is a technology to watch. It's broadly a Language Workbench. I'm not going to attempt a comprehensive review of the tool here; these are just my scattered impressions from the walk-through. With the public release at the PDC I'm sure you'll be hearing a lot more about it in the coming weeks. As I describe my thoughts I'll use a lot of the language I've been developing for my book, so you may find the terminology a little dense.

Oslo has three main components:

  • a modeling language (currently code-named M) for textual DSLs
  • a design surface (named Quadrant) for graphical DSLs
  • a repository (without a name) that stores semantic models in a relational database.

(All of these names are current code names. The marketing department will still use the same smarts that replaced "Avalon and Indigo" with "WPF and WCF". I'm just hoping they'll rename "Windows" to "Windows Technology Foundation".)

The textual language environment is bootstrapped and provides three base languages:

  • MGrammar: defines grammars for Syntax Directed Translation.
  • MSchema: defines schemas for a Semantic Model
  • MGraph: is a textual language for representing the population of a Semantic Model. So while MSchema represents types, MGraph represents instances. Lispers might think of MGraph as s-expressions with an ugly syntax.

You can represent any model in MGraph, but the syntax is often not too good. With MGrammar you can define a grammar for your own DSL which allows you to write scripts in your own DSL and build a parser to translate them into something more useful.

Using the state machine example from my book introduction, you could define a state machine semantic model with MSchema. You could then populate it (in an ugly way) with MGraph. You can build a decent DSL to populate it using MGrammar to define the syntax and to drive a parser.
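To give a feel for the split between a model's types and its population, here is a minimal sketch in Python rather than M (it does not attempt to reproduce MSchema or MGraph syntax). The class names, event names, and the two-state example are all illustrative assumptions, not taken from the Oslo demo. The classes play the role of the schema; the object-building code at the bottom plays the role of the direct, noisy population that a custom DSL would replace.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class State:
    """A node in the state machine; transitions are keyed by event name."""
    name: str
    transitions: dict = field(default_factory=dict)  # event name -> State

    def add_transition(self, event: str, target: "State") -> None:
        self.transitions[event] = target


@dataclass
class StateMachine:
    """The semantic model: a start state plus the current position."""
    start: State
    current: State = field(init=False)

    def __post_init__(self) -> None:
        self.current = self.start

    def handle(self, event: str) -> None:
        # Unknown events are ignored, leaving the machine where it is.
        self.current = self.current.transitions.get(event, self.current)


# Populating the model directly: correct, but noisy to read and write.
idle = State("idle")
active = State("active")
idle.add_transition("doorClosed", active)
active.add_transition("doorOpened", idle)
machine = StateMachine(start=idle)

machine.handle("doorClosed")
print(machine.current.name)
```

A DSL script would carry the same information as the four population lines, but in a syntax chosen for the people who have to read it.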

There is a grammar compiler (called mg) that will take an input file in MGrammar and compile it into what they call an image file, or .mgx file. This is different to most parser generator tools. Most parser generator tools take the grammar and generate code which has to be compiled into a parser. Instead Oslo's tools compile the grammar into a binary form of the parse rules. There's then a separate tool (mgx) that can take an input script and a compiled grammar and output the MGraph representation of the syntax tree of the input script.

More likely you can take the compiled grammar and add it to your own code as a resource. With this you can call a general parser mechanism that Oslo provides as a .NET framework, supply the reference to the compiled grammar file, and generate an in-memory syntax tree. You can then walk this syntax tree and use it to do whatever you will - the parsing strategy I refer to as Tree Construction.
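The idea of a grammar shipped as data and run by a general-purpose parser can be sketched in a few lines of Python. This is only an illustration of the style, under several assumptions: the JSON string stands in for the compiled image, the rule names and the tiny transition syntax are invented, and the parser is a naive backtracking recursive-descent interpreter, not Oslo's actual algorithm or API.

```python
import json

# A grammar held purely as data: each rule maps to a list of alternatives,
# and each alternative is a sequence of symbols. "NAME" is a hypothetical
# terminal that matches any identifier-like token.
GRAMMAR = {
    "machine": [["transition", "machine"], ["transition"]],
    "transition": [["NAME", "=>", "NAME", "on", "NAME"]],
}

# Stand-in for the compiled image: the rules serialized to a string that
# could be shipped alongside the application as a resource.
IMAGE = json.dumps(GRAMMAR)


def parse(grammar, rule, tokens, pos=0):
    """Try each alternative of `rule` in order; return (tree, next_pos) or None."""
    for alternative in grammar[rule]:
        children, cursor = [], pos
        for symbol in alternative:
            if symbol in grammar:  # nonterminal: recurse into its rule
                result = parse(grammar, symbol, tokens, cursor)
                if result is None:
                    break
                subtree, cursor = result
                children.append(subtree)
            elif cursor < len(tokens) and (
                tokens[cursor] == symbol
                or (symbol == "NAME" and tokens[cursor].isidentifier())
            ):
                children.append(tokens[cursor])  # terminal: consume one token
                cursor += 1
            else:
                break
        else:
            # Every symbol matched, so this alternative succeeds.
            return (rule, children), cursor
    return None


def parse_script(image, text, start="machine"):
    """Load the grammar image and build an in-memory syntax tree for `text`."""
    grammar = json.loads(image)
    tokens = text.split()
    result = parse(grammar, start, tokens)
    if result is None or result[1] != len(tokens):
        raise ValueError("script does not match the grammar")
    return result[0]


tree = parse_script(IMAGE, "idle => active on doorClosed")
print(tree)
```

Nothing in `parse` knows about state machines; swapping in a different image changes the language without regenerating or recompiling any parser code.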

The parser gives you a syntax tree, but that's often not the same as a semantic model. So usually you'll write code to walk the tree and populate a semantic model defined with MSchema. Once you've done this you can easily take that model and store it in the repository so that it can be accessed via SQL tools. Their demo showed entering some data via a DSL and accessing corresponding tables in the repository, although we didn't go into complicated structures.
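As a rough sketch of that pipeline, the Python below walks a hand-written syntax tree, populates a simple transition table, and stores it in a relational table that plain SQL can query. Everything here is an assumption for illustration: the tree shape, the table and column names, and the use of an in-memory SQLite database as a stand-in for the repository. It says nothing about how Oslo's repository actually lays out its tables.

```python
import sqlite3

# Hand-written stand-in for a parser's output: (rule name, children) tuples.
SYNTAX_TREE = ("machine", [
    ("transition", ["idle", "=>", "active", "on", "doorClosed"]),
    ("machine", [
        ("transition", ["active", "=>", "idle", "on", "doorOpened"]),
    ]),
])


def populate(node, model=None):
    """Walk the tree, building the model: state -> event -> target state."""
    model = {} if model is None else model
    rule, children = node
    if rule == "transition":
        source, _arrow, target, _on, event = children
        model.setdefault(source, {})[event] = target
    else:
        for child in children:
            populate(child, model)
    return model


def store(model, connection):
    """Write the model into a relational table so SQL tools can reach it."""
    connection.execute(
        "CREATE TABLE transitions (source TEXT, event TEXT, target TEXT)"
    )
    rows = [
        (source, event, target)
        for source, events in model.items()
        for event, target in events.items()
    ]
    connection.executemany("INSERT INTO transitions VALUES (?, ?, ?)", rows)


model = populate(SYNTAX_TREE)
connection = sqlite3.connect(":memory:")
store(model, connection)

row = connection.execute(
    "SELECT target FROM transitions WHERE source = ? AND event = ?",
    ("idle", "doorClosed"),
).fetchone()
print(row[0])
```

The tree keeps the punctuation and nesting of the script; the model and the table keep only what the domain cares about.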

You can also manipulate the semantic model instance with Quadrant. You can define a graphical notation for a schema and then the system can project the model instance, creating a diagram using that notation. You can also change the diagram, which updates the model. They showed a demo of two graphical projections of a model, where updating one updated the other using Observer Synchronization. In that way using Quadrant seems like a similar style of work to a graphical Language Workbench such as MetaEdit.
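The synchronization between two projections can be sketched as follows. This is a generic Python illustration of Observer Synchronization, not Quadrant's mechanism; the `Model` and `Projection` classes and the two text renderings are hypothetical stand-ins for a model instance and its diagrams.

```python
class Model:
    """Holds the state names and notifies observers whenever they change."""

    def __init__(self):
        self._states = []
        self._observers = []

    @property
    def states(self):
        return tuple(self._states)

    def subscribe(self, callback):
        self._observers.append(callback)

    def add_state(self, name):
        self._states.append(name)
        for notify in self._observers:
            notify(self.states)


class Projection:
    """One view of the model; `render` turns state names into display text."""

    def __init__(self, model, render):
        self._model = model
        self._render = render
        self.display = render(model.states)
        model.subscribe(self.refresh)

    def refresh(self, states):
        self.display = self._render(states)

    def user_adds_state(self, name):
        # An edit made through this view goes to the model, never to the
        # other view; the model's notification brings every view up to date.
        self._model.add_state(name)


model = Model()
boxes = Projection(model, lambda states: " ".join(f"[{s}]" for s in states))
outline = Projection(model, lambda states: "\n".join(f"- {s}" for s in states))

boxes.user_adds_state("idle")
outline.user_adds_state("active")
print(boxes.display)
print(outline.display)
```

Neither projection knows the other exists, which is what lets you add a third notation without touching the first two.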

As they've been developing Oslo they have been using it on other Microsoft projects to gain experience in its use. The main ones so far have been with ASP, Workflow, and web services.

More on M

We spent most of the time looking at the textual environment. They have a way of hooking up a compiled grammar to a text editing control to provide a syntax-aware text editor with various completion and highlighting goodness. Unlike tools such as MPS, however, it is still a text editor. As a result you can cut and paste stretches of text and manipulate text freely. The tool will give you squigglies if there's a problem parsing what you've done, but it preserves the editing text experience.

I think I like this. When I first came across it, I rather liked the MPS notion of: "it looks like text, but really it's a structured editor". But recently I've begun to think that we lose a lot that way, so the Oslo way of working is appealing.

Another nice text language tool they have is an editor to help write MGrammars. This is a window divided into three vertical panes. The center pane contains MGrammar code, the left pane contains some input text, and the right pane shows the MGraph representation of parsing the input text with the MGrammar. It's very example driven. (However it is transient, unlike tests.) The tool resembles the capability in Antlr to process sample text right away with a grammar. In the conversation Rebecca referred to this style as "anecdotal testing" which is a phrase I must remember to steal.

The parsing algorithm they use is a GLR parser. The grammar syntax is comparable to EBNF and has notation for Tree Construction expressions. They use their own variant of regex notation in the lexer to be more consistent with their other tools, which will probably throw people like me who are more used to ISO/Perl regexp notation. It's mostly similar, but different enough to be annoying.

One of the nice features of their grammar notation is that they have provided constructs to easily make parameterized rules - effectively allowing you to write rule subroutines. Rules can also be given attributes (aka annotations), in a similar way to .NET's language attributes. So you can make a whole language case insensitive by marking it with an attribute. (Interestingly they use "@" to mark an attribute, as in the Java syntax.)
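To show what a rule subroutine buys you, here is a small Python sketch in parser-combinator style. It is not MGrammar syntax; `literal`, `name`, and `list_of` are hypothetical helpers, and each rule is just a function from a token list and position to a result and new position.

```python
def literal(text):
    """Rule matching one exact token."""
    def rule(tokens, pos):
        if pos < len(tokens) and tokens[pos] == text:
            return [], pos + 1  # punctuation contributes nothing to the result
        return None
    return rule


def name(tokens, pos):
    """Rule matching one identifier-like token."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return [tokens[pos]], pos + 1
    return None


def list_of(item, separator):
    """Parameterized rule: item (separator item)*."""
    def rule(tokens, pos):
        first = item(tokens, pos)
        if first is None:
            return None
        values, pos = list(first[0]), first[1]
        while True:
            sep = separator(tokens, pos)
            following = item(tokens, sep[1]) if sep is not None else None
            if following is None:
                return values, pos
            values += following[0]
            pos = following[1]
    return rule


# The same rule body serves two different list syntaxes.
comma_names = list_of(name, literal(","))
arrow_names = list_of(name, literal("->"))

print(comma_names(["idle", ",", "active", ",", "done"], 0))
print(arrow_names(["idle", "->", "active"], 0))
```

Without parameterization you would write the "item, then repeated separator and item" pattern out by hand for every kind of list in the grammar.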

The default way a grammar is run is to do tree construction. As it turns out the tree construction is the behavior of the default class that gets called by the grammar while it's processing some input. This class has an interface and you can write your own class that implements this. This would allow you to do embedded translation and embedded interpretation. It's not the same as code actions, as the action code isn't in the grammar, but in this other class. I reckon this could well be better since the code inside actions often swamps grammars.
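The shape of that arrangement can be sketched in Python. The interface and class names below are invented for illustration and are not Oslo's; the point is only that the parser calls out to a handler, and swapping the handler switches between tree construction and embedded interpretation without touching the grammar logic.

```python
from typing import Any, Protocol


class Handler(Protocol):
    """What the parser calls as it recognizes each construct."""

    def number(self, text: str) -> Any: ...
    def add(self, left: Any, right: Any) -> Any: ...


class TreeBuilder:
    """Default behavior: build a syntax tree."""

    def number(self, text):
        return ("number", text)

    def add(self, left, right):
        return ("add", left, right)


class Evaluator:
    """Alternative behavior: interpret the input while parsing it."""

    def number(self, text):
        return int(text)

    def add(self, left, right):
        return left + right


def parse_sum(text: str, handler: Handler) -> Any:
    """Parse 'n + n + ...' and report each construct to the handler."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty input")
    result = handler.number(tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+" or i + 1 >= len(tokens):
            raise ValueError(f"unexpected input at token {i}")
        result = handler.add(result, handler.number(tokens[i + 1]))
    return result


print(parse_sum("1 + 2 + 3", TreeBuilder()))
print(parse_sum("1 + 2 + 3", Evaluator()))
```

`parse_sum` stays free of action code, which is the contrast with grammars where the actions are written inline between the rules.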

They talked a bit about the ability to embed one language in another and switch the parsers over to handle this gracefully - heading into territory that's been explored by Converge. We didn't look at this deeply but that would be interesting.

An interesting tidbit they mentioned was that originally they intended to only have the tools for graphical languages. However they found that graphical languages just didn't work well for many problems - including defining schemas. So they developed the textual tools.

(Here's a thought for the marketing department. If you stick with the name "M" you could use this excellent film for marketing inspiration ;-))

Comparisons

Plainly this tool hovers in the same space as tools like Intentional Software and JetBrains MPS that I dubbed as Language Workbenches in 2005. Oslo doesn't exactly fit the definition for a language workbench that I gave back then. In particular the textual component isn't a projectional editor and you don't have to use a storage representation based on the abstract representation (semantic model), instead you can store the textual source in a more conventional style. This lesser reliance on a persistent abstract representation is similar to Xtext. At some point I really need to rethink what I consider the defining elements of a Language Workbench to be. For the moment let's just say that Xtext and Oslo feel like Language Workbenches and until I revisit the definition I'll treat them as such.

One particularly interesting point in this comparison is comparing Oslo with Microsoft's DSL tools. They are different tools with a lot of overlap, which makes you wonder if there's a place for both of them. I've heard vague "they fit together" phrases, but am yet to be convinced. It could be one of those situations (common in big companies) where multiple semi-competing projects are developed. Eventually this could lead to one being shelved. But it's hard to speculate about this as much depends on corporate politics and it's thus almost impossible to get a straight answer out of anyone (and even if you do, it's even harder to tell if it is a straight answer).

The key element that Oslo shares with its cousins is that it provides a toolkit to define new languages, integrate them together, and define tooling for those languages. As a result you get the freedom of syntax of external DomainSpecificLanguages with decent tooling - something that deals with one of the main disadvantages of external DSLs.

Oslo supports both textual and graphical DSLs and seems to do so reasonably evenly (although we spent more time on the textual). In this regard it seems to provide more variety than MPS and Intentional (structured textual) and MetaEdit/Microsoft's DSL tools (graphical). It's also different in its textual support in that it provides real free text input not the highly structured text input of Intentional/MPS.

Using a compiled grammar that plugs into a text editor strikes me as a very nice route for supporting entering DSL scripts. Other tools either require you to have the full language workbench machinery or to use code generation to build editors. Passing around a representation of the grammar that I could plug into an editor strikes me as a good way to do it. Of course if that language workbench is Open Source (as I'm told MPS will be), then that may make this issue moot.

One of the big issues with storing stuff like this in a repository is handling version control. The notion that we can all collaborate on a single shared database (the moral equivalent of a team editing one copy of its code on a shared drive) strikes me as close to irresponsible. As a result I tend to look askance at any vendors who suggest this approach. The Oslo team suggests, wisely, that you treat the text files as the authoritative source which allows you to use regular version control tools. Of course the bad news for many Microsoft shops would be that this tool is TFS (or, god-forbid, VSS), but the great advantage of using plain text files as your source is that you can use any of the multitude of version control systems to store it.

A general thing I liked was that most of the tools leant towards run-time interpretation rather than code generation and compilation. Traditionally parser generators and many language workbenches assume you are going to generate code from your models rather than interpreting them. Code generation is all very well, but it always has this messy feel to it - and tends to lead to all sorts of ways to trip you up. So I do prefer the run-time emphasis.

It was only a couple of hours, so I can't make any far-reaching judgements about Oslo. I can, however, say it looks like some very interesting technology. What I like about it is that it seems to provide a good pathway to using language workbenches. Having Microsoft behind it would be a big deal although we do need to remember that all sorts of things were promised about Longhorn that never came to pass. But all in all I think this is an interesting addition to the Language Workbench scene and a tool that could make DSLs much more prevalent.