
Martin Fowler's Bliki

A cross between a blog and wiki of my partly-formed ideas on software development


DslExceptionalism dsl 22 December 2008 Reactions

One of the tricky things about writing about external DomainSpecificLanguages is that I'm walking through territory already heavily tracked by the programming languages community. Programming language research has always been a popular area of academic activity, and I'm the first to admit that I don't have anywhere near the depth in this topic of the many people who've been studying in this space for years. So inevitably the question comes up as to why such a noob as me thinks he can write a book on this well-trodden ground.

The primary reason is that nobody else has written a practitioner-oriented book on DSLs. I like topics like this that are well-trodden but not well written about. However as I've spent time walking these pathways I think there's another factor in the works.

There's a lot of work on programming languages out there, but almost all of it has concentrated on general purpose programming languages. DSLs are seen as a small and simple subset of general purpose programming thinking. As a result people think that what's true for general purpose languages is also true for DSLs (with the implication that DSLs are too small to be worth thinking much about).

I'm increasingly of the opposite conclusion. The rules for DSLs are different to the rules for general purpose languages - and this applies on multiple dimensions.

The first is in language design. I was talking with a language designer who I have a lot of respect for, and he stressed that a key feature of languages was the ability to define new abstractions. With DSLs I don't think this is the case. In most DSLs the DSL chooses the abstraction you work with; if you want different abstractions you use a different DSL (or maybe extend the DSL you're using). Sometimes there's a role for new abstractions, but those cases are the minority and when they do happen the abstractions are limited. Indeed I think the lack of ability to define new abstractions is one of the things that distinguishes DSLs from general purpose languages.

Differences also occur in the approach that you use for implementing the tools that go with languages. A constant issue for general purpose languages is dealing with large inputs, since realistic programs will have thousands or millions of lines of code. As a result many tools and techniques for using them involve aspects that make parsing harder to follow but support these large inputs. DSL scripts tend to be much smaller, so these trade-offs work differently.

In my work I've put a lot of emphasis on using a DSL to populate a Semantic Model, using that model as the basis for any further processing: interpretation, visualization, or code generation. Lots of the language writing I've seen tends to emphasize code generation, often generating code directly from the grammar file. Intermediate representations are not talked about much, and when they do appear they are more in the form of an Abstract Syntax Tree than a semantic model. Serious compilers do use intermediate representations, such as program dependence graphs, but these are seen (rightly) as advanced topics. I think Semantic Models are a really valuable tool in simplifying the use of a DSL, allowing you to separate the parsing from the semantics.
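A minimal sketch of that separation, in Python. Everything here is an assumption made for illustration: the one-rule-per-line discount syntax, and the names `DiscountRule`, `DiscountPolicy` and `parse`, are hypothetical and not taken from any real project. The parser's only job is to populate the model; interpretation and display both work from the model and never see the script text.

```python
from dataclasses import dataclass


# Hypothetical semantic model: plain objects that know nothing about syntax.
@dataclass(frozen=True)
class DiscountRule:
    min_total: float
    percent: float

    def applies_to(self, total: float) -> bool:
        return total >= self.min_total


@dataclass
class DiscountPolicy:
    rules: list

    def best_discount(self, total: float) -> float:
        """Interpretation: the largest percentage whose rule applies, else 0."""
        applicable = [r.percent for r in self.rules if r.applies_to(total)]
        return max(applicable, default=0.0)

    def describe(self) -> list:
        """A second use of the same model: human-readable output."""
        return [
            f"orders of {r.min_total:g} or more get {r.percent:g}% off"
            for r in self.rules
        ]


def parse(script: str) -> DiscountPolicy:
    """Parse the made-up syntax 'from <amount> give <percent>%'.

    The parser only populates the model; no semantics live here.
    """
    rules = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        well_formed = (
            len(words) == 4
            and words[0] == "from"
            and words[2] == "give"
            and words[3].endswith("%")
        )
        if not well_formed:
            raise ValueError(
                f"line {line_no}: expected 'from <amount> give <percent>%'"
            )
        rules.append(DiscountRule(float(words[1]), float(words[3][:-1])))
    return DiscountPolicy(rules)
```

With a script such as `from 100 give 5%` followed by `from 500 give 10%`, calling `parse(script).best_discount(750)` interprets the model, while `describe()` renders the same model for a reader. A different surface syntax would only need a different `parse`.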

Since DSLs are less expressive, you can design a simpler language for them. Much of the language community's writing talks about how to handle the difficulties of a complex general purpose language, while the challenge of DSLs is to write a language that is readable to the intended audience (which may well include non-programmers) and is also easy to parse (to simplify the maintenance of the parser). Not only does this lead to different decisions on the design of a language, it also means that you only really need a subset of the features of parser generators.

A consequence of this is that DSLs are written with the expectation that each individual DSL won't solve the whole problem at hand, so you often need to combine DSLs. Traditional language thinking hasn't explored the idea of composable languages that much, but I think this topic is very important as DSLs develop. Thinking about composable languages should have significant effects on both language design and language tools.

So I'm increasingly coming around to the thinking that DSLs inspire some seriously different ways of thinking about programming languages. They may also lead to developing different kinds of parsing tools that are more suited for DSL work - usually tools that are simpler. I hope the increased attention that DSLs are getting these days will lead to more people treating DSLs as first class subjects of study rather than a simplistic form of general purpose languages.


AcademicRotation design 17 December 2008 Reactions

A while ago I was chatting with a post-doc on his way to an academic career. He was asking me about research topics, wanting my input as he felt I could tell him what research would be of practical use. I wasn't very helpful, but I did mention that the best way to do this would be to spend some time in industry to get a feel of how software development works in the wild and what problems could do with some research effort.

His answer to this thought was very troubling. He said he'd be up to do that, but if he spent time in industry that would ruin his chances of getting a job in academia. Competition for academic jobs is high, and what they look at is your publication history. A year or two in industry would create a gap in your publication history that would be lethal to your job prospects.

The divide between academia and industry has always been an awkward one for software (as indeed for other professions). My contacts with academia have been stilted at best. The academics I respect are, I'm told, not highly regarded within academia because the things that I count as useful are usually dismissed by the academic community.

A good example of where this came to a head is the patterns community. Those involved in the patterns world were keen to look at practice to discover, package, and document techniques that had been proved through experience. But this is in direct opposition to academic standards which consider value to lie in novel things. My work, for example, is generally dismissed because all I do is write about stuff that is old hat (at least to some).

I think this is a terrible shame, not because I'm looking for an academic post, but because I think there is huge value in mining effective techniques from the experience of software development. To me it seems that trying to draw lessons from our experience is a very worthwhile academic activity. By devaluing it the academic world is ignoring a fruitful avenue to improve the capabilities of our profession.

If my opinion counted, I'd argue that any academic department worthy of note should include a group of faculty with a long experience of the day-to-day of industrial software development. They would be valued on how they had reflected on this experience and drawn on its lessons to inform their teaching and research. I'd like to see a regular rotation of people from the academic to the industrial world, where it's common to see people spend several years in industry, then academia, then industry again, and so on.

This problem isn't only in software. A friend of mine had the chief engineer role in one of the most challenging engineering projects in the world. He fancied a stint in academia, but was only able to get a second-class position reserved for people who weren't considered to be real academics, certainly not something that was tenured or would lead to tenure. I find it hard to believe that students wouldn't gain an enormous amount from being taught by people with a long and thoughtful experience in the profession they are entering.

It's always frustrating to see communication gaps between different groups within the same profession. I've become a big fan of using Rotation to help open up communication channels, as people are the key to good knowledge transfer. Being tolerant of academic rotation, indeed encouraging it, could do a great deal to make academia more aware of where industry needs help and industry more aware of where academics can improve practice.


UpcomingTalks writing 15 December 2008 Reactions

Pramod Sadalage is one of the leading innovators of evolutionary database design.

Our profession seems to be constantly hampered by the communication barriers we erect for ourselves. For enterprise systems one of the most annoying barriers is the one between application developers and database people. Although much of my early years involved databases and data modeling, my involvement with object-orientation cast me firmly into the application development side. As a consequence I haven't spent much time talking to people from the database community.

On January 9 I get a rare opportunity to fix that as I'll be speaking at my local chapter of DAMA with my colleague Pramod Sadalage. Pramod played a central role in the development of evolutionary database design techniques in ThoughtWorks as a DBA who works closely with development teams. He's also written valuable books on Refactoring Databases and Continuous Database Integration. So, as you might have gathered, he knows all the material backwards and I'm glad to be invited along for the ride.

We'll be talking about database refactoring and techniques for evolutionary database design. It will be interesting to see how this talk is received. I'm told that these topics are still seen as highly controversial in many parts of the data community, yet these are techniques that are so usual for ThoughtWorks that they are just part of the furniture of our development projects.

Looking further out - I'll be appearing in London again for QCon. I'll update this page later with more details about what I'll be doing there.


BusinessReadableDSL dsl 15 December 2008 Reactions

Will DSLs allow business people to write software rules without involving programmers?

When people talk about DSLs it's common to raise the question of business people writing code for themselves. I like to apply the COBOL inference to this line of thought. That is that one of the original aims of COBOL was to allow people to write software without programmers, and we know how that worked out. So when any scheme is hatched to write code without programmers, I have to ask what's special this time that would make it succeed where COBOL (and so many other things) have failed.

I do think that programming involves a particular mind-set: an ability both to give precise instructions to a machine and to structure a large number of such instructions into a comprehensible program. That talent, and the time involved to understand and build a program, is why programming has resisted being disintermediated for so long. It's also why many "non-programming" environments end up breeding their own class of programmers-in-fact.

That said, I do think that the greatest potential benefit of DSLs comes when business people participate directly in the writing of the DSL code. The sweet spot, however, is in making DSLs business-readable rather than business-writable. If business people are able to look at the DSL code and understand it, then we can build a deep and rich communication channel between software development and the underlying domain. Since this is the Yawning Crevasse of Doom in software, DSLs have great value if they can help address it.

With a business-readable DSL, programmers write the code but they show that code frequently to business people who can understand what it means. These customers can then make changes, maybe draft some code, but it's the programmers who make it solid and do the debugging and testing.

This isn't to say that there's no benefit in a business-writable DSL. Indeed a couple of years ago some colleagues of mine built a system that included just that, and it was much appreciated by the business. It's just that the effort in creating a decent editing environment, meaningful error messages, debugging and testing tools raises the cost significantly.

While I'm quick to use the COBOL inference to diss many tools that seek to avoid programmers, I also have to acknowledge the big exception: spreadsheets. All over the world surprisingly big business functions are run off the back of Excel. Serious programmers tend to look down their noses at these, but we need to take them more seriously and try to understand why they have been as successful as they are. It's certainly part of the reason that drives many LanguageWorkbench developers to provide a different vision of software development.


EstimatedInterest agile 10 December 2008 Reactions

Update at End

TechnicalDebt is a very useful concept, but it raises the question of how do you measure it? Sadly technical debt isn't like financial debt, so it's not easy to tell how far you are in hock (although we seem to have had some trouble with measuring the financial kind recently).

Here's one idea to consider. When a team completes a feature ask them to tell you how long it took them (the actual effort) and how long they think it would have taken if the system were properly clean. The difference between the two is the interest of the technical debt. (So if it actually took them 5 days but they think it would have taken them 3 days with a clean system, then you paid 2 days of effort as interest on your technical debt.)
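The arithmetic is simple enough to state as a one-line function. This is only a sketch: the function name and the choice of days as the unit are my assumptions, and the clean-system figure is whatever the team guesses.

```python
def estimated_interest(actual_days: float, clean_estimate_days: float) -> float:
    """Interest paid on technical debt for one feature, in days of effort.

    actual_days is what the feature really took; clean_estimate_days is the
    team's guess at what it would have taken on a properly clean system.
    """
    return actual_days - clean_estimate_days


# The example from the text: 5 days actual, 3 days on a clean system.
print(estimated_interest(5, 3))  # 2
```

Summing this figure over the features completed in an iteration gives a single number that can be shown to non-technical staff.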

There are certainly some serious flaws with this technique. The statement of how long it would have taken on a clean system is an estimate based on an imaginary state - so is difficult to make objective. There's the effort in capturing this information, which is easy to get out of hand. But the result may help project a picture of the state of the code-base in a way that's visible to non-technical staff.

Furthermore it may also help with decisions about whether to pay the principal. Some teams like to add technical debt stories to their product backlog - with estimates on how long it would take to remove them. Such technical debt stories are also estimates, but also provide a picture of how much debt has built up. You could get a bit more clever with the estimated interest payments by apportioning them to these debt stories (I spent an extra day on this feature because of the bad state of the flipper module). Comparing interest payments with the principal may help inform a decision about whether to pay off the principal.
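Here is one way the apportioning might look, as a sketch with made-up data: the feature names, debt stories and numbers are all hypothetical. Each interest payment is booked against the debt story the team blames, and the running total is compared with the backlog estimate for removing that debt.

```python
from collections import defaultdict

# Hypothetical records: (feature, debt story blamed, extra days paid as interest).
interest_log = [
    ("export to CSV", "clean up flipper module", 1.0),
    ("bulk import", "clean up flipper module", 1.5),
    ("audit trail", "replace home-grown scheduler", 0.5),
]

# Hypothetical backlog estimates of the principal: days to remove each debt.
principal = {
    "clean up flipper module": 4.0,
    "replace home-grown scheduler": 10.0,
}


def interest_by_story(log):
    """Total the interest paid so far against each debt story."""
    totals = defaultdict(float)
    for _feature, story, extra_days in log:
        totals[story] += extra_days
    return dict(totals)


def payback_report(log, principal):
    """Return (story, interest so far, principal, ratio) rows, highest ratio first."""
    totals = interest_by_story(log)
    rows = []
    for story, cost in principal.items():
        paid = totals.get(story, 0.0)
        rows.append((story, paid, cost, paid / cost))
    return sorted(rows, key=lambda row: row[3], reverse=True)


for story, paid, cost, ratio in payback_report(interest_log, principal):
    print(f"{story}: paid {paid:g} of {cost:g} days ({ratio:.0%})")
```

A story whose accumulated interest is approaching its principal is a natural candidate for paying off, though both sides of the comparison are still estimates.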

I ran into someone recently who tried something a little like this and found it handy, but it's not something I've run into a lot. Certainly there are flaws with doing it - but it may be worth a try for a few iterations.

Update: A recent discussion surfaced another way to capture the estimated interest. During a retrospective (which wise teams do at the end of each iteration) capture an estimate of interest paid against each of the problem areas of the system. Doing this estimate against recently completed work may be easier than forward estimates against future stories.


HumaneRegistry design 1 December 2008 Reactions

One of the features of the new world of services that SOA-gushers promoted was the notion of registries. Often this was described in terms of automated systems that would allow systems to automatically look up useful services in a registry and bind and consume those services all by themselves.

Well computers may look clever occasionally, but I didn't particularly buy that idea. While there might be the odd edge case for automated service lookup, I reckon nineteen times out of twenty it'll be a human programmer who is doing the looking up.

I was chatting recently to my colleague Erik Dörnenburg about a project he did with Halvard Skogsrud to build a service registry that was designed for humans to use and maintain. The organization was already using ServiceCustodians to manage the development on the project, so the registry needed to work in that context. This led to the following principles:

  • People develop and use services, so orient it around people (sorry UDDI, thank you for playing).
  • Don't expect people to enter stuff to keep it up to date, people are busy enough as it is.
  • Make it easy for people to read and contribute.

The heart of the registry is a wiki that allows people to easily enter information on a particular service. Not just the builders of the service, but also people who've used it. After all, users' opinions are often more useful than providers' (I'm guessing product review sites get more traffic than the vendors' sites).

A wiki makes it easy for people to describe the service, but that relies on people having time to contribute. A wiki helps make that easy as you can just click and go, but there's still time involved. So they backed up the human entry with some useful information gathered automatically.

  • A tool that interrogates the source code control systems and displays who has committed to a service, when, and how much. This helps human readers find out who are the other humans who they should talk to. Someone who did most of the commits, even if a while ago, probably knows a lot about the core design and purpose of the service. People who made a few recent commits might know more about the recent usage and quirks.
  • RSS feeds from CI servers and source code control systems.
  • Task and bug information from issue tracking systems.
  • Traffic data from the message bus indicating how much the service is used, and when. Also the message bus gives some clues about the consumers of the service.
  • Interceptors in the EJB container that captured consumer application names - again to get a sense of who is consuming the service. These were on the consumer side to capture consumer application names, and on the service to get a sense of the usage patterns.
  • Information from the Ivy dependencies.
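The first item in that list, the commit summary, might be sketched like this. I don't know how Erik and Halvard's tool was built, so this is purely illustrative: the commit records are hard-coded and hypothetical, where a real tool would read them from the source control log, and the 90-day window for "recent" is an arbitrary assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical commit records for one service: (author, commit date).
commits = [
    ("alice", date(2007, 3, 1)),
    ("alice", date(2007, 3, 9)),
    ("alice", date(2007, 4, 2)),
    ("bob", date(2008, 11, 20)),
    ("bob", date(2008, 12, 1)),
]


def contact_summary(commits, today, recent_days=90):
    """Summarise who committed, how much, and how recently.

    Rows are ordered by total commits, so the likely owner of the core
    design comes first; recent_commits points at people who know the
    current quirks.
    """
    total = Counter(author for author, _when in commits)
    recent = Counter(
        author for author, when in commits if (today - when).days <= recent_days
    )
    last = {}
    for author, when in commits:
        if author not in last or when > last[author]:
            last[author] = when
    return [
        {
            "author": author,
            "commits": count,
            "recent_commits": recent[author],
            "last_commit": last[author],
        }
        for author, count in total.most_common()
    ]


for row in contact_summary(commits, today=date(2008, 12, 15)):
    print(row)
```

In this data alice did most of the commits but none lately, so she is the person to ask about the original design, while bob is the one to ask about recent changes.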

Much of this functionality was inspired by ohloh.net, in particular this view.

The point of a registry like this is that it does a lot of automated work to get information, but presents it in a way that expects a human reader. Furthermore it understands that the most important questions the human reader has are about the humans who have worked on the project: who are they, when did they work on this, who should I email, and where do I go for a really good caipirinha?


DatabaseThaw design 24 November 2008 Reactions

A few years ago I heard programming language people talk about the "Nuclear Winter" in languages caused by Java. The feeling was that everyone had so converged on Java's computational model (C# at that point seen as little more than a rip-off) that creativity in programming languages had disappeared. That feeling is now abating, but perhaps a more important thaw might be beginning - in the longer and deeper freeze in thinking about databases.

Tim Bray's thought-provoking keynote talked about storage; including highlighting several alternatives to the conventional database world

When I started in the software development profession, I worked with several people who had evangelized relational databases. I came across them in the object-oriented world. Many people at that time expected OO databases to be the next evolutionary step for databases. As we now know, that didn't happen. These days relational databases are so deeply embedded that most projects assume an RDBMS right out of the gate.

At QCon last week, there was a strong thread of talks that questioned this assumption. Certainly one that struck me was Tim Bray's keynote, which took a journey through several aspects of data management. In doing so he highlighted a number of interesting projects.

  • Drizzle is a form of relational database, but one that eschews much of the machinery of modern relational products. I think of it as a RISC RDBMS - supporting only the bare bones of the relational feature set.
  • Couch DB is one of many forays into a distributed key-value pair model. Although a sharply simple data-model (nothing more than a hashmap really) this kind of approach has become quite popular in high-volume websites.
  • Gemstone was one of the object database crowd, and I found the Gemstone-Smalltalk combination a very powerful development environment (superior to most of its successors). Gemstone is still around as a niche player, but may gain more traction through Maglev - a project to bring its approach (essentially a fusion of database and virtual machine) to the Ruby world.

As well as this talk, there was a whole track on alternative databases hosted by Kresten Krab Thorup. One of the additional tools mentioned there was Neo4J - a graph (network) database tool that earned some rare praise from Jim Webber.

The natural question to ask about these products is why they should prevail when the ODBMSs failed. What's changed in the environment that could thaw the relational grip? There are many hypotheses about why relational has been so dominant - my opinion is that their dominance is due less to their role in data management than their role in integration.

Kresten Krab Thorup does a great job as a leader of the technical content of the JAOO and QCon conferences.

For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these IntegrationDatabases, it's important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as a mostly-standard query language has been central to the dominance of relational databases.

The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there's a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing: I've never liked the approach of multiple applications tightly coupled through a common database - you can't get a bigger breach of encapsulation than that.

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being IntegrationDatabases to ApplicationDatabases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.
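To make that concrete, here is a small sketch of an application whose only public contract is the document it serves. The names (`DictStore`, `SqliteStore`, `handle_get`) and the `/customers/<id>` path are hypothetical, the document is JSON rather than XML purely to keep the code short, and `handle_get` stands in for whatever a real HTTP handler would do. The point is that two quite different storage choices sit behind an identical response.

```python
import json
import sqlite3


class DictStore:
    """Key-value style storage: nothing more than a hashmap."""

    def __init__(self):
        self._data = {}

    def put(self, customer_id, name):
        self._data[customer_id] = name

    def get(self, customer_id):
        return self._data.get(customer_id)


class SqliteStore:
    """Relational storage behind the same two methods."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")

    def put(self, customer_id, name):
        self._db.execute(
            "INSERT OR REPLACE INTO customer VALUES (?, ?)", (customer_id, name)
        )

    def get(self, customer_id):
        row = self._db.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def handle_get(store, path):
    """What an HTTP GET handler would return: (status code, body)."""
    prefix = "/customers/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "not found"})
    customer_id = path[len(prefix):]
    name = store.get(customer_id)
    if name is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"id": customer_id, "name": name})


# Consumers see the same document whichever store the application picked.
for store in (DictStore(), SqliteStore()):
    store.put("42", "Ada")
    print(type(store).__name__, handle_get(store, "/customers/42"))
```

Other applications depend only on the path and the shape of the document, so the owning team can swap one store for the other without anyone outside noticing.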

I don't think this means that relational databases will disappear - after all they are the right choice for many situations. But it does mean that now application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options.



© Copyright Martin Fowler, all rights reserved