Martin Fowler's Bliki
A cross between a blog and wiki of my partly-formed ideas on software development
| DslExceptionalism |
dsl |
22 December 2008 |
|
One of the tricky things about writing about external
DomainSpecificLanguages is that I'm walking through
territory already heavily tracked by the programming languages
community. Programming language research has always been a popular
area of academic activity, and I'm the first to admit that I don't
have anywhere near as much depth in this topic as many people who've
been studying in this space for years. So inevitably the question
comes up as to why such a noob as me thinks he can write a book in
this well trodden ground? The primary reason is that nobody else has written a
practitioner-oriented book on DSLs. I like topics like this that are
well-trodden but not well written about. However as I've spent time
walking these pathways I think there's another factor in the
works. There's a lot of work on programming languages out there, but
almost all of it has concentrated on general purpose programming
languages. DSLs are seen as a small and simple subset of general
purpose programming thinking. As a result people think that what's
true for general purpose languages is also true for DSLs (with the
implication that DSLs are too small to be worth thinking much about). I'm increasingly of the opposite conclusion. The rules for DSLs
are different to the rules for general purpose languages - and this
applies on multiple dimensions. The first is in language design. I was talking with a language
designer who I have a lot of respect for, and he stressed that a key
feature of languages was the ability to define new
abstractions. With DSLs I don't think this is the case. In most DSLs
the DSL chooses the abstraction you work with, if you want different
abstractions you use a different DSL (or maybe extend the DSL you're
using). Sometimes there's a role for new abstractions, but those
cases are the minority and when they do happen the abstractions are
limited. Indeed I think the lack of ability to define new
abstractions is one of the things that distinguishes DSLs from
general purpose languages. Differences also occur in the approach that you use for
implementing the tools that go with languages. A constant issue for
general purpose languages is dealing with large inputs, since
realistic programs will have thousands or millions of lines of
code. As a result many tools and techniques for using them involve
aspects that make parsing harder to follow but support these large
inputs. DSL scripts tend to be much smaller, so these trade-offs
work differently. In my work I've put a lot of emphasis on using a DSL to populate
a Semantic
Model, using that model as the basis for any further
processing: interpretation, visualization, or code generation. Lots
of language writing I've seen tends to emphasize code generation,
often generating code directly from the grammar file. Intermediate
representations are not talked about much, and when they do appear
they are more in the form of an Abstract Syntax Tree rather than a
semantic model. Serious compilers do use intermediate
representations, such as program dependence graphs, but these are
seen (rightly) as advanced topics. I think Semantic Models are a
really valuable tool in simplifying the use of a DSL, allowing you
to separate the parsing from the semantics. Since DSLs are less expressive, you can
design a simpler language for them. Much of the language community's writing
talks about how to handle the difficulties of a complex general
purpose language, while the challenge of DSLs is to write a language
that is readable to the intended audience (which may well include
non-programmers) and also should be easy to parse (to simplify the
maintenance of the parser). Not only does this lead to different
decisions on the design of a language, it also means that you only
really need a subset of the features of parser generators. A consequence of this is DSLs are written with the expectation
that each individual DSL won't solve the whole problem at hand and
often you need to combine DSLs. Traditional language thinking hasn't
explored the idea of composable languages that much, but I think
this topic is very important as DSLs develop. Thinking about
composable languages should have significant effects on both language
design and language tools. So I'm increasingly coming around to the thinking that DSLs
inspire some seriously different ways of thinking about programming
languages. It may also lead to developing different kinds of parsing
tools that are more suited for DSL work - usually tools that are
simpler. I hope the increased attention that DSLs are getting these
days will lead to more people treating DSLs as first class subjects
of study rather than a simplistic form of general purpose languages.
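To make the Semantic Model point above concrete, here's a minimal sketch in Python. The little state-machine DSL, its `start` and `on` keywords, and the class and function names are all invented for illustration; they don't come from any particular tool. The parser's only job is to populate the model, and both interpretation and visualization then work from the model rather than from the syntax.

```python
from dataclasses import dataclass, field


@dataclass
class StateMachine:
    """Semantic model: knows nothing about the DSL's syntax."""
    start: str
    transitions: dict = field(default_factory=dict)  # (state, event) -> target

    def add_transition(self, source, event, target):
        self.transitions[(source, event)] = target

    def run(self, events):
        """Interpretation: follow events, ignoring ones that don't apply."""
        state = self.start
        for event in events:
            state = self.transitions.get((state, event), state)
        return state

    def to_dot(self):
        """Visualization: emit Graphviz DOT text from the same model."""
        edges = [f'  {s} -> {t} [label="{e}"];'
                 for (s, e), t in sorted(self.transitions.items())]
        return "\n".join(["digraph machine {", *edges, "}"])


def parse(script):
    """Parser for the hypothetical DSL: it only populates the model."""
    machine = None
    for raw in script.splitlines():
        words = raw.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "start" and len(words) == 2:
            machine = StateMachine(start=words[1])
        elif words[0] == "on" and len(words) == 5 and words[3] == "->":
            if machine is None:
                raise ValueError("'start' must come before any 'on' line")
            # Line shape: on <event> <source> -> <target>
            machine.add_transition(words[2], words[1], words[4])
        else:
            raise ValueError(f"cannot parse line: {raw!r}")
    if machine is None:
        raise ValueError("script has no 'start' line")
    return machine


SCRIPT = """
start idle
on doorClosed idle -> active
on timeout active -> idle
"""

machine = parse(SCRIPT)
print(machine.run(["doorClosed", "timeout"]))
print(machine.to_dot())
```

Because the syntax is this small, a line-by-line split is enough and no parser generator is needed. Swapping in a different syntax would mean rewriting `parse` only, leaving `StateMachine` alone.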
|
| AcademicRotation |
design |
17 December 2008 |
|
A while ago I was chatting with a post-doc on his way to an
academic career. He was asking me about research topics, wanting my
input as he felt I could inform him on what would be research of
practical use. I wasn't very helpful, but I did mention that the
best way to do this would be to spend some time in industry to get a
feel of how software development works in the wild and what problems
could do with some research effort. His answer to this thought was very troubling. He said he'd be up
to do that, but if he spent time in industry that would ruin his
chances of getting a job in academia. Competition for academic jobs
is high, and what they look at is your publication history. A year
or two in industry would create a gap in your publication history
that would be lethal to your job prospects. The divide between academia and industry has always been an
awkward one for software (as indeed for other professions). My
contacts with academia have been stilted at best. The academics I
respect are, I'm told, not highly regarded within academia because
the things that I count as useful are usually dismissed by the
academic community. A good example of where this came to a head is the patterns
community. Those involved in the patterns world were keen to look
at practice to discover, package, and document techniques that had
been proved through experience. But this is in direct opposition to
academic standards which consider value to lie in novel things. My
work, for example, is generally dismissed because all I do is write
about stuff that is old hat (at least to some). I think this is a terrible shame, not because I'm looking for an
academic post, but because I think there is huge value in mining
effective techniques from the experience of software development. To
me it seems that trying to draw lessons from our experience is a very
worthwhile academic activity. By devaluing it the academic world is
ignoring a fruitful avenue to improve the capabilities of our profession. If my opinion counted, I'd argue that any academic department
worthy of note should include a group of faculty with a long
experience of the day-to-day of industrial software
development. They would be valued on how they had reflected on this
experience and drawn on its lessons to inform their teaching and
research. I'd like to see a regular rotation of people from the
academic to the industrial world, where it's common to see people
spend several years in industry, then academia, then industry again,
and so on. This problem isn't only in software. A friend of mine had the
chief engineer role in one of the most challenging engineering
projects in the world. He fancied a stint in academia, but was only
able to get a second-class position reserved for people who weren't
considered to be real academics, certainly not something that was
tenured or would lead to tenure. I find it hard to believe that
students wouldn't gain an enormous amount from being taught by
people with a long and thoughtful experience in the profession they
are entering. It's always frustrating to see communication gaps between
different groups within the same profession. I've become a big fan
of using Rotation to help open up communication channels, as
people are the key to good knowledge transfer. Being tolerant of academic
rotation, indeed encouraging it, could do a great deal to make
academia more aware of where industry needs help and industry more
aware of where academics can improve practice.
|
| UpcomingTalks |
writing |
15 December 2008 |
Pramod Sadalage is
one of the leading innovators of evolutionary database
design.
Our profession seems to be constantly hampered by the
communication barriers we
erect for ourselves. For enterprise systems one of the most annoying
barriers is the one between application developers and database
people. Although much of my early years involved
databases and data modeling, my involvement with
object-orientation cast me firmly into the application
development side. As a consequence I haven't spent much time talking
to people from the database community. On January 9 I get a rare opportunity to fix that as I'll be
speaking at my local chapter of DAMA with my colleague Pramod
Sadalage. Pramod played a central role in the development of
evolutionary database design techniques in ThoughtWorks as a DBA who
works closely with development teams. He's also written valuable
books on Refactoring
Databases and Continuous
Database Integration. So, as you might have gathered, he knows
all the material backwards and I'm glad to be invited along for the
ride. We'll be talking about database refactoring and techniques for
evolutionary database design. It will be interesting to see how this
talk is received. I'm told that these topics are still seen as highly
controversial in many parts of the data community, yet these are
techniques that are so usual for ThoughtWorks that they are just
part of the furniture of our development projects. Looking further out - I'll be appearing in London again for QCon. I'll
update this page later with more details about what I'll be doing
there.
|
| BusinessReadableDSL |
dsl |
15 December 2008 |
|
Will DSLs allow business people to write software rules
without involving programmers?
When people talk about DSLs it's common to raise the question of
business people writing code for themselves. I like to apply the
COBOL inference to this line of thought. That is that one of the
original aims of COBOL was to allow people to write software without
programmers, and we know how that worked out. So when any scheme is
hatched to write code without programmers, I have to ask what's
special this time that would make it succeed where COBOL (and so
many other things) have failed. I do think that programming involves a particular mind-set, an
ability to both give precise instructions to a machine and the
ability to structure a large amount of such instructions to make a
comprehensible program. That talent, and the time involved to
understand and build a program, is why programming has resisted
being disintermediated for so long. It's also why many
"non-programming" environments end up breeding their own class of
programmers-in-fact. That said, I do think that the greatest potential benefit of DSLs
comes when business people participate directly in the writing of
the DSL code. The sweet spot, however, is in making DSLs
business-readable rather than business-writable. If
business people are able to look at the DSL code and understand it,
then we can build a deep and rich communication channel between
software development and the underlying domain. Since this is the Yawning
Crevasse of Doom in software, DSLs have great value if they can
help address it. With a business-readable DSL, programmers write the code but they
show that code frequently to business people who can understand what
it means. These customers can then make changes, maybe draft some
code, but it's the programmers who make it solid and do the
debugging and testing. This isn't to say that there's no benefit in a business-writable
DSL. Indeed a couple of years ago some colleagues of mine built a
system that included just that, and it was much
appreciated by the business. It's just that the effort in creating a
decent editing environment, meaningful error messages, debugging and
testing tools raises the cost significantly. While I'm quick to use the COBOL inference to diss many tools
that seek to avoid programmers, I also have to acknowledge the big
exception: spreadsheets. All over the world surprisingly big business
functions are run off the back of Excel. Serious programmers tend to
look down their noses at these, but we need to take them more
seriously and try to understand why they have been as successful as
they are. It's certainly part of the reason that drives many
LanguageWorkbench developers to provide a different
vision of software development.
|
| EstimatedInterest |
agile |
10 December 2008 |
|
Update at End
TechnicalDebt is a very useful concept, but it raises
the question of how you measure it. Sadly technical debt isn't
like financial debt, so it's not easy to tell how far you are in
hock (although we seem to have had some trouble with measuring the
financial kind recently). Here's one idea to consider. When a team completes a feature ask
them to tell you how long it took them (the actual effort) and how
long they think it would have taken if the system were properly
clean. The difference between the two is the interest of the
technical debt. (So if it actually took them 5 days but they think
it would have taken them 3 days with a clean system, then you paid 2
days of effort as interest on your technical debt.) There are certainly some serious flaws with this technique. The
statement of how long it would have taken on a clean system is an
estimate based on an imaginary state - so is difficult to make
objective. There's also the effort of capturing this information, which
can easily get out of hand. But the result may help project a
picture of the state of the code-base in a way that's visible to
non-technical staff. Furthermore it may also help with decisions about whether to pay
the principal. Some teams like to add technical debt stories to
their product backlog - with estimates on how long it would take to
remove them. Such technical debt stories are also estimates, but
they provide a picture of how much debt has built up. You could get
a bit more clever with the estimated interest payments by
apportioning them to these debt stories (I spent an extra day on
this feature because of the bad state of the flipper
module). Comparing interest payments with the principal may help
inform a decision about whether to pay off the principal. I ran into someone recently who tried something a little like
this and found it handy, but it's not something I've run into a
lot. Certainly there are flaws with doing it - but it may be worth a
try for a few iterations. Update: A recent discussion surfaced another way to
capture the estimated interest. During a retrospective (which wise
teams do at the end of each iteration) capture an estimate of
interest paid against each of the problem areas of the system. Doing
this estimate against recently completed work may be easier than
forward estimates against future stories.
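The arithmetic above can be sketched in a few lines of Python. The feature records, the single blamed debt story per feature, and the six-day principal estimate are all made-up numbers for illustration. Only the 5-day versus 3-day case comes from the text.

```python
from collections import defaultdict

# Hypothetical records:
# (feature, actual days, estimated days on a clean system, blamed debt story)
features = [
    ("export report", 5, 3, "flipper module"),
    ("bulk upload", 4, 3, "flipper module"),
    ("audit log", 2, 2, None),  # no interest paid on this one
]

# Hypothetical estimates of the effort to pay off each debt story
principal_days = {"flipper module": 6}


def interest_paid(actual_days, clean_days):
    """Interest is actual effort minus the estimate for a clean system."""
    return actual_days - clean_days


def interest_by_story(records):
    """Apportion each feature's interest to the debt story blamed for it."""
    totals = defaultdict(int)
    for _feature, actual, clean, story in records:
        if story is not None:
            totals[story] += interest_paid(actual, clean)
    return dict(totals)


totals = interest_by_story(features)
for story, interest in totals.items():
    print(f"{story}: {interest} days of interest so far, "
          f"{principal_days[story]} days estimated to pay off")
```

Blaming a single debt story per feature is a simplification. A team could just as well split one feature's interest across several stories.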
|
| HumaneRegistry |
design |
1 December 2008 |
|
One of the features of the new world of services that SOA-gushers
promoted was the notion of registries. Often this was described in
terms of automated systems that would allow systems to automatically
look up useful services in a registry and bind and consume those
services all by themselves. Well computers may look clever occasionally, but I didn't
particularly buy that idea. While there might be the odd edge case
for automated service lookup, I reckon twenty-two times out of
twenty it'll be a human programmer who is doing the looking up. I was chatting recently to my colleague Erik Dörnenburg about a
project he did with Halvard Skogsrud to build a service registry
that was designed for humans to use and maintain. The organization
was already using ServiceCustodians to manage the
development on the project, so the registry needed to work in that
context. This led to the following principles:
- People develop and use services, so orient it around people
(sorry UDDI, thank you for playing).
- Don't expect people to enter stuff to keep it up to date,
people are busy enough as it is.
- Make it easy for people to read and contribute.
The heart of the registry is a wiki that allows people to easily
enter information on a particular service. Not just the builders of
the service, but also people who've used it. After all, users'
opinions are often more useful than providers' (I'm guessing product
review sites get more traffic than the vendors' sites). A wiki makes it easy for people to describe the service, but that
relies on people having time to contribute. A wiki helps make that
easy as you can just click and go, but there's still time
involved. So they backed up the human entry with some useful
information gathered automatically.
- A tool that interrogates the source code control systems and
displays who has committed to a service, when, and how much. This
helps human readers find out who are the other humans who they
should talk to. Someone who did most of the commits, even if a while
ago, probably knows a lot about the core design and purpose of the
service. People who made a few recent commits might know more about
the recent usage and quirks.
- RSS feeds from CI servers and source code control systems.
- Task and bug information from issue tracking systems.
- Traffic data from the message bus indicating how much the
service is used, and when. Also the message bus gives some clues
about the consumers of the service.
- Interceptors in the EJB container that captured consumer
application names - again to get a sense of who is consuming the
service. These were on the consumer side to capture consumer
application names, and on the service to get a sense of the usage
patterns.
- Information from the Ivy dependencies.
Much of this functionality was inspired by ohloh.net, in
particular this view. The point of a registry like this is that it does a lot of
automated work to get information, but presents it in a way that
expects a human reader. Furthermore it understands that the
most important questions the human reader has are about the humans
who have worked on the project: who are they, when did they work on
this, who should I email, and where do I go for a really good
caipirinha?
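As a rough illustration of the "who has committed, when, and how much" view described above, here's a small Python sketch. It assumes the commit records have already been pulled out of the source control system. The `Commit` record, the author names, and the numbers are hypothetical, and nothing here reflects how the actual registry was built.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    author: str
    day: date
    lines_changed: int


def summarize(commits):
    """Per-author commit count, lines changed and last commit date,
    ordered with the most frequent committer first."""
    stats = defaultdict(lambda: {"commits": 0, "lines": 0, "last": date.min})
    for c in commits:
        s = stats[c.author]
        s["commits"] += 1
        s["lines"] += c.lines_changed
        s["last"] = max(s["last"], c.day)
    return sorted(stats.items(),
                  key=lambda item: item[1]["commits"], reverse=True)


# Hypothetical history for one service
commits = [
    Commit("alice", date(2007, 3, 1), 400),
    Commit("alice", date(2007, 3, 9), 250),
    Commit("alice", date(2007, 4, 2), 120),
    Commit("bob", date(2008, 11, 20), 15),
]

summary = summarize(commits)
for author, s in summary:
    print(f"{author}: {s['commits']} commits, {s['lines']} lines, "
          f"last on {s['last']}")

# The person to ask about recent quirks is whoever committed last
most_recent = max(summary, key=lambda item: item[1]["last"])[0]
```

In this made-up history the summary puts alice first as the likely source of core design knowledge, while `most_recent` picks out bob as the person to ask about recent usage.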
|
| DatabaseThaw |
design |
24 November 2008 |
|
A few years ago I heard programming language people talk about
the "Nuclear Winter" in languages caused by Java. The feeling was
that everyone had so converged on Java's computational model (C# at
that point seen as little more than a rip-off) that creativity in
programming languages had disappeared. That feeling is now abating,
but perhaps a more important thaw might be beginning: that of the
longer and deeper freeze in thinking about databases.
Tim Bray's
thought-provoking keynote talked about storage; including
highlighting several alternatives to the conventional database
world
When I started in the software development profession, I worked
with several people who had evangelized relational databases. I came
across them in the object-oriented world. Many people at that time
expected OO databases to be the next evolutionary step for
databases. As we now know, that didn't happen. These days
relational databases are so deeply embedded that most projects
assume an RDBMS right out of the gate. At QCon last
week, there was a strong thread of talks that questioned this
assumption. Certainly one that struck me was Tim Bray's keynote, which took a
journey through several aspects of data management. In doing so he
highlighted a number of interesting projects.
- Drizzle is a form of relational database, but one that eschews
much of the machinery of modern relational products. I think
of it as a RISC RDBMS - supporting only the bare bones of the
relational feature set.
- Couch DB
is one of many forays into a distributed key-value pair
model. Although a sharply simple data-model (nothing more than a
hashmap really) this kind of approach has become quite popular in
high-volume websites.
- Gemstone was one
of the object database crowd, and I found the Gemstone-Smalltalk
combination a very powerful development environment (superior to
most of its successors). Gemstone is still around as a niche
player, but may gain more traction through Maglev - a project to
bring its approach (essentially a fusion of database and virtual
machine) to the Ruby world.
As well as this talk, there was a whole track on alternative
databases hosted by Kresten Krab Thorup. One of the additional
tools mentioned there was Neo4J -
a graph (network) database tool that earned some rare praise from
Jim Webber. The natural question to ask about these products is why they
should prevail when the ODBMSs failed. What's changed in the
environment that could thaw the relational grip? There are many
hypotheses about why relational has been so dominant - my opinion
is that their dominance is due less to their role in data
management than their role in integration.
Kresten
Krab Thorup does a great job as a leader of the technical content of the JAOO and QCon conferences.
For many organizations today, the primary pattern for integration
is Shared Database Integration - where multiple applications are
integrated by all using a common database. When you have these
IntegrationDatabases, it's important that all these applications
can easily get at this shared data - hence the all important role
of SQL. The role of SQL as a mostly-standard query language has been
central to the dominance of databases. The heating of the database space comes from the presence of
alternatives to integration - in particular the rise of web
services. Under various banners there's a growing movement for
applications to talk to each other by passing text (mostly XML)
documents over HTTP. The web, both in internet and intranet forms,
has made this integration mode even more prevalent than SQL. This
is a good thing: I've never liked the approach of multiple
applications tightly coupled through a common database - you can't
get a bigger breach of encapsulation than that. If you switch your integration protocol from SQL to HTTP, it now
means you can change databases from being IntegrationDatabases to
ApplicationDatabases. This change is profound. In the first step it
supports a much simpler approach to object-relational mapping -
such as the approach taken by Ruby on Rails. But furthermore it
breaks the vice-like grip of the relational data model. If you
integrate through HTTP it no longer matters how an application
stores its own data, which in turn means an application can choose
a data model that makes sense for its own needs. I don't think this means that relational databases will disappear
- after all they are the right choice for many situations. But it
does mean that now application developers should think about what
the right option is for their needs. As non-relational projects
grow in popularity and maturity, more and more will go for other
options.
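Here's a minimal Python sketch of the shift from an IntegrationDatabase to an ApplicationDatabase. All the names (`KeyValueStore`, `CustomerService`, the `/customers/` path) are hypothetical, `handle_get` merely stands in for a real HTTP handler, and the documents are JSON rather than XML purely for brevity. The point it tries to show is that other applications only ever see documents, so the storage behind the service is free to be a simple hashmap or anything else.

```python
import json


class KeyValueStore:
    """One possible application database: nothing more than a hashmap."""

    def __init__(self):
        self._data = {}

    def put(self, key, document):
        self._data[key] = document

    def get(self, key):
        return self._data.get(key)


class CustomerService:
    """Owns its storage; consumers only ever see text documents."""

    def __init__(self, store):
        self._store = store  # anything offering get(key) would do

    def handle_get(self, path):
        """Stand-in for an HTTP GET handler: returns (status, body text)."""
        prefix = "/customers/"
        if not path.startswith(prefix):
            return 404, json.dumps({"error": "not found"})
        customer = self._store.get(path[len(prefix):])
        if customer is None:
            return 404, json.dumps({"error": "no such customer"})
        return 200, json.dumps(customer)


store = KeyValueStore()
store.put("42", {"name": "Example Ltd", "city": "Chicago"})
service = CustomerService(store)

status, body = service.handle_get("/customers/42")
print(status, body)
```

Replacing `KeyValueStore` with a relational, graph, or object store would change nothing for callers of `handle_get`. That independence is what the move away from shared-database integration buys.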
|