21 May 2019

Software systems are prone to the build up of cruft - deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further. Technical Debt is a metaphor, coined by Ward Cunningham, that frames how to think about dealing with this cruft, thinking of it like a financial debt. The extra effort that it takes to add new features is the interest paid on the debt.

Imagine I have a confusing module structure in my code base. I need to add a new feature. If the module structure was clear, then it would take me four days to add the feature but with this cruft, it takes me six days. The two day difference is the interest on the debt.

What most appeals to me about the debt metaphor is how it frames how I think about how to deal with this cruft. I could take five days to clean up the modular structure, removing that cruft, metaphorically paying off the principal. If I only do it for this one feature, that's no gain, as I'd take nine days instead of six. But if I have two more similar features coming up, then I'll end up faster by removing the cruft first.

Stated like that, this sounds like a simple matter of working the numbers, and any manager with a spreadsheet should figure out the choices. Sadly since we CannotMeasureProductivity, none of these costs are objectively measurable. We can estimate how long it takes to do a feature, estimate what it might be like if the cruft was removed, and estimate the cost of removing the cruft. But our accuracy of such estimates is pretty low.

Given this, usually the best route is to do what we usually do with financial debts, pay the principal off gradually. On the first feature I'll spend an extra couple of days to remove some of the cruft. That may be enough to reduce the interest rate on future enhancements to a single day. That's still going to take extra time, but by removing the cruft I'm making it cheaper for future changes to this code. The great benefit of gradual improvement like this is that it naturally means we spend more time on removing cruft in those areas that we modify frequently, which are exactly those areas of the code base where we most need the cruft to be removed.

Thinking of this as paying interest versus paying of principal can help decide which cruft to tackle. If I have a terrible area of the code base, one that's a nightmare to change, it's not a problem if I don't have to modify it. I only trigger an interest payment when I have to work with that part of the software (this is a place where the metaphor breaks down, since financial interest payments are triggered by the passage of time). So crufty but stable areas of code can be left alone. In contrast, areas of high activity need a zero-tolerance attitude to cruft, because the interest payments are cripplingly high. This is especially important since cruft accumulates where developers make changes without paying attention to internal quality - the more changes, the greater risk of cruft building up.

The metaphor of debt is sometimes used to justify neglecting internal quality. The argument is that it takes time and effort to stop cruft from building up. If there new features that are needed urgently, then perhaps it's best to take on the debt, accepting that this debt will have to be managed in the future.

The danger here is that most of the time this analysis isn't done well. Cruft has a quick impact, slowing down the very new features that are needed quickly. Teams who do this end up maxing out all their credit cards, but still delivering later than they would have done had they put the effort into higher internal quality. Here the metaphor often leads people astray, as the dynamics don't really match those for financial loans. Taking on debt to speed delivery only works if you stay below the design payoff line of the DesignStaminaHypothesis, and teams hit that line in weeks rather than months.

There are regular debates whether different kinds of cruft should be considered as debt or not. I found it useful to think about whether the debt is acquired deliberately and whether it is prudent or reckless - leading me to the TechnicalDebtQuadrant.

Further Reading

As far as I can tell, Ward first introduced this concept in an experience report for OOPSLA 1992. It has also been discussed on the wiki.

Ward Cunningham has a video talk where he discusses this metaphor he created.

Dave Nicolette expands on Ward's view of technical debt with a fine case study of what I refer to as Prudent Intentional debt

A couple of readers sent in some similarly good names. David Panariti refers to ugly programming as deficit programming. Apparently he originally started using a few years ago when it fitted in with government policy; I suppose it's natural again now.

Scott Wood suggested "Technical Inflation could be viewed as the ground lost when the current level of technology surpasses that of the foundation of your product to the extent that it begins losing compatibility with the industry. Examples of this would be falling behind in versions of a language to the point where your code is no longer compatible with main stream compilers."

Steve McConnell brings out several good points in the metaphor, particularly how keeping your unintended debt down gives you more room to intentionally take on debt when it's useful to do so. I also like his notion of minimum payments (which are very high to fix issues with embedded systems as opposed to web sites).

Aaron Erickson talks about Enron financing.

Henrik Kniberg argues that it's older technical debt that causes the greatest problem and that it's wise to a qualitative debt ceiling to help manage it.

Erik Dietrich discusses the human cost of technical debt: team infighting, atrophied skills, and attrition.


I originally published this post on October 1 2003. I gave it a thorough rewrite in April 2019.


5 March 2019

In my recent client engagement, I foresaw that serverless architecture was a perfect fit. The idea of adopting serverless architecture, though, didn’t fly to our client well due to the fear of vendor lock-in. It was an interesting time for retailers as staying in AWS might mean that Amazon, as another retail business, will be given a competitive advantage. Given the idea of not supporting a competitor, my client was interested to ensure that the solution chosen by us is fully portable to other cloud vendors.

From a technical perspective, ensuring that we have the ability to move our system from one platform to another is definitely desirable. With the advent of containerization, why would one be interested to be locked in a specific platform? A high lock-in cost is not something that we would like to show back to the business when we have decided to move another platform. We, therefore, need to make sure that the migration cost is as low as possible when this scenario happens. If I’m about to make a simple formula for lock-in cost with our current understanding, it would look like this:

Lock-in cost = Migration cost (?)

This formula is correct when we are looking at it only from a technical perspective. A business perspective, however, should not be overlooked. Remember that the technical solutions we deliver are always designed to solve business problems. Most of the times the business get a benefit when a particular technology is adopted. One of the significant benefits is a faster time to market. Faster time to market can be formulated into opportunity gain:

Lock-in cost = Migration cost - Opportunity gain

Opportunity gain is very difficult to measure because you are dealing with an unknown unknown. Migration cost can be analyzed and reasoned. Opportunity gain, in contrast, is not as easy to analyze. You can theorize and analyze how to migrate from one platform to another, but how would you calculate the gain of seizing your competitors’ market opportunity? By looking at your decision-making process from a holistic view, combining both the technical and business perspective, the lock-in decision you are taking might result in a profit.

Let’s have a look into an example of building an event-driven architecture. You will need to choose a distributed messaging system in the architecture. If you are already chosen AWS as your platform, you would have the option of vendor-specific services like Kinesis. These services are fully managed and you can get it running in no time, hence giving you an opportunity gain. In comparison with a vendor-agnostic system like Kafka, these vendor-specific services will incur a higher migration cost. Setting up your own distributed messaging system, however, will take more time to harden and for it to be made production ready, especially when you are not experienced in building such platform yet. Instead of looking at your decision from just migration cost, focus on how you can reduce the migration cost by making your system more adaptable. Especially in this example of using a cloud, this is a similar reason on why we recommend to avoid the practice of generic cloud usage.


Thanks to Chris Ford, Matt Newman, Luciano Ramalho, Tobias Vogel, Zhamak Dehghani, Kitson Kelly, and Peter Gillard-Moss for their inputs.

Special thanks to Martin Fowler for his support, suggestions, and time spent with the content and help with publishing.


16 January 2018

Integration tests determine if independently developed units of software work correctly when they are connected to each other. The term has become blurred even by the diffuse standards of the software industry, so I've been wary of using it in my writing. In particular, many people assume integration tests are necessarily broad in scope, while they can be more effectively done with a narrower scope.

As often with these things, it's best to start with a bit of history. When I first learned about integration testing, it was in the 1980's and the waterfall was the dominant influence of software development thinking. In a larger project, we would have a design phase that would specify the interface and behavior of the various modules in the system. Modules would then be assigned to developers to program. It was not unusual for one programmer to be responsible for a single module, but this would be big enough that it could take months to build it. All this work was done in isolation, and when the programmer believed it was finished they would hand it over to QA for testing.

The first part of testing would be unit testing, which would test that module on its own, against the specification that had been done in the design phase. Once that was complete, we then move to integration testing, where the various modules are combined together, either into the entire system, or into significant sub-systems.

The point of integration testing, as the name suggests, is to test whether many separately developed modules work together as expected. It was performed by activating many modules and running higher level tests against all of them to ensure they operated together. These modules could parts of a single executable, or separate.

Looking at it from a more 2010s perspective, these conflated two different things:

  • testing that separately developed modules worked together properly
  • test that a system of multiple modules worked as expected.

These two things were easy to conflate, after all how else would you test the frobile and twibbler modules without activating them both into a single environment and running tests that exercised both modules?

The 2010s perspective offers another alternative, one that was rarely considered in the 1980s. In the alternative, we test the integration of the frobile and twibbler modules by exercising the portion of the code in frobile that interacts with twibbler, executing it against a TestDouble of twibbler. Providing the test double is a faithful double of twibber, we can then test all the interaction behavior of twibbler without activating a full twibbler instance. This may not be a big deal if they are separate modules of a monolithic application, but is a big deal if twibbler is a separate service, which requires its own build tools, environments, and network connections. For services, such tests may run against an in-process test double, or against an over-the-wire double, using something like mountebank

An obvious catch with integration testing against a double is whether that double is truly faithful. But we can test that separately using ContractTests.

Using this combination of using narrow integration tests and contract tests, I can be confident of integrating against an external service without ever running tests against a real instance of that service, which greatly eases my build process. Teams that do this, may still do some form of end-to-end system test with all real services, but if so it's only a final smoke test with a very limited range of paths tested. It also helps to have a mature QA in Production capability, and if that is mature enough, there may be no end-to-end system testing done at all.

The problem is that we have (at least) two different notions of what constitutes an integration test.

narrow integration tests

  • exercise only that portion of the code in my service that talks to a separate service
  • uses test doubles of those services, either in process or remote
  • thus consist of many narrowly scoped tests, often no larger in scope than a unit test (and usually run with the same test framework that's used for unit tests)

broad integration tests

  • require live versions of all services, requiring substantial test environment and network access
  • exercise code paths through all services, not just code responsible for interactions

And there is a large population of software developers for whom “integration test” only means “broad integration tests”, leading to plenty of confusion when they run into people who use the narrow approach.

If your only integration tests are broad ones, you should consider exploring the narrow style, as it's likely to significantly improve your testing speed, ease of use, and resiliency. Since narrow integration tests are limited in scope, they often run very fast, so can run in early stages of a DeploymentPipeline, providing faster feedback should they go red.

All this is why I'm wary with “integration test”. When I read it, I look for more context so I know which kind the author really means. [1] If I talk about broad integration tests, I prefer to use “system test” or “end-to-end test”. I don’t have any better name for narrow integration tests, so I do use that (but with “narrow” to help signal to the reader the nature of these tests).


Birgitta Böckeler, Brian Oxley, Dave Rice, Deepti Mittal, Jonny Leroy, Kief Morris, Raimund Klein, Rogerio Chaves, and Tiago Griffo discussed drafts of this post on our internal mailing list.


1: Although I prefer to focus the definition on the interaction of separately built modules, I do occasionally see “integration test” used to mean anything bigger than a unit test. And for some users of solitary unit tests, I’ve seen them describe sociable unit tests as “integration tests”.


14 November 2017

I remember in my teens being told of the wonderful things Artificial Intelligence (AI) would do in the next few years. Now several decades later, some of these seem to be happening. The most recent triumph was of computers teaching each other to play Go by playing against each other, rapidly becoming more proficient than any human, with strategies human experts could barely comprehend. It's natural to wonder what will happen over the next few years, will computers soon have greater intelligence than humanity? (Given some recent election results, that may not be too hard a bar to cross.)

But as I hear of these, I recall Pablo Picasso's comment about computers many decades ago: "Computers are useless. They can only give you answers". The kind of reasoning that techniques such as Machine Learning can result in are truly impressive in their results, and will be useful to us as users and developers of software. But answers, while useful, aren't always the whole picture. I learned this in my early days of school - just providing the answer to a math problem would only get me a couple of marks, to get the full score I had to show how I got it. The reasoning that got to the answer was more valuable than the result itself. That's one of the limitations of the self-taught Go AIs. While they can win, they cannot explain their strategies.

Given this world, one of the big challenges I see for AI is that while we may have figured out Machine Learning in order to teach them to get answers, we haven't got systems that can do Machine Justification for their answers. As AIs make more judgments for us, we'll increasingly run into situations where the answer isn't enough. An AI might be trained in such a way to rule on legal cases, but could we accept a judgment where the AI cannot explain its reasoning?

Given this it seems likely that we will need a new class of "programmer' in the future, one whose job is to figure out why AIs get the answer they do, to deduce the reasoning underlying the AIs skills. We could see many fields where AIs make opaque judgments that we can see are good, but need another approach for us to really learn the theory that underlies their decisions.

This problem is particularly acute since we've discovered that it's awfully easy for these machines to learn undesirable behaviors from their training data, such as discriminating against racial minorities when judging credit ratings.

Like many, I see much of the opportunity of computers is in collaboration with humans. Good use of computers is understanding where the computer is strong (rapidly doing constrained work) and where humans are better, and using a mix. Computers are, at their most intellectual, a tool for the mind. In programming I'm happy to lean on the compiler to help me find errors or suggest alternatives, a practice which I was scolded for as a young programmer. That boundary between where the two are strongest is fluid, and one of the fascinations of the future is how we can best take advantage of its movement.

Further Reading

MIT Technology Review looks at the broad topic of explainability for AI.

Some articles in the dangers of machine learning and undesirable bias from The Atlantic, NPR, and Tech Republic


Brandon Byars, Chris Ford, Christoph Windheuser, Danilo Sato, Dave Elliman, Ian Cartwright, Kent Rahman, Saleem Siddiqui, Sallie Walecka, Tito Sarrionandia, and Vishal Bardoloi discussed drafts of this post on our internal mailing lists.
Translations: Portuguese


9 March 2017

Data encapsulation is a central tenet in object-oriented style. This says that the fields of an object should not be exposed publicly, instead all access from outside the object should be via accessor methods (getters and setters). There are languages that allow publicly accessible fields, but we usually caution programmers not to do this. Self-encapsulation goes a step further, indicating that all internal access to a data field should also go through accessor methods as well. Only the accessor methods should touch the data value itself. If the data field isn't exposed to the outside, this will mean adding additional private accessors.

Here's an example of a reasonably encapsulated java class

class Charge…

  private int units;
  private double rate;

  public Charge(int units, double rate) {
    this.units = units;
    this.rate = rate;
  public int getUnits() { return units; }
  public Money getAmount() { return Money.usd(units * rate); }

Both fields are immutable. The units field is exposed to clients of the class via a getter, but the rate field is only used internally, so doesn't need a getter.

Here is a version using self-encapsulation.

class ChargeSE…

  private int units;
  private double rate;

  public ChargeSE(int units, double rate) {
    this.units = units;
    this.rate = rate;
  public int getUnits()    { return units; }
  private double getRate() { return rate; }
  public Money getAmount() { return Money.usd(getUnits() * getRate()); }

Self encapsulaton means that getAmount needs to access both fields through getters. This also means I have to add a getter for rate, which I should make private.

Encapsulating mutable data is generally a good idea. Update functions can contain code to execute validations and consequential logic. By restricting access through functions, we can support the UniformAccessPrinciple, allowing us to hide which data is computed and which is stored. These accessors allow us to modify the data structures while retaining the same public interface. Different languages differ in details of what is "outside" for an object by various kinds of AccessModifier, but most environments support data encapsulation to some degree.

I've come across a few organizations that mandated self-encapsulation, and whether to use it or not was a regular topic of debate since the 90's. Its advocates said that encapsulation was such a benefit, that you wanted to incorporate it to internal access too. Critics argued that it was unnecessary ceremony leading to unnecessary code that obscured what was going on.

My view on this is that most of the time there's little value in self-encapsulation. The value of encapsulation is proportional to the scope of the data access. Classes are usually small (at least mine are) so direct access isn't going to be an issue within that scope. Most accessors are simple assignments for the setter and retrieval for the getter, so there's little value in using them internally.

But there are common circumstances where self-encapsulation is worth the effort. If there is logic in the setter, then it's wise to consider it for any internal updates too. Another circumstance is when the class is part of an inheritance structure, in which case the accessors provide valuable hook points for subclasses to override behavior.

So my usual first move is to use direct access to fields, but refactor using Self Encapsulate Field should circumstances demand it. Often the forces that lead me to consider self-encapsulation I can resolve by extracting a new class.

Further Reading

Kent Beck discusses these trade-offs under the names Direct Access and Indirect Access in both Implementation Patterns and Smalltalk Best Practice Patterns


Ian Cartwright, Matteo Vaccari, and Philip Duldig commented on drafts of this post
Translations: Chinese


13 February 2017

In programming, the fundamental notion of an object is the bundling of data and behavior. This provides a common data context when writing a set of related functions. It also provides an interface to manipulating the data that allows the object to control access to that data, making it easy to support derived data and prevent invalid modifications of data. Many languages provide explicit syntax to define classes, which act as definitions for objects. But if you have a language with first-class functions and closures, you can use these constructs to create objects using the Function As Object pattern (originally described by Eugene Wallingford).

Here is an example of a simplistic person object, done using the function-as-object style in JavaScript. [1]

function createPerson(name) {
  let birthday;
  return {
    name: () => name,
    setName: (aString) => name = aString,
    birthday: () => birthday,
    setBirthday: (aLocalDate) => birthday = aLocalDate,
    age: age,
    canTrust: canTrust,
  function age() {
    return birthday.until(, ChronoUnit.YEARS);
  function canTrust() {
    return age() <= 30;

The outer form of a function-as-object is a function, which is called as a constructor function. The result of the call is, in essence, a hashmap of functions [2] which acts as a method selector. This map captures the state of any variables in the function in a closure, allowing the data to persist beyond a single function invocation. This result hashmap can be treated like a classical object.

const kent = createPerson("kent");
const youngEnoughToTrust = kent.canTrust();

Looking at the function-as-object from a classical OO point of view:

  • the fields of the object are represented by the parameters to the constructor function (name)together with the local variables (birthday).
  • the methods of the object are the functions nested within the constructor function. Like object methods they can freely call each other and manipulate the data in these locally scoped variables (fields)
  • Nothing outside the constructor function can access the variables, preserving data encapsulation.
  • The public methods of the object are those functions that are present in the result hashmap.
  • Any functions nested inside the constructor function but not present in the result hashmap are private methods
  • The names of the public methods are the keys of the result hashmap, not the names of the functions within the constructor function. I prefer to keep the keys and function names the same to avoid confusion (although it can be handy to alias functions if needed). [3]

A common alternative implementation of this pattern is to return a function as the method selector rather than the hashmap which is the natural method selector in JavaScript. To use a function as the method selector, I'd return a function whose first argument is the name of the method to invoke. The function body then switches on that value (see Wallingford for more on this).

The function-as-object approach has been around for a long time, I've seen it described in lisp many times, and it's been widely used in JavaScript (until ES6, JavaScript had a very limited notion of classes). It's often used as an argument that a specific syntax for classes isn't necessary, which is the equivalent of object-aficionados arguing that you don't need first class functions when you can write a class with a single "call" method. As a consequence many people in the JavaScript world argue against using the ES6 class syntax. Personally, I like having both first class functions and first class classes, and prefer ES6's class syntax.

Further Reading

Eugene Wallingford coined the name "Function as Object" in his 1999 pattern language "Envoy". His paper is worth reading for more details on this, including using a function as the method selector and delegation to support some notion of inheritance. The examples in the paper use Scheme.


Chris Ford, Fred George, James Shore, Kevin Yeung, Lucas Lego, Matteo Vaccari, Rob Miles, and Eugene Wallingford commented on drafts of this post


1: For date handling I'm using js-joda, a port of the Joda-Time library that cleaned up the appalling mess that was Java's date and time handling. I'm glad joda-js is repeating the service of bringing sanity to date and time handling.

2: In JavaScript terminology it's called an object, although it is a JavaScript object, not the classical object that we're trying to create. I'll thus refer to it as a hashmap, to try and reduce the confusion.

3: In ES6 I can use shorthand property names to remove the duplication by replacing "age: age," with "age,".

Translations: Chinese · Korean


25 January 2017

Synthetic monitoring (also called semantic monitoring [1]) runs a subset of an application's automated tests against the live production system on a regular basis. The results are pushed into the monitoring service, which triggers alerts in case of failures. This technique combines automated testing with monitoring in order to detect failing business requirements in production.

In the age of small independent services and frequent deployments it's very difficult to test pre-production with the exact same combination of versions as they will later exist in production. One way to mitigate this problem is to extend testability from pre-production into production environments - the idea behind QA in production. Doing this shifts the mindset from a focus on Mean-Time-Between-Failures (MTBF) towards a focus on Mean-Time-To-Recovery (MTTR).

A technique for this is synthetic monitoring, which we used at a client who is a digital marketplace for cars with millions of classifieds across a dozen countries. They have close to a hundred services in production, each deployed multiple times a day. Tests are run in a ContinuousDelivery pipeline before the service is deployed to production. The dependencies for the integration tests do not use TestDoubles, instead the tests run against components in production.

Here is an example of these tests that's well suited for synthetic monitoring. It impersonates a user adding a classified to her list of favourites. The steps she takes are as follows:

In order to exclude test requests from analytics we add a parameter (such as excluderequests=true) to the URL. The parameter is handed over transitively to all downstream services, each of which suppresses analytics and third party scripts when it is set to true.

We could use the excluderequests parameter to mark the data as synthetic in the backend datastores. In our case this isn't relevant since we re-use the same user account and clean out its state at the beginning of the test. The downside is that we cannot run this test concurrently. Alternatively, we could create a new user account for each test run. To make the test users easily identifiable these accounts would have a specific pre or postfix in the email address. Another option would be to have a custom HTTP header that would be sent in every request to identify it as a test, though this is more common for APIs.

Our tests run with the Selenium webdriver and are executed with PhantomJS every 5 minutes against the service in production. The test results are fed into the monitoring system and displayed on the team's dashboard. Depending on the importance of the tested feature, failures can also trigger alerts for on-call duties.

A selection of Broad Stack Tests at the top of the Test Pyramid are well suited to use for synthetic monitoring. These would be UI tests, User Journey Tests, User Acceptance tests or End-to-End tests for web applications; or Consumer-Driven Contract tests (CDCs) for APIs. An alternative to running a suite of UI tests — for example in the context of batch processing jobs — would be to feed a synthetic transaction into the system and assert on its desired final state such as a database entry, a message on a queue or a file in a directory.


Thanks to Henry Lawson for his feedback.

And a special thanks to Martin Fowler for his support, suggestions and time spent helping us improve this Bliki.


1: Ryan Murray coined the term "semantic monitoring" and it appeared on the ThoughtWorks Technology Radar in late 2012. However "synthetic monitoring" seems to be the more widely used term, and usefully builds on the notion of synthetic transactions.

Translations: Chinese