test categories


Integration tests determine if independently developed units of software work correctly when they are connected to each other. The term has become blurred even by the diffuse standards of the software industry, so I've been wary of using it in my writing. In particular, many people assume integration tests are necessarily broad in scope, while they can be more effectively done with a narrower scope.

As often with these things, it's best to start with a bit of history. When I first learned about integration testing, it was in the 1980s and the waterfall was the dominant influence on software development thinking. In a larger project, we would have a design phase that would specify the interface and behavior of the various modules in the system. Modules would then be assigned to developers to program. It was not unusual for one programmer to be responsible for a single module, but this would be big enough that it could take months to build it. All this work was done in isolation, and when the programmer believed it was finished they would hand it over to QA for testing.

The first part of testing would be unit testing, which would test that module on its own, against the specification that had been done in the design phase. Once that was complete, we would then move to integration testing, where the various modules were combined together, either into the entire system, or into significant sub-systems.

The point of integration testing, as the name suggests, is to test whether many separately developed modules work together as expected. It was performed by activating many modules and running higher-level tests against all of them to ensure they operated together. These modules could be parts of a single executable, or separate executables.

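Looking at it from a more 2010s perspective, these conflated two different things:

tests that separately developed modules work together properly

tests that a system of multiple modules works as expected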

These two things were easy to conflate; after all, how else would you test the frobile and twibbler modules without activating them both in a single environment and running tests that exercised both modules?

The 2010s perspective offers another alternative, one that was rarely considered in the 1980s. In this alternative, we test the integration of the frobile and twibbler modules by exercising the portion of the code in frobile that interacts with twibbler, executing it against a TestDouble of twibbler. Provided the test double is a faithful double of twibbler, we can then test all of frobile's interactions with twibbler without activating a full twibbler instance. This may not be a big deal if they are separate modules of a monolithic application, but it is a big deal if twibbler is a separate service, which requires its own build tools, environments, and network connections. For services, such tests may run against an in-process test double, or against an over-the-wire double, using something like mountebank.
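A minimal sketch of the narrow style, written in Java for illustration. The `Twibbler` interface, its `lookupName` call, the `Frobile.describe` method and the `FakeTwibbler` double are all hypothetical stand-ins for the frobile and twibbler modules above; a real project would run this under its usual unit-test framework rather than with hand-rolled checks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface through which frobile talks to twibbler.
interface Twibbler {
    Optional<String> lookupName(int id);
}

// Hypothetical frobile code: the portion that interacts with twibbler.
class Frobile {
    private final Twibbler twibbler;

    Frobile(Twibbler twibbler) { this.twibbler = twibbler; }

    String describe(int id) {
        return twibbler.lookupName(id)
                .map(name -> "Item " + id + ": " + name)
                .orElse("Item " + id + ": unknown");
    }
}

// In-process test double: answers from a map instead of a running twibbler.
class FakeTwibbler implements Twibbler {
    private final Map<Integer, String> names = new HashMap<>();

    FakeTwibbler with(int id, String name) {
        names.put(id, name);
        return this;
    }

    @Override
    public Optional<String> lookupName(int id) {
        return Optional.ofNullable(names.get(id));
    }
}

class NarrowIntegrationTest {
    // Exercises only frobile's interaction code, against the double.
    static void run() {
        Frobile frobile = new Frobile(new FakeTwibbler().with(7, "widget"));
        check(frobile.describe(7).equals("Item 7: widget"));
        check(frobile.describe(8).equals("Item 8: unknown"));
    }

    private static void check(boolean condition) {
        if (!condition) throw new AssertionError("narrow integration test failed");
    }
}
```

Because the double lives in the same process, a test like this runs at unit-test speed. Swapping `FakeTwibbler` for an over-the-wire stub would change the setup but not the shape of the test.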

An obvious catch with integration testing against a double is whether that double is truly faithful. But we can test that separately using ContractTests.
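One way to sketch a contract test, assuming the double and a client of the real service share an interface: a single set of expectations runs against each implementation, so if the real service drifts, the check fails and flags the double as stale. `TwibblerContract` and `RealTwibblerClient` are hypothetical; here the "real" client returns canned values so the sketch runs, whereas in practice it would call the actual twibbler service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

interface Twibbler {
    Optional<String> lookupName(int id);
}

// The double used by the narrow integration tests.
class FakeTwibbler implements Twibbler {
    private final Map<Integer, String> names = Map.of(1, "alpha");

    public Optional<String> lookupName(int id) { return Optional.ofNullable(names.get(id)); }
}

// Placeholder for a client of the real service (would make network calls in practice).
class RealTwibblerClient implements Twibbler {
    public Optional<String> lookupName(int id) {
        return id == 1 ? Optional.of("alpha") : Optional.empty();
    }
}

class TwibblerContract {
    // Expectations frobile relies on; every implementation must satisfy them.
    static List<String> violations(Twibbler twibbler) {
        List<String> problems = new ArrayList<>();
        if (!twibbler.lookupName(1).equals(Optional.of("alpha"))) problems.add("known id should resolve");
        if (twibbler.lookupName(-1).isPresent()) problems.add("unknown id should be empty");
        return problems;
    }
}
```

Running the contract against the real service is slower and needs network access, so it is usually scheduled separately from the narrow tests.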

Using this combination of narrow integration tests and contract tests, I can be confident of integrating against an external service without ever running tests against a real instance of that service, which greatly eases my build process. Teams that do this may still do some form of end-to-end system test with all real services, but if so it's only a final smoke test with a very limited range of paths tested. It also helps to have a mature QA in Production capability; if that is mature enough, there may be no end-to-end system testing done at all.

The problem is that we have (at least) two different notions of what constitutes an integration test.

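narrow integration tests: exercise only the portion of the code in my service that talks to a separate service, running it against test doubles of that service, either in-process or over the wire

broad integration tests: require live versions of all the services, and exercise code paths through all of them, not just the code responsible for their interactions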

And there is a large population of software developers for whom “integration test” only means “broad integration tests”, leading to plenty of confusion when they run into people who use the narrow approach.

If your only integration tests are broad ones, you should consider exploring the narrow style, as it's likely to significantly improve your testing speed, ease of use, and resiliency. Since narrow integration tests are limited in scope, they often run very fast, so they can run in early stages of a DeploymentPipeline, providing faster feedback should they go red.

All this is why I'm wary of “integration test”. When I read it, I look for more context so I know which kind the author really means. [1] If I talk about broad integration tests, I prefer to use “system test” or “end-to-end test”. I don’t have any better name for narrow integration tests, so I do use that (but with “narrow” to help signal to the reader the nature of these tests).


Birgitta Böckeler, Brian Oxley, Dave Rice, Deepti Mittal, Jonny Leroy, Kief Morris, Raimund Klein, Rogerio Chaves, and Tiago Griffo discussed drafts of this post on our internal mailing list.


1: Although I prefer to focus the definition on the interaction of separately built modules, I do occasionally see “integration test” used to mean anything bigger than a unit test. And for some users of solitary unit tests, I’ve seen them describe sociable unit tests as “integration tests”.



big data


I remember in my teens being told of the wonderful things Artificial Intelligence (AI) would do in the next few years. Now several decades later, some of these seem to be happening. The most recent triumph was of computers teaching themselves to play Go by playing against each other, rapidly becoming more proficient than any human, with strategies human experts could barely comprehend. It's natural to wonder what will happen over the next few years: will computers soon have greater intelligence than humanity? (Given some recent election results, that may not be too hard a bar to cross.)

But as I hear of these, I recall Pablo Picasso's comment about computers many decades ago: "Computers are useless. They can only give you answers". The results that techniques such as Machine Learning can produce are truly impressive, and will be useful to us as users and developers of software. But answers, while useful, aren't always the whole picture. I learned this in my early days of school: just providing the answer to a math problem would only get me a couple of marks; to get the full score I had to show how I got it. The reasoning that got to the answer was more valuable than the result itself. That's one of the limitations of the self-taught Go AIs: while they can win, they cannot explain their strategies.

Given this world, one of the big challenges I see for AI is that while we may have figured out Machine Learning in order to teach them to get answers, we haven't got systems that can do Machine Justification for their answers. As AIs make more judgments for us, we'll increasingly run into situations where the answer isn't enough. An AI might be trained in such a way as to rule on legal cases, but could we accept a judgment where the AI cannot explain its reasoning?

Given this, it seems likely that we will need a new class of "programmer" in the future, one whose job is to figure out why AIs get the answers they do, to deduce the reasoning underlying the AIs' skills. We could see many fields where AIs make opaque judgments that we can see are good, but where we need another approach to really learn the theory that underlies their decisions.

This problem is particularly acute since we've discovered that it's awfully easy for these machines to learn undesirable behaviors from their training data, such as discriminating against racial minorities when judging credit ratings.

Like many, I see much of the opportunity of computers is in collaboration with humans. Good use of computers is understanding where the computer is strong (rapidly doing constrained work) and where humans are better, and using a mix. Computers are, at their most intellectual, a tool for the mind. In programming I'm happy to lean on the compiler to help me find errors or suggest alternatives, a practice which I was scolded for as a young programmer. That boundary between where the two are strongest is fluid, and one of the fascinations of the future is how we can best take advantage of its movement.

Further Reading

MIT Technology Review looks at the broad topic of explainability for AI.

Some articles on the dangers of machine learning and undesirable bias from The Atlantic, NPR, and TechRepublic.


Brandon Byars, Chris Ford, Christoph Windheuser, Danilo Sato, Dave Elliman, Ian Cartwright, Kent Rahman, Saleem Siddiqui, Sallie Walecka, Tito Sarrionandia, and Vishal Bardoloi discussed drafts of this post on our internal mailing lists.
Translations: Portuguese




Data encapsulation is a central tenet in object-oriented style. This says that the fields of an object should not be exposed publicly; instead, all access from outside the object should be via accessor methods (getters and setters). There are languages that allow publicly accessible fields, but we usually caution programmers not to do this. Self-encapsulation goes a step further, indicating that all internal access to a data field should also go through accessor methods. Only the accessor methods should touch the data value itself. If the data field isn't exposed to the outside, this will mean adding additional private accessors.

Here's an example of a reasonably encapsulated Java class.

class Charge {
  private final int units;
  private final double rate;

  public Charge(int units, double rate) {
    this.units = units;
    this.rate = rate;
  }
  public int getUnits() { return units; }
  // Money is a value class assumed to be defined elsewhere
  public Money getAmount() { return Money.usd(units * rate); }
}

Both fields are immutable. The units field is exposed to clients of the class via a getter, but the rate field is only used internally, so it doesn't need a getter.

Here is a version using self-encapsulation.

class ChargeSE {
  private final int units;
  private final double rate;

  public ChargeSE(int units, double rate) {
    this.units = units;
    this.rate = rate;
  }
  public int getUnits()    { return units; }
  private double getRate() { return rate; }
  // Money is a value class assumed to be defined elsewhere
  public Money getAmount() { return Money.usd(getUnits() * getRate()); }
}

Self-encapsulation means that getAmount needs to access both fields through getters. This also means I have to add a getter for rate, which I should make private.

Encapsulating mutable data is generally a good idea. Update functions can contain code to execute validations and consequential logic. By restricting access through functions, we can support the UniformAccessPrinciple, allowing us to hide which data is computed and which is stored. These accessors allow us to modify the data structures while retaining the same public interface. Languages differ in the details of what counts as "outside" an object, through various kinds of AccessModifier, but most environments support data encapsulation to some degree.
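A small Java sketch of those two benefits, using a hypothetical `Account` class: the setter is the one place where validation lives, and callers of `getBalance()` cannot tell that the balance is computed from a list of entries rather than stored, so that choice could change without touching the public interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating encapsulated mutable data.
class Account {
    private final List<Integer> entries = new ArrayList<>();
    private int overdraftLimit;

    // Update function: validation happens here for every caller.
    public void setOverdraftLimit(int limit) {
        if (limit < 0) throw new IllegalArgumentException("limit must be non-negative");
        this.overdraftLimit = limit;
    }

    public int getOverdraftLimit() { return overdraftLimit; }

    public void deposit(int amount) { entries.add(amount); }

    // Computed rather than stored; clients can't tell the difference.
    public int getBalance() {
        return entries.stream().mapToInt(Integer::intValue).sum();
    }
}
```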

I've come across a few organizations that mandated self-encapsulation, and whether to use it or not has been a regular topic of debate since the 1990s. Its advocates said that encapsulation was such a benefit that you'd want to extend it to internal access too. Critics argued that it was unnecessary ceremony, leading to extra code that obscured what was going on.

My view on this is that most of the time there's little value in self-encapsulation. The value of encapsulation is proportional to the scope of the data access. Classes are usually small (at least mine are) so direct access isn't going to be an issue within that scope. Most accessors are simple assignments for the setter and retrieval for the getter, so there's little value in using them internally.

But there are common circumstances where self-encapsulation is worth the effort. If there is logic in the setter, then it's wise to consider it for any internal updates too. Another circumstance is when the class is part of an inheritance structure, in which case the accessors provide valuable hook points for subclasses to override behavior.
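A sketch of the inheritance case, adapting the self-encapsulated charge. One assumption differs from the earlier example: for a subclass to override the rate accessor it has to be `protected` rather than `private`. `DiscountedCharge` and its 10% discount are hypothetical, and a plain `double` stands in for `Money` to keep the sketch self-contained.

```java
// Self-encapsulated: getAmount() reads rate through an overridable accessor.
class SelfEncapsulatedCharge {
    private final int units;
    private final double rate;

    SelfEncapsulatedCharge(int units, double rate) {
        this.units = units;
        this.rate = rate;
    }

    public int getUnits() { return units; }

    protected double getRate() { return rate; }  // hook point for subclasses

    public double getAmount() { return getUnits() * getRate(); }
}

// Hypothetical subclass that changes the rate without touching getAmount().
class DiscountedCharge extends SelfEncapsulatedCharge {
    DiscountedCharge(int units, double rate) { super(units, rate); }

    @Override
    protected double getRate() { return super.getRate() * 0.9; }
}
```

Had `getAmount()` read the `rate` field directly, the override would have no effect, which is exactly the hook that self-encapsulation preserves.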

So my usual first move is to use direct access to fields, but refactor using Self Encapsulate Field should circumstances demand it. I can often resolve the forces that lead me to consider self-encapsulation by extracting a new class.

Further Reading

Kent Beck discusses these trade-offs under the names Direct Access and Indirect Access in both Implementation Patterns and Smalltalk Best Practice Patterns.


Ian Cartwright, Matteo Vaccari, and Philip Duldig commented on drafts of this post
Translations: Chinese


encapsulation · language feature · object collaboration design


In programming, the fundamental notion of an object is the bundling of data and behavior. This provides a common data context when writing a set of related functions. It also provides an interface for manipulating the data that allows the object to control access to that data, making it easy to support derived data and prevent invalid modifications of data. Many languages provide explicit syntax to define classes, which act as definitions for objects. But if you have a language with first-class functions and closures, you can use these constructs to create objects using the Function As Object pattern (originally described by Eugene Wallingford).

Here is an example of a simplistic person object, done using the function-as-object style in JavaScript. [1]

import {LocalDate, ChronoUnit} from "@js-joda/core";

function createPerson(name) {
  let birthday;
  return {
    name: () => name,
    setName: (aString) => name = aString,
    birthday: () => birthday,
    setBirthday: (aLocalDate) => birthday = aLocalDate,
    age: age,
    canTrust: canTrust,
  };
  // inner functions are hoisted, so they can follow the return statement
  function age() {
    return birthday.until(LocalDate.now(), ChronoUnit.YEARS);
  }
  function canTrust() {
    return age() <= 30;
  }
}

The outer form of a function-as-object is a function, which is called as a constructor function. The result of the call is, in essence, a hashmap of functions [2] which acts as a method selector. The functions in this map capture the variables of the enclosing function in a closure, allowing the data to persist beyond a single function invocation. This result hashmap can be treated like a classical object.

const kent = createPerson("kent");
kent.setBirthday(LocalDate.parse("1961-03-31")); // example birthday; age() fails if none is set
const youngEnoughToTrust = kent.canTrust();

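Looking at the function-as-object from a classical OO point of view:

createPerson acts as the class, and calling it acts as the constructor

the local state (name and birthday) acts as private fields, reachable only through the closures

the functions in the returned hashmap are the public methods

inner functions that are not added to the hashmap would act as private methods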

A common alternative implementation of this pattern is to return a function as the method selector rather than the hashmap which is the natural method selector in JavaScript. To use a function as the method selector, I'd return a function whose first argument is the name of the method to invoke. The function body then switches on that value (see Wallingford for more on this).
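For the function-as-method-selector variant, here is a rough rendering in Java rather than JavaScript; `createCounter` and its message names are hypothetical. The constructor function returns a single function whose first argument names the method, and a switch dispatches on it. State lives in a one-element array because Java lambdas can only capture effectively final locals.

```java
import java.util.function.BiFunction;

class FunctionAsObject {
    // Constructor function: returns one function that acts as the method selector.
    static BiFunction<String, Integer, Integer> createCounter(int start) {
        int[] count = {start};  // captured mutable state, the object's "field"
        return (message, arg) -> {
            switch (message) {
                case "increment": count[0] += arg; return count[0];
                case "reset":     count[0] = start; return count[0];
                case "value":     return count[0];
                default: throw new IllegalArgumentException("unknown message: " + message);
            }
        };
    }
}
```

Each call to `createCounter` produces an independent closure, so two counters never share state.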

The function-as-object approach has been around for a long time. I've seen it described in Lisp many times, and it's been widely used in JavaScript (until ES6, JavaScript had a very limited notion of classes). It's often used as an argument that a specific syntax for classes isn't necessary, which is the equivalent of object aficionados arguing that you don't need first-class functions when you can write a class with a single "call" method. As a consequence many people in the JavaScript world argue against using the ES6 class syntax. Personally, I like having both first-class functions and first-class classes, and prefer ES6's class syntax.

Further Reading

Eugene Wallingford coined the name "Function as Object" in his 1999 pattern language "Envoy". His paper is worth reading for more details on this, including using a function as the method selector and delegation to support some notion of inheritance. The examples in the paper use Scheme.


Chris Ford, Fred George, James Shore, Kevin Yeung, Lucas Lego, Matteo Vaccari, Rob Miles, and Eugene Wallingford commented on drafts of this post


1: For date handling I'm using js-joda, a port of the Joda-Time library that cleaned up the appalling mess that was Java's date and time handling. I'm glad js-joda is repeating the service of bringing sanity to date and time handling.

2: In JavaScript terminology it's called an object, but it is a JavaScript object, not the classical object that we're trying to create. I'll thus refer to it as a hashmap, to try to reduce the confusion.

3: In ES6 I can use shorthand property names to remove the duplication by replacing "age: age," with "age,".

Translations: Chinese · Korean


25 January 2017

continuous delivery · testing


Synthetic monitoring (also called semantic monitoring [1]) runs a subset of an application's automated tests against the live production system on a regular basis. The results are pushed into the monitoring service, which triggers alerts in case of failures. This technique combines automated testing with monitoring in order to detect failing business requirements in production.

In the age of small independent services and frequent deployments it's very difficult to test pre-production with the exact same combination of versions as they will later exist in production. One way to mitigate this problem is to extend testability from pre-production into production environments - the idea behind QA in production. Doing this shifts the mindset from a focus on Mean-Time-Between-Failures (MTBF) towards a focus on Mean-Time-To-Recovery (MTTR).

A technique for this is synthetic monitoring, which we used at a client who is a digital marketplace for cars with millions of classifieds across a dozen countries. They have close to a hundred services in production, each deployed multiple times a day. Tests are run in a ContinuousDelivery pipeline before the service is deployed to production. The dependencies for the integration tests do not use TestDoubles; instead the tests run against components in production.

Here is an example of these tests that's well suited for synthetic monitoring. It impersonates a user adding a classified to her list of favourites. The steps she takes are as follows:

In order to exclude test requests from analytics we add a parameter (such as excluderequests=true) to the URL. The parameter is handed over transitively to all downstream services, each of which suppresses analytics and third-party scripts when it is set to true.
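A minimal sketch of that propagation, assuming a plain query-string flag; `SyntheticRequests` and its helpers are hypothetical, not part of any framework, and the parser skips URL-decoding. Each service checks the incoming flag, forwards it on every downstream call, and skips analytics when it is set.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

class SyntheticRequests {
    static final String FLAG = "excluderequests";

    // Parse "a=1&b=2" into a map (no URL-decoding; a sketch only).
    static Map<String, String> parseQuery(String query) {
        if (query == null || query.isEmpty()) return Map.of();
        return Arrays.stream(query.split("&"))
                .map(pair -> pair.split("=", 2))
                .collect(Collectors.toMap(kv -> kv[0], kv -> kv.length > 1 ? kv[1] : "", (a, b) -> b));
    }

    static boolean isSynthetic(String incomingQuery) {
        return "true".equals(parseQuery(incomingQuery).get(FLAG));
    }

    // Carry the flag over to a call this service makes to another service.
    static String downstreamUrl(String baseUrl, boolean synthetic) {
        if (!synthetic) return baseUrl;
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + FLAG + "=true";
    }

    static boolean shouldTrackAnalytics(String incomingQuery) {
        return !isSynthetic(incomingQuery);
    }
}
```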

We could use the excluderequests parameter to mark the data as synthetic in the backend datastores. In our case this isn't relevant since we re-use the same user account and clean out its state at the beginning of the test. The downside is that we cannot run this test concurrently. Alternatively, we could create a new user account for each test run. To make the test users easily identifiable, these accounts would have a specific prefix or postfix in the email address. Another option would be a custom HTTP header, sent with every request to identify it as a test, though this is more common for APIs.
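If you go the route of a fresh account per run, a sketch might look like this. The `synthetic+` prefix, the `example.com` domain and the `X-Synthetic-Test` header name are assumptions for illustration, not conventions from the client's system.

```java
import java.util.Map;
import java.util.UUID;

class SyntheticUsers {
    static final String EMAIL_PREFIX = "synthetic+";       // hypothetical marker
    static final String TEST_HEADER = "X-Synthetic-Test";  // hypothetical header name

    // A unique address per run, so runs no longer block each other.
    static String newTestEmail() {
        return EMAIL_PREFIX + UUID.randomUUID() + "@example.com";
    }

    static boolean isTestUser(String email) {
        return email.startsWith(EMAIL_PREFIX);
    }

    // For APIs: mark the request itself rather than the account.
    static boolean isTestRequest(Map<String, String> headers) {
        return headers.entrySet().stream()
                .anyMatch(h -> h.getKey().equalsIgnoreCase(TEST_HEADER) && "true".equals(h.getValue()));
    }
}
```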

Our tests run with Selenium WebDriver and are executed with PhantomJS every 5 minutes against the service in production. The test results are fed into the monitoring system and displayed on the team's dashboard. Depending on the importance of the tested feature, failures can also trigger alerts for on-call duties.
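The schedule-record-alert loop might be sketched like this with the JDK's `ScheduledExecutorService`. The `MonitoringSink` interface, the `SyntheticCheck` class and the rule that only critical checks page someone are hypothetical stand-ins for the real browser tests and monitoring system.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Where results go: dashboard plus optional paging (hypothetical interface).
interface MonitoringSink {
    void record(String check, boolean passed);
    void page(String check);
}

class SyntheticCheck {
    final String name;
    final boolean critical;      // does a failure warrant an on-call alert?
    final BooleanSupplier test;  // e.g. wraps a browser-driven user journey

    SyntheticCheck(String name, boolean critical, BooleanSupplier test) {
        this.name = name;
        this.critical = critical;
        this.test = test;
    }

    // One run: push the result, and page only for important features.
    void runOnce(MonitoringSink sink) {
        boolean passed;
        try {
            passed = test.getAsBoolean();
        } catch (RuntimeException e) {
            passed = false;      // a crashing test counts as a failure
        }
        sink.record(name, passed);
        if (!passed && critical) sink.page(name);
    }

    // Repeat every five minutes against production.
    ScheduledExecutorService schedule(MonitoringSink sink) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> runOnce(sink), 0, 5, TimeUnit.MINUTES);
        return executor;
    }
}
```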

A selection of Broad Stack Tests at the top of the Test Pyramid are well suited to use for synthetic monitoring. These would be UI tests, User Journey Tests, User Acceptance tests or End-to-End tests for web applications; or Consumer-Driven Contract tests (CDCs) for APIs. An alternative to running a suite of UI tests — for example in the context of batch processing jobs — would be to feed a synthetic transaction into the system and assert on its desired final state such as a database entry, a message on a queue or a file in a directory.
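For the batch-processing alternative, a heavily simplified sketch: a hypothetical `BatchJob` drains queued orders into an in-memory store, and the check injects a marked synthetic transaction, runs the job, and asserts on the resulting state. The `synthetic-` id prefix and all class names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

// Hypothetical batch job: drains an input queue into a results store.
class BatchJob {
    final Queue<String[]> input = new ArrayDeque<>();    // {orderId, amountInCents}
    final Map<String, Integer> store = new HashMap<>();  // orderId -> amount in cents

    void run() {
        String[] order;
        while ((order = input.poll()) != null) {
            store.put(order[0], Integer.parseInt(order[1]));
        }
    }
}

class SyntheticTransactionCheck {
    // Feed a marked transaction through the job and assert on its final state.
    static boolean check(BatchJob job) {
        String id = "synthetic-" + UUID.randomUUID();
        job.input.add(new String[] {id, "1234"});
        job.run();
        Integer stored = job.store.remove(id);  // clean up the synthetic row afterwards
        return stored != null && stored == 1234;
    }
}
```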


Thanks to Henry Lawson for his feedback.

And a special thanks to Martin Fowler for his support, suggestions and time spent helping us improve this Bliki.


1: Ryan Murray coined the term "semantic monitoring" and it appeared on the ThoughtWorks Technology Radar in late 2012. However "synthetic monitoring" seems to be the more widely used term, and usefully builds on the notion of synthetic transactions.

Translations: Chinese


certification · continuous delivery


Continuous Integration is a popular technique in software development. At conferences many developers talk about how they use it, and Continuous Integration tools are common in most development organizations. But we all know that any decent technique needs a certification program — and fortunately one does exist. Developed by one of the foremost experts in continuous delivery and devops, it’s known for being remarkably rapid to administer, yet very insightful for its results. Although it’s quite mature, it isn’t as well known as it should be, so as a fan of the technique I think it’s important for me to share this certification program with my readers. Are you ready to be certified for Continuous Integration? And how will you deal with the shocking truth that taking the test will reveal?

By now my regular readers are wondering if they’ve come across a parody post [1], and yes I am having a little fun with my opening teaser. But like any good joke there’s an important kernel of truth buried in it. There is a remarkably good test for proper Continuous Integration that was created by Jez Humble, and he certainly is a leading expert in ContinuousDelivery. It’s also a rapid test: he often administers it to his audience during his talks. The only problem is that I’ve never heard him refer to it as a certification test, which just shows his lack of vision for money-making schemes.

He usually begins the certification process by asking his audience to raise their hands if they do Continuous Integration. Usually most of the audience raise their hands.

He then asks them to keep their hands up if everyone on their team commits and pushes to a shared mainline (usually shared master in git) at least daily.

Over half the hands go down.

He then asks them to keep their hands up if each such commit causes an automated build and test. Half the remaining hands are lowered.

Finally he asks if, when the build fails, it’s usually back to green within ten minutes. [2]

With that last question only a few hands remain. Those are the people who pass his certification test.

It’s a simple set of questions, but it gets to the core of what Continuous Integration is about. The whole idea is that nobody is working on a code base that deviates significantly from anyone else’s. Continuous Integration means the team knows what the current state of the code truly is, we avoid big risky merges, and people can refactor as much as they need to.

The reason so many people raise their hands at the beginning is the common view that Continuous Integration means running some “Continuous Integration Server” against their feature branches. But Continuous Integration, as it was originally described and named by Kent Beck as part of ExtremeProgramming, has nothing to do with tools. At the beginning it was a human workflow, and Jim Shore made an excellent argument that it should stay that way. The idea of running a daemon process against a source code repository came later, and while it is helpful, it’s only Continuous Integration if it’s run on a shared mainline that people commit to every day. Running such a process otherwise, such as on every FeatureBranch, is CI theater that debases the name [3], yielding a workflow that doesn't give you the benefits that make the whole thing worth the effort.

Further Reading

For more details on Continuous Integration, see my main article; although written in 2006, it's still a solid summary and definition of the technique. Jez explains why Continuous Integration is a foundation for Continuous Delivery, and states the three questions in the FAQ on that page. Paul Duvall wrote the definitive book on Continuous Integration. Watch Jez administer the certification test at GOTO Chicago in 2014 (sadly there was no camera on the audience).


All credit for the three questions goes to Jez, whose talks I've always enjoyed.


1: In general, I'm not a fan of software certification schemes, as they usually fail the CertificationCompetenceCorrelation.

2: For this step, "green" counts as passing the commit build, typically compilation and unit tests. While we usually expect a full DeploymentPipeline to be run for release to production, a repository should be fine for developers to work on after the commit build is green. You should have a commit build that takes no more than ten minutes, so quickly fixing it and re-running the commit build works if the fix is easy. If you can't fix and get a green commit build within ten minutes, then you should revert to the last green build.

3: The problem of CI theater leads some people to use the name Trunk-Based Development, arguing that SemanticDiffusion has rendered the term “Continuous Integration” useless. While I understand their view, I believe that we shouldn’t give in to semantic diffusion, instead we need to keep working at re-explaining the proper meaning of Continuous Integration, just as we should with other terms under this kind of semantic assault (such as “agile” and “refactoring”). After all Kent was quite clear in his definition of the term, and using another diminishes the important role he had in popularizing the technique via the Extreme Programming community.

Translations: Chinese


metrics · clean code


During my career, I've heard many arguments about how long a function should be. This is a proxy for the more important question: when should we enclose code in its own function? Some of these guidelines were based on length, such as functions should be no larger than fit on a screen [1]. Some were based on reuse: any code used more than once should be put in its own function, but code only used once should be left inline. The argument that makes most sense to me, however, is the separation between intention and implementation. If you have to spend effort looking at a fragment of code to figure out what it's doing, then you should extract it into a function and name the function after that “what”. That way when you read it again, the purpose of the function leaps right out at you, and most of the time you won't need to care about how the function fulfills its purpose, which is the body of the function.
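A small Java illustration of the intention/implementation split; the `Order` class, its fields, the 5000-cent free-shipping threshold and the 499-cent charge are all hypothetical. The inline version makes the reader decode the condition, while the extracted version names what it checks.

```java
import java.util.List;

class Order {
    final List<Integer> itemPrices;  // prices in cents
    final boolean memberCustomer;

    Order(List<Integer> itemPrices, boolean memberCustomer) {
        this.itemPrices = itemPrices;
        this.memberCustomer = memberCustomer;
    }

    // Before: the reader has to work out what the condition means.
    int shippingCostInline() {
        if (memberCustomer || itemPrices.stream().mapToInt(Integer::intValue).sum() >= 5000) return 0;
        return 499;
    }

    // After: each name states the "what"; the bodies hold the "how".
    int shippingCost() {
        return qualifiesForFreeShipping() ? 0 : 499;
    }

    boolean qualifiesForFreeShipping() {
        return memberCustomer || total() >= 5000;
    }

    int total() {
        return itemPrices.stream().mapToInt(Integer::intValue).sum();
    }
}
```

Note that `qualifiesForFreeShipping` and `total` are each a single line, yet both earn their place by naming an intention.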

Once I accepted this principle, I developed a habit of writing very small functions - typically only a few lines long [2]. Any function more than half-a-dozen lines of code starts to smell to me, and it's not unusual for me to have functions that are a single line of code [3]. The fact that size isn't important was brought home to me by an example that Kent Beck showed me from the original Smalltalk system. Smalltalk in those days ran on black-and-white systems. If you wanted to highlight some text or graphics, you would reverse the video. Smalltalk's graphics class had a method for this called 'highlight', whose implementation was just a call to the method 'reverse' [4]. The name of the method was longer than its implementation - but that didn't matter because there was a big distance between the intention of the code and its implementation.

Some people are concerned about short functions because they are worried about the performance cost of a function call. When I was young, that was occasionally a factor, but that's very rare now. Optimizing compilers often work better with shorter functions which can be cached more easily. As ever, the general guidelines on performance optimization are what counts. Sometimes inlining the function later is what you'll need to do, but often smaller functions suggest other ways to speed things up. I remember people objecting to having an isEmpty method for a list when the common idiom is to use aList.length == 0. But here using the intention-revealing name on a function may also support better performance if it's faster to figure out if a collection is empty than to determine its length.
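To make the isEmpty point concrete, here is a hypothetical singly linked list that, for the purpose of this sketch, does not cache its size: `length()` must walk every node, while `isEmpty()` only inspects the head, so the intention-revealing method is also the cheaper one. (Java's own `LinkedList` does keep a size count, so this gap is specific to the sketch.)

```java
// Hypothetical minimal list without a cached size.
class SimpleList<T> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> head;

    void push(T value) { head = new Node<>(value, head); }

    // O(n): has to count every node.
    int length() {
        int count = 0;
        for (Node<T> n = head; n != null; n = n.next) count++;
        return count;
    }

    // O(1): answers the question actually being asked.
    boolean isEmpty() { return head == null; }
}
```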

Small functions like this only work if the names are good, so you need to pay good attention to naming. This takes practice, but once you get good at it, this approach can make code remarkably self-documenting. Larger scale functions can read like a story, and the reader can choose which functions to dive into for more detail as she needs it.


Brandon Byars, Karthik Krishnan, Kevin Yeung, Luciano Ramalho, Pat Kua, Rebecca Parsons, Serge Gebhardt, Srikanth Venugopalan, and Steven Lowe discussed drafts of this post on our internal mailing list.

Christian Pekeler reminded me that nested functions don't fit my sizing observations.


1: Or in my first programming job: two pages of line printer paper - around 130 lines of Fortran IV

2: Many languages allow you to use functions to contain other functions. This is often used as a scope reduction mechanism, such as using the Function as Object pattern to implement a class. Such functions are naturally much larger.

3: Length of my functions

Recently I got curious about function length in the toolchain that builds this website. It's mostly Ruby and runs to about 15 KLOC. Here's a cumulative frequency plot for the method body lengths.

As you can see, there are lots of small methods there: half of the methods in my codebase are two lines or less. (Lines here are non-comment, non-blank, and exclude the def and end lines.)

Here's the data in a crude tabular form (I'm feeling too lazy to turn it into proper HTML tables).

lines     freq   cumfreq   cumrelfreq
[1,2)      875       875    0.4498715
[2,3)      264      1139    0.5856041
[3,4)      195      1334    0.6858612
[4,5)      120      1454    0.7475578
[5,6)      116      1570    0.8071979
[6,7)       69      1639    0.8426735
[7,8)       75      1714    0.8812339
[8,9)       46      1760    0.9048843
[9,10)      50      1810    0.9305913
[10,15)     98      1908    0.9809769
[15,20)     24      1932    0.9933162
[20,50)     12      1944    0.9994859
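The cumulative columns follow mechanically from the frequency column, and a short Java sketch can recompute them. One detail the arithmetic reveals: the relative figures divide by 1,945, not the 1,944 methods listed (875 / 1945 ≈ 0.4498715), which implies one method falls outside the bins shown. The counts are copied from the table; the total of 1,945 is inferred from that arithmetic.

```java
// Recomputes the cumulative columns of the table from its frequency column.
class CumulativeFrequency {
    // Frequencies copied from the table, in bin order [1,2) ... [20,50).
    static final int[] FREQ = {875, 264, 195, 120, 116, 69, 75, 46, 50, 98, 24, 12};
    // Total implied by the cumrelfreq column; one method lies outside the listed bins.
    static final int TOTAL_METHODS = 1945;

    static int[] cumulative(int[] freq) {
        int[] cum = new int[freq.length];
        int running = 0;
        for (int i = 0; i < freq.length; i++) {
            running += freq[i];
            cum[i] = running;
        }
        return cum;
    }

    static double[] relative(int[] cum, int total) {
        double[] rel = new double[cum.length];
        for (int i = 0; i < cum.length; i++) {
            rel[i] = (double) cum[i] / total;
        }
        return rel;
    }
}
```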

4: The example is in Kent's excellent Smalltalk Best Practice Patterns in Intention Revealing Message

Translations: Chinese