29 April 2020
Software development teams find life can be much easier if they integrate their work as often as they can. They also find it valuable to release frequently into production. But teams don't want to expose half-developed features to their users. A useful technique to deal with this tension is to build all the back-end code, integrate, but don't build the user-interface. The feature can be integrated and tested, but the UI is held back until the end until, like a keystone, it's added to complete the feature, revealing it to the users.
A simple example of this technique might be to give a customer the option of a rush order. Such an order needs to be priced, depending on where the customer lives and what delivery companies operate there. The nature of the goods involved affects the picking approach used in the warehouse. Certain customers may qualify to have rush orders available to them, which may also depend on the delivery location, the time of year, and the kind of goods ordered.
All in all that's a fair bit of business logic, particularly since it will involve gnarly integration with various warehousing, catalog, and customer service systems. Doing this could take several weeks, while other features, need to be released every few days. But as far as the user is concerned, a rush order is just a check-box on the order form.
To build this using the check-box as the keystone, the team does development work on the underlying business logic and interfaces to internal systems over the course of several production releases. The user is unaware of all this latent code. Only with the last step does the keystone check-box need to be made visible, which can be done in a relatively short time. This way all latent code can be integrated and be part of the system going into production, reducing the problems that come with a long-lived feature branch.
The latent code does need to be tested to the same degree of confidence that it would be if it were active. This can be done providing the architecture of the system is setup so that most testing isn't done through the user interface. Unit Tests and other lower layers of the Test Pyramid should be easy to run this way. Even Broad Stack Tests can be run providing there is a mechanism to make them Subcutaneous Tests. In some cases there will a significant amount of behavior within the UI itself, but this can also be tested if the design allows the visible UI to be a Humble Object.
Not all applications are built in such a way that they can be extensively tested in a subcutaneous manner - but the effort required to do this is worthwhile even without the capability to use a keystone. Tests running through the UI are always more trouble to setup, even with the best tools to automate the process. Moving more tests to subcutaneous and lower level tests, especially unit tests, can dramatically speed up Deployment Pipelines and enable Continuous Delivery.
Of course, most UIs will be more than a check-box, although often they aren't that much more work to keystone. In a web app, a complex feature will often be an independent web page, that can be built and tested in full, and the keystone is merely a link. A desktop may have several screens where the keystone is the menu-item to make them visible.
That said, there are cases when the UI can't be packaged into a simple keystone. When that's the case then it's time to use Feature Toggles. Even in this case, however, thinking of a keystone can be useful by ensuring that the feature toggle only applies to the UI. This avoids scattering lots of toggle points through the back end code, reduces the complexity of applying the toggle, allows the use of simple toggle mechanisms, and makes it easier to remove when the time comes.
There is a general danger with developing a UI last, in that the back-end code may be designed in a way that doesn't work with the UI once it's built, or the UI isn't given the attention it needs until late, leading to a lack of iteration and a poor user experience. For those reasons a keystone approach works best within an overall approach that encourages building a product through thin vertical slices that lead to releasing small but fully working features rapidly.
I've used the example of a user-interface here, but of course the same approach can be used with any other interface, such as an API. By building the consumer's interface last, and keeping it simple, we can build and integrate even large features in small chunks.
Dark Launching is a variation where the new feature is called once its built, but no results are shown to the user. This is done to measure the impact on the back-end systems, which is useful for some changes. Once all is good, we can add the keystone.
I first came across the metaphor of a keystone for this technique in the second edition of Kent Beck's Extreme Programming Explained. Pete Hodgson, Brandon Duff, and Stefan Smith reminded me that I'd forgotten this.
Dave Farley, Paul Hammant, and Pete Hodgson commented on drafts on this post.
11 February 2020
Imagine a team writing software for a shopping website. If we look at the team's output, we might consider how many new features they produced in the last quarter, or a cross-functional measure such as a reduction in page load time. An outcome measure, however, would consider measure increased sales revenue, or reduced number of support calls for the product. Focusing on outcomes, rather than output, favors building features that do more to improve the effectiveness of the software's users and customers.
As with any professional activity, those of us involved in software development want to learn what makes us more effective. This is true of an individual developer trying to improve her own performance, for managers looking to improve teams within an organization, or a maven like me trying to raise the game of the entire industry. One of the things that makes this difficult is that there's no clear way to measure the productivity of a software team. And this measurement question gets further complicated by whether we base effectiveness on output or outcome.
I've always been of the opinion that outcome is what we should concentrate on. If a team delivers lots of functionality - whether we measure it in lines of code, function points, or stories - that functionality doesn't matter if it doesn't help the user improve their activity. Lots of unused features are wasted effort, indeed worse than that they bloat the code base making it harder to add new features in the future. Such a software development team needs to care about the usefulness of the new functionality, they improve as they deliver less features, but of greater utility.
One argument I've heard against using outcome-based observations is that it's harder to come up with repeatable measures for outcomes than it is for output. I find this point difficult to fathom. Measuring pure output for software is famously difficult. Lines of code are a useless measure even if they weren't so easily gamed. There's poor replicability with Function Point or Story Points - different people will give the same things different point scores. Compared to this, we are very good at measuring financial outcomes. Of course, many outcome observations are more tricky to make - consider customer satisfaction - but I don't see any of them as more difficult than software functionality.
Just calling something an “outcome”, of course, doesn't make something the right thing to focus on, and there is certainly a skill to picking the right outcomes to observe. One handy notion is that of Seiden, who says that an outcome should be a change in behavior of a user, employee, or customer that drives a good thing for the organization. He makes a distinction between “outcomes”, which are behavioral changes that are easier to observe, and “impacts” which are broader effects upon the organization. In developing EDGE, Highsmith, Luu, and Robinson advise that outcomes about customer value (reliability of a dishwasher) should be given more weight than outcomes about business value (warranty repair costs).
A consequential concern about using outcome observations is that it's harder to apportion them to a software development team. Consider a customer team that uses software to help them track the quality of goods in their supply chain. If we assess them by how many rejects there are by the final consumer, how much of that is due to the software, how much due the quality control procedures developed by quality analysts, and how much due to a separate initiative to improve the quality of raw materials? This difficulty of apportionment is a huge hurdle if we want to compare different software teams, perhaps in order to judge whether using Clojure has helped teams be more effective. Similarly there is the case that the developers work well and deliver excellent and valuable software to track quality, but the quality control procedures are no good. Consequently rejects don't go down and the initiative is seen as a failure, despite the developers doing a great job on their part.
But the problems of apportionment shouldn't be taken as a reason to observe the wrong thing. The common phrase says "you get what you measure", in this case it's more like "you get what you try to measure". If you focus appraisal of success on output, then everyone is thinking about how to increase the output. So even if it's tricky to determine how a team's work affects outcome, the fact that people are instead thinking about outcomes and how to improve them is worth more than any effort to compare teams' proficiency in producing the wrong things.
Seiden provides a nice framework for thinking of outcomes, one that's informed by experiences with non-profits who have a similarly tricky job of evaluating the impact of their work.
My colleagues developed EDGE as an operating model for transforming businesses to work in the digital world. Focusing on outcomes is a core part of their philosophy.
Focusing on outcomes naturally leads to favoring Outcome Oriented teams.
My fellow pioneers in the early days of Extreme Programming were very aware of the faults of assessing software development in terms of output. I remember Ron Jeffries and I arguing at an early agile conference workshop that any measures of a team's effectiveness should focus on outcome rather than output - although we did not use those words yet. That thinking is also reflected in my post Cannot Measure Productivity.
I recall starting to hear my colleagues at ThoughtWorks talking about a distinction between outcome and output appearing in the 2000s, leading Daniel Terhorst-North to suggest that outcome over features should be a fifth agile value. This favor to outcomes is a regular theme in ThoughtWorks-birthed books such as Lean Enterprise, EDGE, and the Digital Transformation Game Plan.
Alexander Steinhart, Alexandra Mogul, Andy Birds, Dale Peakall, Dean Eyre, Gabriel Sixel, Jeff Mangan, Job Rwebembera, Kief Morris, Linus Karsai, Mariela Barzallo, Peter Gillard-Moss, Steven Wilhelm, Vanessa Towers, Vikrant Kardam, and Xiao Ran discussed drafts of this post on our internal mailing list. Peter Gillard-Moss led me to the Seiden book and other work from the non-profit world.
18 November 2019
Exploratory testing is a style of testing that emphasizes a rapid cycle of learning, test design, and test execution. Rather than trying to verify that the software conforms to a pre-written test script, exploratory testing explores the characteristics of the software, raising discoveries that will then be classified as reasonable behavior or failures.
The exploratory testing mindset is a contrast to that of scripted testing. In scripted testing, test designers create a script of tests, where each manipulation of the software is written down, together with the expected behavior of the software. These scripts are executed separately, usually many times, and usually by different actors than those who wrote them. If any test demonstrates behavior that doesn't match the expected behavior designed by the test, then we consider this a failure.
For a long time scripted tests were usually executed by testers, and you'd see lots of relatively junior folks in cubicles clicking through screens following the script and checking the result. In large part due to the influence of communities like Extreme Programming, there's been a shift to automating scripted testing. This allows the tests to be executed faster, and eliminates the human error involved in evaluating the expected behavior. I've long been a firm advocate of automated testing like this, and have seen great success with its use drastically reducing bugs.
But even the most determined automated testers realize that there are fundamental limitations with the technique, which are limitations of any form of scripted testing. Scripted testing can only verify what is in the script, catching only conditions that are known about. Such tests can be a fine net that catches any bugs that try to get through it, but how do we know that the net covers all it ought to?
Exploratory testing seeks to test the boundaries of the net, finding new behaviors that aren't in any of the scripts. Often it will find new failures that can be added to the scripts, sometimes it exposes behaviors that are benign, even welcome, but not thought of before.
Exploratory testing is a much more fluid and informal process than scripted testing, but it still requires discipline to be done well. A good way to do this is to carry out exploratory testing in time-boxed sessions. These sessions focus on a particular aspect of the software. A charter that identifies the target of the session and what information you hope to find is a fine mechanism to provide this focus.
Such a charter can act as focus, but shouldn't attempt to define details of what will happen in the session. Exploratory testing involves trying things, learning more about what the software does, applying that learning to generate questions and hypotheses, and generating new tests in the moment to gather more information. Often this will spur questions outside the bounds of the charter, that can be explored in later sessions.
Exploratory testing requires skilled and curious testers, who are comfortable with learning about the software and coming up with new test designs during a session. They also need to be observant, on the lookout for any behavior that might seem odd, and worth further investigation. Often, however, they don't have to be full-time testers. Some teams like to have the whole team carry out exploratory testing, perhaps in pairs or in a single mob.
Exploratory testing should be a regular activity occurring throughout the software development process. Sadly it's hard to find any guidelines on how much should be done within a project. I'd suggest starting with a one hour session every couple of weeks and see what kinds of information the sessions unearth. Some teams like to arrange half-an-hour or so of exploratory testing whenever they complete a story.
If you find bugs are getting through to production, that's a sign that there are gaps in the testing regimen. It's worth looking at any bug that escapes to production and thinking about what measures could be taken to either prevent the bug from getting there, or detecting it rapidly when in production. This analysis will help you decide whether you need more exploratory testing. Bear in mind that it will take time to build up the skill to do exploratory testing well, if you haven't done much exploratory testing before.
I would consider it a red flag if a team isn't doing exploratory testing at all - even if their automated testing was excellent. Even the best automated testing is inherently scripted testing - and that alone is not good enough.
Almost all I know about Exploratory Testing comes from Elisabeth Hendrickson's fine book, which is also where I pinched the net metaphor from.
Aida Manna, Alex Fraser, Bharath Kumar Hemachandran, Chris Ford, Claire Sudbery, Daniel Mondria, David Corrales, David Cullen, David Salazar Villegas, Lina Zubyte, and Philip Peter discussed drafts of this article on our internal mailing list.
13 November 2019
In the software world, “waterfall” is commonly used to describe a style of software process, one that contrasts with the ideas of iterative, or agile styles. Like many well-known terms in software it's meaning is ill-defined and origins are obscure - but I find its essential theme is breaking down a large effort into phases based on activity.
It's not clear how the word “waterfall” became so prevalent, but most people base its origin on a paper by Winston Royce, in particular this figure:
Although this paper seems to be universally acknowledged as the source of the notion of waterfall (based on the shape of the downward cascade of tasks), the term “waterfall” never appears in the paper. It's not clear how the name appeared later.
Royce’s paper describes his observations on the software development process of the time (late 60s) and how the usual implementation steps could be improved.  But “waterfall” has gone much further, to be used as a general description of a style of software development. For people like me, who speak at software conferences, it almost always only appears in a derogatory manner - I can’t recall hearing any conference speaker saying anything good about waterfall for many years. However when talking to practitioners in enterprises I do hear of it spoken as a viable, even preferred, development style. Certainly less so now than in the 90s, but more frequently than one might assume by listening to process mavens.
But what exactly is “waterfall”? That’s not an easy question to answer as, like so many things in software, there is no clear definition. In my judgment, there is one common characteristic that dominates any definition folks use for waterfall, and that’s the idea of decomposing effort into phases based on activity.
Let me unpack that phrase. Let’s say I have some software to build, and I think it’s going to take about a year to build it. Few people are going to happily say “go away for a year and tell me when its done”. Instead, most people will want to break down that year into smaller chunks, so they can monitor progress and have confidence that things are on track. The question then is how do we perform this break down?
The waterfall style, as suggested by the Royce sketch, does it by the activity we are doing. So our 1 year project might be broken down into 2 months of analysis, followed by 4 months design, 3 months of coding, and 3 months of testing. The contrast here is to an iterative style, where we would take some high level requirements (build a library management system), and divide them into subsets (search catalog, reserve a book, check-out and return, assess fines). We'd then take one of these subsets and spend a couple of months to build working software to implement that functionality, delivering either into a staging environment or preferably into a live production setting. Having done that with one subset, we'd continue with further subsets.
In this thinking waterfall means “do one activity at a time for all the features” while iterative means “do all activities for one feature at a time”.
If the origin of the word “waterfall” is murky, so is the notion of how this phase-based breakdown originated. My guess is that it’s natural to break down a large task into different activities, especially if you look to activities such as building construction as an inspiration. Each activity requires different skills, so getting all the analysts to complete analysis before you bring in all the coders makes intuitive sense. It seems logical that a misunderstanding of requirements is cheaper to fix before people begin coding - especially considering the state of computers in the late 60s. Finally the same activity-based breakdown can be used as a standard for many projects, while a feature-based breakdown is harder to teach. 
Although it isn’t hard to find people explain why this waterfall thinking isn’t a good idea for software development, I should summarize my primary objections to the waterfall style here. The waterfall style usually has testing and integration as two of the final phases in the cycle, but these are the most difficult to predict elements in a development project. Problems at these stages lead to rework of many steps of earlier phases, and to significant project delays. It's too easy to declare all but the late phases as "done", with much work missing, and thus it's hard to tell if the project is going well. There is no opportunity for early releases before all features are done. All this introduces a great deal of risk to the development effort.
Furthermore, a waterfall approach forces us into a predictive style of planning, it assumes that once you are done with a phase, such as requirements analysis, the resulting deliverable is a stable platform for later phases to base their work on.  In practice the vast majority of software projects find they need to change their requirements significantly within a few months, due to everyone learning more about the domain, the characteristics of the software environment, and changes in the business environment. Indeed we've found that delivering a subset of features does more than anything to help clarify what needs to be done next, so an iterative approach allows us to shift to an adaptive planning approach where we update our plans as we learn what the real software needs are. 
These are the major reasons why I've glibly said that "you should use iterative development only in projects that you want to succeed".
Waterfalls and iterations may nest inside each other. A six year project might consist of two 3 year projects, where each of the two projects are structured in a waterfall style, but the second project adds additional features. You can think of this as a two-iteration project at the top level with each iteration as a waterfall. Due to the large size and small number of iterations, I'd regard that as primarily a waterfall project. In contrast you might see a project with 16 iterations of one month each, where each iteration is planned in a waterfall style. That I'd see as primarily iterative. While in theory there's potential for a middle ground projects that are hard to classify, in practice it's usually easy to tell that one style predominates.
It is possible for a mix of waterfall and iterative where early phases (requirements analysis, high level design) are done in a waterfall style while later phases (detailed design, code, test) are done in an iterative manner. This reduces the risks inherent in late testing and integration phases, but does not enable adaptive planning.
Waterfall is often cast as the alternative to agile software development, but I don't see that as strictly true. Certainly agile processes require an iterative approach and cannot work in a waterfall style. But it is easy to follow an iterative approach (i.e. non-waterfall) but not be agile.  I might do this by taking 100 features and dividing them up into ten iterations over the next year, and then expecting that each iteration should complete on time with its planned set of features. If I do this, my initial plan is a predictive plan, if all goes well I should expect the work to closely follow the plan. But adaptive planning is an essential element of agile thinking. I expect features to move between iterations, new features to appear, and many features to be discarded as no longer valuable enough.g
My rule of thumb is that anyone who says “we were successful because we were on-time and on-budget” is thinking in terms of predictive planning, even if they are following an iterative process, and thus is not thinking with an agile mindset. In the agile world, success is all about business value - regardless of what was written in a plan months ago. Plans are made, but updated regularly. They guide decisions on what to do next, but are not used as a success measure.
1: There have been quite a few people seeking to interpret the Royce paper. Some argue that his paper opposes waterfall, pointing out that the paper discusses flaws in the kind of process suggested by the figure 2 that I've quoted here. Certainly he does discuss flaws, but he also says the illustrated approach is "fundamentally sound". Certainly this activity-based decomposition of projects became the accepted model in the decades that followed.
2: This leads to another common characteristic that goes with the term “waterfall” - rigid processes that tell everyone in detail what they should do. Certainly the software process folks in the 90s were keen on coming up with prescriptive methods, but such prescriptive thinking also affected many who advocated iterative techniques. Indeed although agile methods explicitly disavow this kind of Taylorist thinking, I often hear of faux-agile initiatives following this route.
3: The notion that a phase should be finished before the next one is started is a convenient fiction. Even the most eager waterfall proponent would agree that some rework on prior stages is necessary in practice, although I think most would say that if executed perfectly, each activity wouldn't need rework. Royce's paper explicitly discussed how iteration was expected between adjacent steps (eg Analysis and Program Design in his figure). However Royce argued that longer backtracks (eg between Program Design and Testing) were a serious problem.
4: This does raise the question of whether there are contexts where the waterfall style is actually better than the iterative one. In theory, waterfall might well work better in situations where there was a deep understanding of the requirements, and the technologies being used - and neither of those things would significantly change during the life of the product. I say "in theory" because I've not come across such a circumstance, so I can't judge if waterfall would be appropriate in practice. And even then I'd be reluctant to follow the waterfall style for the later phases (code-test-integrate) as I've found so much value in interleaving testing with coding while doing continuous integration..
5: In the 90s it was generally accepted in the object-oriented world that waterfall was a bad idea and should be replaced with an iterative style. However I don't think there was the degree of embracing changing requirements that appeared with the agile community.
My thanks to Ben Noble, Clare Sudbury, David Johnston, Karl Brown, Kyle Hodgson, Pramod Sadalage, Prasanna Pendse, Rebecca Parsons, Sriram Narayan, Sriram Narayanan, Tiago Griffo, Unmesh Joshi, and Vidhyalakshmi Narayanaswamy who discussed drafts of this post on our internal mailing list.
21 May 2019
Software systems are prone to the build up of cruft - deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further. Technical Debt is a metaphor, coined by Ward Cunningham, that frames how to think about dealing with this cruft, thinking of it like a financial debt. The extra effort that it takes to add new features is the interest paid on the debt.
Imagine I have a confusing module structure in my code base. I need to add a new feature. If the module structure was clear, then it would take me four days to add the feature but with this cruft, it takes me six days. The two day difference is the interest on the debt.
What most appeals to me about the debt metaphor is how it frames how I think about how to deal with this cruft. I could take five days to clean up the modular structure, removing that cruft, metaphorically paying off the principal. If I only do it for this one feature, that's no gain, as I'd take nine days instead of six. But if I have two more similar features coming up, then I'll end up faster by removing the cruft first.
Stated like that, this sounds like a simple matter of working the numbers, and any manager with a spreadsheet should figure out the choices. Sadly since we CannotMeasureProductivity, none of these costs are objectively measurable. We can estimate how long it takes to do a feature, estimate what it might be like if the cruft was removed, and estimate the cost of removing the cruft. But our accuracy of such estimates is pretty low.
Given this, usually the best route is to do what we usually do with financial debts, pay the principal off gradually. On the first feature I'll spend an extra couple of days to remove some of the cruft. That may be enough to reduce the interest rate on future enhancements to a single day. That's still going to take extra time, but by removing the cruft I'm making it cheaper for future changes to this code. The great benefit of gradual improvement like this is that it naturally means we spend more time on removing cruft in those areas that we modify frequently, which are exactly those areas of the code base where we most need the cruft to be removed.
Thinking of this as paying interest versus paying of principal can help decide which cruft to tackle. If I have a terrible area of the code base, one that's a nightmare to change, it's not a problem if I don't have to modify it. I only trigger an interest payment when I have to work with that part of the software (this is a place where the metaphor breaks down, since financial interest payments are triggered by the passage of time). So crufty but stable areas of code can be left alone. In contrast, areas of high activity need a zero-tolerance attitude to cruft, because the interest payments are cripplingly high. This is especially important since cruft accumulates where developers make changes without paying attention to internal quality - the more changes, the greater risk of cruft building up.
The metaphor of debt is sometimes used to justify neglecting internal quality. The argument is that it takes time and effort to stop cruft from building up. If there new features that are needed urgently, then perhaps it's best to take on the debt, accepting that this debt will have to be managed in the future.
The danger here is that most of the time this analysis isn't done well. Cruft has a quick impact, slowing down the very new features that are needed quickly. Teams who do this end up maxing out all their credit cards, but still delivering later than they would have done had they put the effort into higher internal quality. Here the metaphor often leads people astray, as the dynamics don't really match those for financial loans. Taking on debt to speed delivery only works if you stay below the design payoff line of the DesignStaminaHypothesis, and teams hit that line in weeks rather than months.
There are regular debates whether different kinds of cruft should be considered as debt or not. I found it useful to think about whether the debt is acquired deliberately and whether it is prudent or reckless - leading me to the TechnicalDebtQuadrant.
Ward Cunningham has a video talk where he discusses this metaphor he created.
A couple of readers sent in some similarly good names. David Panariti refers to ugly programming as deficit programming. Apparently he originally started using a few years ago when it fitted in with government policy; I suppose it's natural again now.
Scott Wood suggested "Technical Inflation could be viewed as the ground lost when the current level of technology surpasses that of the foundation of your product to the extent that it begins losing compatibility with the industry. Examples of this would be falling behind in versions of a language to the point where your code is no longer compatible with main stream compilers."
Steve McConnell brings out several good points in the metaphor, particularly how keeping your unintended debt down gives you more room to intentionally take on debt when it's useful to do so. I also like his notion of minimum payments (which are very high to fix issues with embedded systems as opposed to web sites).
Aaron Erickson talks about Enron financing.
Henrik Kniberg argues that it's older technical debt that causes the greatest problem and that it's wise to a qualitative debt ceiling to help manage it.
Erik Dietrich discusses the human cost of technical debt: team infighting, atrophied skills, and attrition.
I originally published this post on October 1 2003. I gave it a thorough rewrite in April 2019.
5 March 2019
In my recent client engagement, I foresaw that serverless architecture was a perfect fit. The idea of adopting serverless architecture, though, didn’t fly to our client well due to the fear of vendor lock-in. It was an interesting time for retailers as staying in AWS might mean that Amazon, as another retail business, will be given a competitive advantage. Given the idea of not supporting a competitor, my client was interested to ensure that the solution chosen by us is fully portable to other cloud vendors.
From a technical perspective, ensuring that we have the ability to move our system from one platform to another is definitely desirable. With the advent of containerization, why would one be interested to be locked in a specific platform? A high lock-in cost is not something that we would like to show back to the business when we have decided to move another platform. We, therefore, need to make sure that the migration cost is as low as possible when this scenario happens. If I’m about to make a simple formula for lock-in cost with our current understanding, it would look like this:
Lock-in cost = Migration cost (?)
This formula is correct when we are looking at it only from a technical perspective. A business perspective, however, should not be overlooked. Remember that the technical solutions we deliver are always designed to solve business problems. Most of the times the business get a benefit when a particular technology is adopted. One of the significant benefits is a faster time to market. Faster time to market can be formulated into opportunity gain:
Lock-in cost = Migration cost - Opportunity gain
Opportunity gain is very difficult to measure because you are dealing with an unknown unknown. Migration cost can be analyzed and reasoned. Opportunity gain, in contrast, is not as easy to analyze. You can theorize and analyze how to migrate from one platform to another, but how would you calculate the gain of seizing your competitors’ market opportunity? By looking at your decision-making process from a holistic view, combining both the technical and business perspective, the lock-in decision you are taking might result in a profit.
Let’s have a look into an example of building an event-driven architecture. You will need to choose a distributed messaging system in the architecture. If you are already chosen AWS as your platform, you would have the option of vendor-specific services like Kinesis. These services are fully managed and you can get it running in no time, hence giving you an opportunity gain. In comparison with a vendor-agnostic system like Kafka, these vendor-specific services will incur a higher migration cost. Setting up your own distributed messaging system, however, will take more time to harden and for it to be made production ready, especially when you are not experienced in building such platform yet. Instead of looking at your decision from just migration cost, focus on how you can reduce the migration cost by making your system more adaptable. Especially in this example of using a cloud, this is a similar reason on why we recommend to avoid the practice of generic cloud usage.
Thanks to Chris Ford, Matt Newman, Luciano Ramalho, Tobias Vogel, Zhamak Dehghani, Kitson Kelly, and Peter Gillard-Moss for their inputs.
Special thanks to Martin Fowler for his support, suggestions, and time spent with the content and help with publishing.
16 January 2018
Integration tests determine if independently developed units of software work correctly when they are connected to each other. The term has become blurred even by the diffuse standards of the software industry, so I've been wary of using it in my writing. In particular, many people assume integration tests are necessarily broad in scope, while they can be more effectively done with a narrower scope.
As often with these things, it's best to start with a bit of history. When I first learned about integration testing, it was in the 1980's and the waterfall was the dominant influence of software development thinking. In a larger project, we would have a design phase that would specify the interface and behavior of the various modules in the system. Modules would then be assigned to developers to program. It was not unusual for one programmer to be responsible for a single module, but this would be big enough that it could take months to build it. All this work was done in isolation, and when the programmer believed it was finished they would hand it over to QA for testing.
The first part of testing would be unit testing, which would test that module on its own, against the specification that had been done in the design phase. Once that was complete, we then move to integration testing, where the various modules are combined together, either into the entire system, or into significant sub-systems.
The point of integration testing, as the name suggests, is to test whether many separately developed modules work together as expected. It was performed by activating many modules and running higher level tests against all of them to ensure they operated together. These modules could parts of a single executable, or separate.
Looking at it from a more 2010s perspective, these conflated two different things:
- testing that separately developed modules worked together properly
- test that a system of multiple modules worked as expected.
These two things were easy to conflate, after all how else would you test the frobile and twibbler modules without activating them both into a single environment and running tests that exercised both modules?
The 2010s perspective offers another alternative, one that was rarely considered in the 1980s. In the alternative, we test the integration of the frobile and twibbler modules by exercising the portion of the code in frobile that interacts with twibbler, executing it against a TestDouble of twibbler. Providing the test double is a faithful double of twibber, we can then test all the interaction behavior of twibbler without activating a full twibbler instance. This may not be a big deal if they are separate modules of a monolithic application, but is a big deal if twibbler is a separate service, which requires its own build tools, environments, and network connections. For services, such tests may run against an in-process test double, or against an over-the-wire double, using something like mountebank
An obvious catch with integration testing against a double is whether that double is truly faithful. But we can test that separately using ContractTests.
Using this combination of using narrow integration tests and contract tests, I can be confident of integrating against an external service without ever running tests against a real instance of that service, which greatly eases my build process. Teams that do this, may still do some form of end-to-end system test with all real services, but if so it's only a final smoke test with a very limited range of paths tested. It also helps to have a mature QA in Production capability, and if that is mature enough, there may be no end-to-end system testing done at all.
The problem is that we have (at least) two different notions of what constitutes an integration test.
narrow integration tests
- exercise only that portion of the code in my service that talks to a separate service
- uses test doubles of those services, either in process or remote
- thus consist of many narrowly scoped tests, often no larger in scope than a unit test (and usually run with the same test framework that's used for unit tests)
broad integration tests
- require live versions of all services, requiring substantial test environment and network access
- exercise code paths through all services, not just code responsible for interactions
And there is a large population of software developers for whom “integration test” only means “broad integration tests”, leading to plenty of confusion when they run into people who use the narrow approach.
If your only integration tests are broad ones, you should consider exploring the narrow style, as it's likely to significantly improve your testing speed, ease of use, and resiliency. Since narrow integration tests are limited in scope, they often run very fast, so can run in early stages of a DeploymentPipeline, providing faster feedback should they go red.
All this is why I'm wary with “integration test”. When I read it, I look for more context so I know which kind the author really means.  If I talk about broad integration tests, I prefer to use “system test” or “end-to-end test”. I don’t have any better name for narrow integration tests, so I do use that (but with “narrow” to help signal to the reader the nature of these tests).
AcknowledgementsBirgitta Böckeler, Brian Oxley, Dave Rice, Deepti Mittal, Jonny Leroy, Kief Morris, Raimund Klein, Rogerio Chaves, and Tiago Griffo discussed drafts of this post on our internal mailing list.
1: Although I prefer to focus the definition on the interaction of separately built modules, I do occasionally see “integration test” used to mean anything bigger than a unit test. And for some users of solitary unit tests, I’ve seen them describe sociable unit tests as “integration tests”.