Conway's Law
20 October 2022
Pretty much all the practitioners I favor in Software Architecture are deeply suspicious of any kind of general law in the field. Good software architecture is very context-specific, analyzing trade-offs that resolve differently across a wide range of environments. But if there is one thing they all agree on, it's the importance and power of Conway's Law. Important enough to affect every system I've come across, and powerful enough that you're doomed to defeat if you try to fight it.
The law is probably best stated, by its author, as: [1]
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
Conway's Law is essentially the observation that the architecture of a software system looks remarkably similar to the organization of the development team that built it. It was originally described to me in this form: if a single team writes a compiler, it will be a one-pass compiler, but if the team is divided into two, then it will be a two-pass compiler. Although we usually discuss it with respect to software, the observation applies broadly to systems in general. [2]

As my colleague Chris Ford said to me: "Conway understood that software coupling is enabled and encouraged by human communication." If I can talk easily to the author of some code, then it is easier for me to build up a rich understanding of that code. This makes it easier for my code to interact, and thus be coupled, to that code. Not just in terms of explicit function calls, but also in the implicit shared assumptions and way of thinking about the problem domain.
We often see how inattention to the law can twist system architectures. If an architecture is designed at odds with the development organization's structure, then tensions appear in the software structure. Module interactions that were designed to be straightforward become complicated, because the teams responsible for them don't work together well. Beneficial design alternatives aren't even considered because the necessary development groups aren't talking to each other.
A dozen or two people can have deep and informal communications, so Conway's Law indicates they will create a monolith. That's fine, so Conway's Law doesn't really impact our thinking for smaller teams. It's when the humans need organizing that Conway's Law should affect decision making.
The first step in dealing with Conway's Law is to know not to fight it. I still remember one sharp technical leader, who had just been made the architect of a large new project that consisted of six teams in different cities all over the world. “I made my first architectural decision” he told me. “There are going to be six major subsystems. I have no idea what they are going to be, but there are going to be six of them.”
This example recognized the big impact location has on human communication. Putting teams on separate floors of the same building is enough to significantly reduce communication. Putting teams in separate cities, and time zones, further gets in the way of regular conversation. The architect recognized this, and realized that he needed to take this into account in his technical design from the beginning. Components developed in different time zones needed to have a well-defined and limited interaction because their creators would not be able to talk easily.[3]
A common mismatch with Conway's Law is where an ActivityOriented team organization works at cross-purposes to feature development. Teams organized by software layer (e.g. front-end, back-end, and database) lead to dominant PresentationDomainDataLayering structures, which is problematic because each feature needs close collaboration between the layers. Similarly, dividing people along the lines of life-cycle activity (analysis, design, coding, testing) means lots of hand-offs to get a feature from idea to production.
Accepting Conway's Law is superior to ignoring it, and in the last decade, we've seen a third way to respond to this law. Here we deliberately alter the development team's organization structure to encourage the desired software architecture, an approach referred to as the Inverse Conway Maneuver [4]. This approach is often talked about in the world of microservices, where advocates advise building small, long-lived BusinessCapabilityCentric teams that contain all the skills needed to deliver customer value. By organizing autonomous teams this way, we employ Conway's Law to encourage similarly autonomous services that can be enhanced and deployed independently of each other. This, indeed, is why I describe microservices as primarily a tool to structure a development organization.
Ignore: Don't take Conway's Law into account, because you've never heard of it, or you don't think it applies (narrator: it does).
Accept: Recognize the impact of Conway's Law, and ensure your architecture doesn't clash with the designers' communication patterns.
Inverse Conway Maneuver: Change the communication patterns of the designers to encourage the desired software architecture.
While the inverse Conway maneuver is a useful tool, it isn't all-powerful. If you have an existing system with a rigid architecture that you want to change, changing the development organization isn't going to be an instant fix. Instead it's more likely to result in a mismatch between developers and code that adds friction to further enhancement. With an existing system like this, the point of Conway's Law is that we need to take its presence into account while changing both organization and code base. And as usual, I'd recommend taking small steps while being vigilant for feedback.
Domain-Driven Design plays a role with Conway's Law to help define organization structures, since a key part of DDD is to identify BoundedContexts. A key characteristic of a Bounded Context is that it has its own UbiquitousLanguage, defined and understood by the group of people working in that context. Such contexts form ways to group people around a subject matter that can then align with the flow of value.
The key thing to remember about Conway's Law is that the modular decomposition of a system and the decomposition of the development organization must be done together. This isn't just at the beginning: evolution of the architecture and reorganization of the human organization must go hand-in-hand throughout the life of an enterprise.
Further Reading
Recognizing the importance of Conway's Law means that budding software architects need to think about IT organization design. Two worthwhile books on this topic are Agile IT Organization Design by Narayan and Team Topologies by Skelton and Pais.
Birgitta Böckeler, Mike Mason, James Lewis and I discuss our experiences with Conway's Law on the ThoughtWorks Technology Podcast.
Acknowledgements
Bill Codding, Birgitta Boeckeler, Camilla Crispim, Chris Ford, Gabriel Sadaka, Matteo Vaccari, Michael Chaffee, and Unmesh Joshi reviewed drafts of this article and suggested improvements.
Notes
1: The source for Conway's law is an article written by Melvin Conway in 1968. It was published by Datamation, one of the most important journals for the software industry at that time. It was later dubbed “Conway’s Law” by Fred Brooks in his hugely influential book The Mythical Man-Month. I ran into it there at the beginning of my career in the 1980s, and it has been a thought-provoking companion ever since.
2: As Conway mentions, consider how the social problems around poverty, health care, housing, and education are influenced by the structures of government.
3: While location makes a big contribution to in-person communication patterns, one of the features of remote-first working is that it reduces the role of distance, as everyone is communicating online. Conway's Law still applies, but it's based on the online communication patterns. Time zones still have a big effect, even online.
4: The term “inverse Conway maneuver” was coined by Jonny LeRoy and Matt Simons in an article published in the December 2010 issue of the Cutter IT Journal.
Revisions
2022-10-24: I added the paragraph about the inverse Conway maneuver and rigid architectures. I also added the footnote about remote-first working.
DefaultTrialRetire
10 November 2021
Within each normal-sized team, limit the choice of alternatives for any class of technology to three. These are: the current sensible default, the one we're experimenting with as a trial, and the one that we hate and want to retire.
The conversation goes like this: We want to introduce a new messaging technology. How many do we have already in place? Oh we have three in active use, including one that's considered legacy and we're partway through migrating off and one that we experimented with previously but didn't gain traction. Ok, so we're at our limit now. If we want to add another messaging tech then we have two choices. Either migrate all of our apps off the legacy tech, or properly rid ourselves of the failed experiment. This is quite closely related to the idea of capping the number of Innovation Tokens in use within your teams.
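As a rough sketch of how a team might make those three slots explicit, here's a tiny Python example. The technology names and the idea of a machine-checkable registry are purely illustrative, not how any particular organisation actually records this.

```python
# A hypothetical registry of messaging technologies, one entry per slot:
# the sensible default, the current trial, and the one being retired.
messaging = {
    "default": "RabbitMQ",
    "trial": "Kafka",
    "retire": "ActiveMQ",   # legacy: still in use while apps migrate off it
}

def can_introduce(registry, candidate):
    """A new technology only fits if it's already listed or a slot is free."""
    if candidate in registry.values():
        return True
    return any(tech is None for tech in registry.values())

print(can_introduce(messaging, "Pulsar"))
# False: all three slots are taken, so we must finish retiring ActiveMQ
# (or abandon the Kafka trial) before taking on anything new.
```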
At a team level these kinds of limits are relatively easy to maintain and discuss and act upon, because we have common priorities and ways of working and high trust, high bandwidth communication. At the scope of the whole organisation the challenge is similar, but getting alignment takes a lot longer and doing actual migration and consolidation work can take a long time - so we sometimes have to allow for more variation in technology. We also use different techniques to discuss and communicate the status of our preferred technologies.
An approach we use at MYOB to engage our whole organisation in broader decisions about technology is to publish our own MYOB Technology Radar, following the format of the Thoughtworks Technology Radar. Building our own radar involves taking input from all of our verticals and teams, and making a clear statement on which technologies we encourage teams to adopt or trial, and, more importantly, which ones to keep clear of.
RefinementCodeReview
28 January 2021
When people think of code reviews, they usually think in terms of an explicit step in a development team's workflow. These days the Pre-Integration Review, carried out on a Pull Request, is the most common mechanism for a code review, to the point that many people mistakenly consider that not using pull requests removes all opportunities for doing code review. Such a narrow view of code reviews doesn't just ignore a host of explicit mechanisms for review, it more importantly neglects probably the most powerful code review technique - that of perpetual refinement done by the entire team.
One of the most pervasive perspectives in software is the notion that it's something we build and complete - hence the endless metaphor of building construction and architecture. Yet the key property of software is that it is soft, and can be as easily modified after it's released as it was when initially composed in the programmer's editor. That's why Erik Dörnenburg wisely argues that architecture is a poor metaphor and would be better replaced by town planning. Valuable software is usually in a constant state of change, as we add features from a better understanding of the value it can bring. But the opportunity is not just to add new features, but also to refine that software - incorporating the lessons the team steadily learns about how best that software can enable these changes.
With the right environment, I can look at a bit of code written six months ago, see some problems with how it's written, and quickly fix them. This may be because this code was flawed when it was written, or because changes in the code base since then led to the code no longer being quite right. Whichever the cause, the important thing is to fix problems as soon as they start getting in our way. As soon as I have an understanding about the code that wasn't immediately apparent from reading it, I have the responsibility to (as Ward Cunningham so wonderfully said) take that understanding out of my head and put it into the code. That way the next reader won't have to work so hard.
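To make that concrete, here's a small, invented Python example of the kind of refinement I mean. The domain rule and all the names are hypothetical; the point is that the "after" version records the understanding a reader had to reconstruct from the "before" version.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    qty: int
    unit_price: float

@dataclass
class Customer:
    since: date

@dataclass
class Order:
    customer: Customer
    items: list

# Before: the code works, but the intent of the magic numbers has to be
# puzzled out by every reader.
def price(order):
    t = sum(i.qty * i.unit_price for i in order.items)
    if order.customer.since.year < 2015 and t > 1000:
        t *= 0.95
    return t

# After: the refinement puts that hard-won understanding into the code itself.
LOYALTY_CUTOFF_YEAR = 2015
LARGE_ORDER_THRESHOLD = 1000
LOYALTY_DISCOUNT = 0.95

def qualifies_for_loyalty_discount(customer, total):
    return customer.since.year < LOYALTY_CUTOFF_YEAR and total > LARGE_ORDER_THRESHOLD

def price(order):  # redefined here only to contrast with the version above
    total = sum(item.qty * item.unit_price for item in order.items)
    if qualifies_for_loyalty_discount(order.customer, total):
        total *= LOYALTY_DISCOUNT
    return total
```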
This process of refinement is exactly the same as what happens in a code review, but it's triggered each time the code is looked at rather than when the code is added to the codebase. This was, for me, a crucial insight. After all, many problems that code reviews seek to remedy are problems that only become problems when the code is read in the future. There's a strong argument for not worrying about them until then. After all, just like adding a large apartment complex changes traffic patterns, we may have altered the context of the code six months later, altering the kind of fix that code needs. It also involves more people: in this scheme every developer who reads the code is a reviewer, and one that's able to review based on their actual use of the code rather than on some general, but often hazily-justified, guidelines.
A way to think about the validity of a practice is to consider what happens if it's a monopoly. What if the only code review mechanism we have is the iteration from later programmers? One consequence is that the review attention gets concentrated on the areas of code that are read more often - which is mostly the areas that ought to get the attention. One concern is that code that's never read will never get reviewed - but mostly that's fine. A team with good testing practices can be confident that the code works, and performance tests can identify performance issues. Given that, if the code never needs to be looked at again, we don't need to spend effort on making it comprehensible. I'd expect such cases to be vanishingly rare, but it's an informative thought experiment.
But most ≠ all. One obvious exception here is security issues. Code can work just fine for years until an attacker finds an exploit; at that point we'll lament its lack of review. This is an example of high-impact but rare safety concerns which deserve special scrutiny. However that doesn't mean we shouldn't make conscious use of refinement as a code review mechanism. Instead it means we should be aware of rare, high-impact concerns and adjust our workflow to watch for that kind of specific problem to the degree that it's needed in our circumstances. Threat analysis should alert us to the modules that need additional attention and the kinds of risks they face. Targeted code reviews might be scheduled for security concerns; these can run more effectively because they are focused on a specific kind of problem.
In order to do this perpetual code refinement we require other practices. If I'm going to change code I need to have confidence that it won't break existing functionality, so I need something like Self Testing Code. I need to know that it won't cause big merge conflicts for others, so I need Continuous Integration. We all need to be good at refactoring so we can change code effectively. Since this relies on many developers being expected to modify any part of the code base, we are best off with collective (or at least weak) code ownership. But given a team that has these skills, they can rely on using their regular refinement as a substantial part of their code review strategy.
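For instance, the Self Testing Code that gives me that confidence could be as small as a couple of tests over the hypothetical price function from the earlier sketch. These assume pytest and the Order, Customer, and Item definitions shown above.

```python
from datetime import date

# Leans on the Order, Customer, Item and price definitions from the
# refinement sketch above; run with pytest.

def test_long_standing_customer_gets_discount_on_large_orders():
    order = Order(customer=Customer(since=date(2010, 1, 1)),
                  items=[Item(qty=20, unit_price=100.0)])
    assert price(order) == 2000 * 0.95

def test_recent_customer_pays_full_price():
    order = Order(customer=Customer(since=date(2020, 6, 1)),
                  items=[Item(qty=20, unit_price=100.0)])
    assert price(order) == 2000
```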
If nothing else, I think it's important that we put more thought into the role of refinement as code review. One of the dangers of focusing solely on Pre-Integration Reviews is that it can lead teams to neglect how change works in a code base. If I have a pristine mainline, and ensure that every commit merged into that mainline is pristine - can I be sure that the codebase is still pristine after six months? I'd argue that I can't, because the changes mean a good decision about some code six months ago is no longer a good decision now. Refining the code allows us to evaluate old code against this changing usage, allowing us to sustain its health.
Acknowledgements
Ben Noble, Chris Ford, Evan Bottcher, Ian Cartwright, Jeremy Huiskamp, Ken Mugrage, Mario Giampietri, Martha Rohte, Omar Bashir, Peter Gillard-Moss, and Simon Brunning commented on drafts of this post on our internal mailing list.
PullRequest
28 January 2021
Pull Requests are a mechanism popularized by GitHub, used to help facilitate merging of work, particularly in the context of open-source projects. A contributor works on their contribution in a fork (clone) of the central repository. Once their contribution is finished they create a pull request to notify the owner of the central repository that their work is ready to be merged into the mainline. Tooling supports and encourages code review of the contribution before accepting the request. Pull requests have become widely used in software development, but critics are concerned that they add integration friction, which can prevent continuous integration.
Pull requests essentially provide convenient tooling for a development workflow that existed in many open-source projects, particularly those using a distributed source-control system (such as git). This workflow begins with a contributor creating a new logical branch, either by starting a new branch in the central repository, cloning into a personal repository, or both. The contributor then works on that branch, typically in the style of a Feature Branch, pulling any updates from Mainline into their branch. When they are done they communicate with the maintainer of the central repository indicating that they are done, together with a reference to their commits. This reference could be the URL of a branch that needs to be integrated, or a set of patches in an email.
Once the maintainer gets the message, she can then examine the commits to decide if they are ready to go into mainline. If not, she can suggest changes to the contributor, who then has the opportunity to adjust their submission. Once all is well, the maintainer can merge, either with a regular merge/rebase or by applying the patches from the final email.
GitHub's pull request mechanism makes this flow much easier. It keeps track of the clones through its fork mechanism, and automatically creates a message thread to discuss the pull request, together with behavior to handle the various steps in the review workflow. These conveniences were a major part of what made GitHub successful and led to "pull request" becoming a fundamental part of the developer's lexicon.
So that's how pull requests work, but should we use them, and if so how? To answer that question, I like to step back from the mechanism and think about how it works in the context of a source code management workflow. To help me think about that, I wrote down a series of patterns for managing source code branching. I find understanding these (specifically the Base and Integration patterns) clarifies the role of pull requests.
In terms of these patterns, pull requests are a mechanism designed to implement a combination of Feature Branching and Pre-Integration Reviews. Thus to assess the usefulness of pull requests we first need to consider how applicable those patterns are to our situation. Like most patterns, they are sometimes valuable, and sometimes a pain in the neck - we have to examine them based on our specific context. Feature Branching is a good way of packaging together a logical contribution so that it can be assessed, accepted, or deferred as a single unit. This makes a lot of sense when contributors are not trusted to commit directly to mainline. But Feature Branching comes at a cost, which is that it usually limits the frequency of integration, leading to complicated merges and deterring refactoring. Pre-Integration Reviews provide a clear place to do code review at the cost of a significant increase in integration friction. [1]
That's a drastic summary of the situation (I need a lot more words to explain this further in the feature branching article), but it boils down to the fact that the value of these patterns, and thus the value of pull requests, rests mostly on the social structure of the team. Some teams work better with pull requests, some teams would find pull requests a severe drag on their effectiveness. I suspect that since pull requests are so popular, a lot of teams are using them by default when they would do better without them.
While pull requests are built for Feature Branches, teams can use them within a Continuous Integration environment. To do this they need to ensure that pull requests are small enough, and the team responsive enough, to follow the CI rule of thumb that everybody does Mainline Integration at least daily. (And I should remind everyone that Mainline Integration is more than just merging the current mainline into the feature branch). Using the ship/show/ask classification can be an effective way to integrate pull requests into a more CI-friendly workflow.
The wide usage of pull requests has encouraged a wider use of code review, since pull requests provide a clear point for Pre-Integration Review, together with tooling that encourages it. Code review is a Good Thing, but we must remember that a pull request isn't the only mechanism we can use for it. Many teams find great value in the continuous review afforded by Pair Programming. To avoid reducing integration frequency we can carry out post-integration code review in several ways. A formal process can record a review for each commit, or a tech lead can examine risky commits every couple of days. Perhaps the most powerful form of code review is one that's frequently ignored. A team that takes the attitude that the codebase is a fluid system, one that can be steadily refined with repeated iteration, carries out Refinement Code Review every time a developer looks at existing code. I often hear people say that pull requests are necessary because without them you can't do code reviews - that's rubbish. Pre-integration code review is just one way to do code reviews, and for many teams it isn't the best choice.
Acknowledgements
Chris Ford, Dan Mutton, Jeremy Huiskamp, Kief Morris, Pramod Sadalage, and Ryan Boucher commented on drafts of this post on our internal mailing list.
Notes
1: A colleague of mine recently calculated the time a client spent waiting for pull requests that had no comments (true of 91% of them). Total time waiting in 2020 for 7000 PRs was 130,000 hours. This figure included time elapsed over nights and weekends.
ComputationalNotebook
18 November 2020
A computational notebook is an environment for writing a prose document that allows the author to embed code which can be easily executed with the results also incorporated into the document. It's a platform particularly well-suited for data science work. Such environments include Jupyter Notebook, R Markdown, Mathematica, and Emacs's org-mode.
When I'm exploring some data, it's useful to keep my notes close together with the code that performs the exploration. I like to try some code, look at the results, and note down any observations I have from that execution. A computational notebook allows me to combine these together easily in a single document.
Here's an example of this, looking at some analysis of my Google Analytics data for martinfowler.com. I'm doing this in R Studio, which uses the R Markdown format.

The example output here is a graph, as notebooks are well suited for plotting various charts. But it's just as useful to embed various data manipulations in the code and display the data in the document as a table.
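To give a flavour of what such a cell might look like, here's a hypothetical Python cell of the kind one could write in a Jupyter notebook. The CSV file and its columns are invented, and pandas and matplotlib are assumed to be available.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a hypothetical export of page-view data; the file and its columns
# ("date", "views") are invented for this example.
views = pd.read_csv("pageviews.csv", parse_dates=["date"])

# A data manipulation whose result appears as a table below the cell.
monthly = views.groupby(views["date"].dt.to_period("M"))["views"].sum()
print(monthly.tail(6))

# The same data as a chart, rendered inline in the notebook.
monthly.plot(kind="bar", title="Monthly page views")
plt.show()
```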
I first encountered a computational notebook in the late 1980s with Mathematica. I remember wishing I'd had access to such a tool during my university degree, but I didn't use a computational notebook again until recent years, with the rise of their use in data science circles. The notebook software I hear most about is Jupyter Notebook, which is popular in the Python community, but as I do my data munging with R I tend to use R Markdown, usually within R Studio. I also use a rather more niche notebook, org-mode, which is part of Emacs.
The code embedded in Mathematica is its own programming language, designed for expressing mathematics. Although Jupyter began in the Python world, it supports a wide range of programming languages, as does R Markdown. Mathematica is a commercial tool, but Jupyter and R Markdown are open source. Jupyter stores its files in JSON, while R Markdown uses markdown files with some special markup for the code blocks. Using a text format for the documents allows them to be stored in regular version control tools, and using a markup language makes diffing easier. Using a markup language also allows the documents to be edited in other editors, but those editors need a suitable environment for executing the code blocks.
Computational notebooks are useful when exploring a problem, such as trying various forms of analysis on a dataset. The document acts as a record of what's been tried and all the observations the researcher makes as they try things. By keeping the code and results together the writer can see exactly what they did and what results it generated. This coupling of code and results is a form of IllustrativeProgramming, making the environment appealing to lay programmers. One thing to be wary of, however, is whether any external environmental factors can change the results - such as the contents of a database. If the dataset isn't too large it can be exported and kept in the version control system, but often its size is prohibitive.
Notebooks are also useful for preparing reports, usually by generating a document in PDF, HTML, or other formats. If I want to report to an author on the traffic for their article, I take the last such report, change the subject URL, rerun all the code, and tweak any prose commentary I think is appropriate. If I were sufficiently motivated I could auto-generate such reports every few months. I like that such reports can easily include the code used to generate the results, so readers can accurately understand the logic behind the figures they see.
Notebooks shouldn't be used, however, as a component of a production system. The notebook structure - with its casual mix of IO, calculation, and UI - is there to encourage interactivity, but works against the modularity needed for code that is used as part of a broader code base. It's best to think of notebooks as a way of exploring logic; once you've found a path, that logic should be replicated into a library designed for production use.
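As a sketch of what that replication might look like, the exploratory cell shown earlier could shrink into a small, testable module. The module name and function here are hypothetical.

```python
# analytics/pageviews.py -- a hypothetical library module extracted from the
# notebook, free of I/O and display concerns so it can be unit tested and
# reused by production jobs.
import pandas as pd

def monthly_views(views: pd.DataFrame) -> pd.Series:
    """Aggregate raw page-view rows (with 'date' and 'views' columns) into monthly totals."""
    return views.groupby(views["date"].dt.to_period("M"))["views"].sum()

# The notebook (or a production report job) then only loads data and calls it:
#   views = pd.read_csv("pageviews.csv", parse_dates=["date"])
#   monthly = monthly_views(views)
```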
KeystoneInterface
29 April 2020
Software development teams find life can be much easier if they integrate their work as often as they can. They also find it valuable to release frequently into production. But teams don't want to expose half-developed features to their users. A useful technique to deal with this tension is to build all the back-end code and integrate it, but not build the user-interface. The feature can be integrated and tested, but the UI is held back until the end when, like a keystone, it's added to complete the feature, revealing it to the users.
A simple example of this technique might be to give a customer the option of a rush order. Such an order needs to be priced, depending on where the customer lives and what delivery companies operate there. The nature of the goods involved affects the picking approach used in the warehouse. Certain customers may qualify to have rush orders available to them, which may also depend on the delivery location, the time of year, and the kind of goods ordered.
All in all that's a fair bit of business logic, particularly since it will involve gnarly integration with various warehousing, catalog, and customer service systems. Doing this could take several weeks, while other features need to be released every few days. But as far as the user is concerned, a rush order is just a check-box on the order form.
To build this using the check-box as the keystone, the team does development work on the underlying business logic and interfaces to internal systems over the course of several production releases. The user is unaware of all this latent code. Only with the last step does the keystone check-box need to be made visible, which can be done in a relatively short time. This way all latent code can be integrated and be part of the system going into production, reducing the problems that come with a long-lived feature branch.
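Here's a deliberately simplified Python sketch of the shape this takes; the business rules, names, and form rendering are all invented for illustration.

```python
# A highly simplified sketch of keystone-style latent code; every rule and
# name here is hypothetical, not a real ordering system.

RUSH_DELIVERY_REGIONS = {"metro-north", "metro-south"}

def rush_order_available(customer, region, items):
    """Latent business logic: built, integrated, and released early."""
    return (customer.get("qualifies_for_rush", False)
            and region in RUSH_DELIVERY_REGIONS
            and all(item.get("pickable_for_rush", False) for item in items))

def render_order_form(customer, region, items):
    """The UI layer. The keystone is the final, small change below."""
    fields = ["name", "address", "payment"]
    # Keystone: until the feature is ready to reveal, this block is simply
    # absent, so users never see the half-finished feature.
    if rush_order_available(customer, region, items):
        fields.append("rush_order_checkbox")
    return fields

# The latent logic can be exercised by tests long before the checkbox ships.
order_fields = render_order_form(
    customer={"qualifies_for_rush": True},
    region="metro-north",
    items=[{"pickable_for_rush": True}],
)
print(order_fields)  # ['name', 'address', 'payment', 'rush_order_checkbox']
```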

The latent code does need to be tested to the same degree of confidence that it would be if it were active. This can be done providing the architecture of the system is set up so that most testing isn't done through the user interface. Unit Tests and other lower layers of the Test Pyramid should be easy to run this way. Even Broad Stack Tests can be run providing there is a mechanism to make them Subcutaneous Tests. In some cases there will be a significant amount of behavior within the UI itself, but this can also be tested if the design allows the visible UI to be a Humble Object.
Not all applications are built in such a way that they can be extensively tested in a subcutaneous manner - but the effort required to do this is worthwhile even without the capability to use a keystone. Tests running through the UI are always more trouble to set up, even with the best tools to automate the process. Moving more tests to subcutaneous and lower level tests, especially unit tests, can dramatically speed up Deployment Pipelines and enable Continuous Delivery.
Of course, most UIs will be more than a check-box, although often they aren't that much more work to keystone. In a web app, a complex feature will often be an independent web page that can be built and tested in full, and the keystone is merely a link. A desktop app may have several screens where the keystone is the menu item that makes them visible.
That said, there are cases when the UI can't be packaged into a simple keystone. When that's the case then it's time to use Feature Toggles. Even in this case, however, thinking of a keystone can be useful by ensuring that the feature toggle only applies to the UI. This avoids scattering lots of toggle points through the back-end code, reduces the complexity of applying the toggle, allows the use of simple toggle mechanisms, and makes it easier to remove when the time comes.
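If the checkbox can't simply be left out, a trivial, hypothetical toggle mechanism shows what keeping the toggle at the UI layer might look like; the back-end functions from the earlier sketch stay untouched and toggle-free.

```python
# A deliberately trivial, hypothetical toggle mechanism. Only the UI layer
# checks it; the back-end rush-order logic from the sketch above stays
# toggle-free and fully live in production.
FEATURE_TOGGLES = {"rush_order_ui": False}   # flipped to True at release time

def render_order_form(customer, region, items):
    fields = ["name", "address", "payment"]
    # The single toggle point, applied only to the UI.
    if FEATURE_TOGGLES["rush_order_ui"] and rush_order_available(customer, region, items):
        fields.append("rush_order_checkbox")
    return fields
```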
There is a general danger with developing a UI last, in that the back-end code may be designed in a way that doesn't work with the UI once it's built, or the UI isn't given the attention it needs until late, leading to a lack of iteration and a poor user experience. For those reasons a keystone approach works best within an overall approach that encourages building a product through thin vertical slices that lead to releasing small but fully working features rapidly.
I've used the example of a user-interface here, but of course the same approach can be used with any other interface, such as an API. By building the consumer's interface last, and keeping it simple, we can build and integrate even large features in small chunks.
Dark Launching is a variation where the new feature is called once it's built, but no results are shown to the user. This is done to measure the impact on the back-end systems, which is useful for some changes. Once all is good, we can add the keystone.
Acknowledgements
I first came across the metaphor of a keystone for this technique in the second edition of Kent Beck's Extreme Programming Explained. Pete Hodgson, Brandon Duff, and Stefan Smith reminded me that I'd forgotten this.
Dave Farley, Paul Hammant, and Pete Hodgson commented on drafts of this post.
OutcomeOverOutput
11 February 2020
Imagine a team writing software for a shopping website. If we look at the team's output, we might consider how many new features they produced in the last quarter, or a cross-functional measure such as a reduction in page load time. An outcome measure, however, would consider increased sales revenue, or a reduced number of support calls for the product. Focusing on outcomes, rather than output, favors building features that do more to improve the effectiveness of the software's users and customers.

As with any professional activity, those of us involved in software development want to learn what makes us more effective. This is as true of an individual developer trying to improve her own performance as it is of managers looking to improve teams within an organization, or a maven like me trying to raise the game of the entire industry. One of the things that makes this difficult is that there's no clear way to measure the productivity of a software team. And this measurement question gets further complicated by whether we base effectiveness on output or outcome.
I've always been of the opinion that outcome is what we should concentrate on. If a team delivers lots of functionality - whether we measure it in lines of code, function points, or stories - that functionality doesn't matter if it doesn't help the user improve their activity. Lots of unused features are wasted effort; indeed worse than that, they bloat the code base, making it harder to add new features in the future. A software development team needs to care about the usefulness of the new functionality: it improves as it delivers fewer features, but of greater utility.
One argument I've heard against using outcome-based observations is that it's harder to come up with repeatable measures for outcomes than it is for output. I find this point difficult to fathom. Measuring pure output for software is famously difficult. Lines of code would be a useless measure even if they weren't so easily gamed. There's poor replicability with Function Points or Story Points - different people will give the same things different point scores. Compared to this, we are very good at measuring financial outcomes. Of course, many outcome observations are more tricky to make - consider customer satisfaction - but I don't see any of them as more difficult than measuring software functionality.
Just calling something an “outcome”, of course, doesn't make it the right thing to focus on, and there is certainly a skill to picking the right outcomes to observe. One handy notion is that of Seiden, who says that an outcome should be a change in behavior of a user, employee, or customer that drives a good thing for the organization. He makes a distinction between “outcomes”, which are behavioral changes that are easier to observe, and “impacts”, which are broader effects upon the organization. In developing EDGE, Highsmith, Luu, and Robinson advise that outcomes about customer value (the reliability of a dishwasher) should be given more weight than outcomes about business value (warranty repair costs).
A consequential concern about using outcome observations is that it's harder to apportion them to a software development team. Consider a customer team that uses software to help them track the quality of goods in their supply chain. If we assess them by how many rejects there are by the final consumer, how much of that is due to the software, how much is due to the quality control procedures developed by quality analysts, and how much is due to a separate initiative to improve the quality of raw materials? This difficulty of apportionment is a huge hurdle if we want to compare different software teams, perhaps in order to judge whether using Clojure has helped teams be more effective. Similarly, there is the case where the developers work well and deliver excellent and valuable software to track quality, but the quality control procedures are no good. Consequently rejects don't go down and the initiative is seen as a failure, despite the developers doing a great job on their part.
But the problems of apportionment shouldn't be taken as a reason to observe the wrong thing. The common phrase says "you get what you measure"; in this case it's more like "you get what you try to measure". If you focus appraisal of success on output, then everyone is thinking about how to increase the output. So even if it's tricky to determine how a team's work affects outcome, the fact that people are instead thinking about outcomes and how to improve them is worth more than any effort to compare teams' proficiency in producing the wrong things.
Further Reading
Seiden provides a nice framework for thinking of outcomes, one that's informed by experiences with non-profits who have a similarly tricky job of evaluating the impact of their work.
My colleagues developed EDGE as an operating model for transforming businesses to work in the digital world. Focusing on outcomes is a core part of their philosophy.
Focusing on outcomes naturally leads to favoring Outcome Oriented teams.
Acknowledgements
My fellow pioneers in the early days of Extreme Programming were very aware of the faults of assessing software development in terms of output. I remember Ron Jeffries and I arguing at an early agile conference workshop that any measures of a team's effectiveness should focus on outcome rather than output - although we did not use those words at the time. That thinking is also reflected in my post Cannot Measure Productivity.
I recall the distinction between outcome and output starting to appear in my colleagues' conversations at Thoughtworks in the 2000s, leading Daniel Terhorst-North to suggest that outcome over features should be a fifth agile value. This preference for outcomes is a regular theme in Thoughtworks-birthed books such as Lean Enterprise, EDGE, and the Digital Transformation Game Plan.
Alexander Steinhart, Alexandra Mogul, Andy Birds, Dale Peakall, Dean Eyre, Gabriel Sixel, Jeff Mangan, Job Rwebembera, Kief Morris, Linus Karsai, Mariela Barzallo, Peter Gillard-Moss, Steven Wilhelm, Vanessa Towers, Vikrant Kardam, and Xiao Ran discussed drafts of this post on our internal mailing list. Peter Gillard-Moss led me to the Seiden book and other work from the non-profit world.