DiversityMediocrityIllusion

tags: diversity

I've often been involved in discussions about deliberately increasing the diversity of a group of people. The most common case in software is increasing the proportion of women. Two examples are hiring and conference speaker rosters, where we discuss trying to raise the proportion of women above its usual level. A common argument against pushing for greater diversity is that it will lower standards, raising the spectre of a diverse but mediocre group.

To understand why this is an illusory concern, I like to consider a little thought experiment. Imagine a giant bucket that contains a hundred thousand marbles. You know that 10% of these marbles have a special sparkle that you can see when you carefully examine them. You also know that 80% of these marbles are blue and 20% pink, and that sparkles exist evenly across both colors [1]. If you were asked to pick out ten sparkly marbles, you know you could confidently sift through the bucket and pick them out. So now imagine you're told to pick out ten marbles such that five were blue and five were pink.

I don't think you would react by saying “that's impossible”. After all, there are two thousand pink sparkly marbles in there; getting five of them is not beyond the wit of even a man. Similarly, there may be fewer women in the software business, but there are still enough good women to fill the roles a company or a conference needs.

The point of the marbles analogy, however, is to focus on the real consequence of the demand for a 50:50 split. Yes, it's possible to find the appropriate marbles, but the downside is that it takes longer. [2]

That notion applies to finding the right people too. Getting a better than base proportion of women isn't impossible, but it does require more work, often much more work. This extra effort reinforces the rarity: if people have difficulty finding good people as it is, it takes determined effort to spend the extra time needed to get a higher proportion of the minority group — even if you are only trying to raise the proportion of women to 30%, rather than a full 50%.

In recent years we've made increasing our diversity a high priority at ThoughtWorks. This has led to a lot of effort to go where we are more likely to run into the talented women we are seeking: women's colleges, women-in-IT groups and conferences. We encourage our women to speak at conferences, which helps let other women know we value a diverse workforce.

When interviewing, we make a point of ensuring there are women involved. This gives women candidates someone to relate to, and someone to ask questions that are often difficult to ask of men. It's also vital to have women interview men, since we've found that women often spot problematic behaviors that men miss, as we just don't have the experience of subtle discrimination. Getting a diverse group of people inside the company isn't just a matter of recruiting; it also means paying a lot of attention to the environment we have, to try to ensure we don't have the same AlienatingAtmosphere that much of the industry exhibits. [3]

One argument I've heard against this approach is that if everyone did this, then we would run out of pink, sparkly marbles. We'll know this is something to be worried about when women are paid significantly more than men for the same work.

One anecdote that stuck in my memory was from a large, traditional company who wanted to improve the number of women in senior management positions. They didn't impose a quota on appointing women to those positions, but they did impose a quota for women on the list of candidates. (Something like: "there must be at least three credible women candidates for each post".) This candidate quota forced the company to actively seek out women candidates. The interesting point was that just doing this, with no mandate to actually appoint these women, correlated with an increased proportion of women in those positions.

For conference planning it's a similar strategy: just putting out a call for papers and saying you'd like a diverse speaker lineup isn't enough. Neither are such things as blind review of proposals (and I'm not sure that's a good idea anyway). The important thing is to seek out women and encourage them to submit ideas. Organizing conferences is hard enough work as it is, so I can sympathize with those that don't want to add to the workload, but those that do can get there. FlowCon is a good example of a conference that made this an explicit goal and did far better than the industry average (and in case you were wondering, there was no difference between men's and women's evaluation scores).

So now that we recognize that getting greater diversity is a matter of application and effort, we can ask ourselves whether the benefit is worth the cost. In a broad professional sense, I've argued that it is, because our DiversityImbalance is reducing our ability to bring the talent we need into our profession, and reducing the influence our profession needs to have on society. In addition I believe there is a moral argument to push back against long-standing wrongs faced by HistoricallyDiscriminatedAgainst groups.

Conferences have an important role to play in correcting this imbalance. The roster of speakers is, at least subconsciously, a statement of what the profession should look like. If it's all white guys like me, then that adds to the AlienatingAtmosphere that pushes women out of the profession. Therefore I believe that conferences need to strive to get an increased proportion of historically-discriminated-against speakers. We, as a profession, need to push them to do this. It also means that women have an extra burden to become visible and act as part of that better direction for us. [4]

For companies, the choice is more personal. For me, ThoughtWorks's efforts to improve its diversity are a major factor in why I've been an employee here for over a decade. I don't think it's a coincidence that ThoughtWorks is also a company that has a greater open-mindedness, and a lack of political maneuvering, than most of the companies I've consulted with over the years. I consider those attributes to be a considerable competitive advantage in attracting talented people, and providing an environment where we can collaborate effectively to do our work.

But I'm not holding ThoughtWorks up as an example of perfection. We've made a lot of progress over the decade I've been here, but we still have a long way to go. In particular we are very short of senior technical women. We've introduced a number of programs around networks, and leadership development, to help grow women to fill those gaps. But these things take time - all you have to do is look at our Technical Advisory Board to see that we are a long way from the ratio we seek.

Despite my knowledge of how far we still have to climb, I can glimpse the summit ahead. At a recent AwayDay in Atlanta I was delighted to see how many younger technical women we've managed to bring into the company. While struggling to keep my head above water as the sole male during a late night game of Dominion, I enjoyed a great feeling of hope for our future.

Notes

1: That is, 10% of blue marbles are sparkly, as are 10% of pink ones.

2: Actually, if I dig around for a while in that bucket, I find that some marbles are neither blue nor pink, but some engaging mixture of the two.

3: This is especially tricky for a company like us, where so much of our work is done in client environments, where we aren't able to exert as much of an influence as we'd like. Some of our offices have put together special training to educate both sexes on how to deal with sexist situations with clients. As a man, I feel it's important for me to know how I can be supportive; it's not something I do well, but it is something I want to learn to improve.

4: Many people find the pressure of public speaking intimidating (I've come to hate it, even with all my practice). Feeling that you're representing your entire gender or race only makes it worse.

Acknowledgements

Camila Tartari, Carol Cintra, Dani Schufeldt, Derek Hammer, Isabella Degen, Korny Sietsma, Lindy Stephens, Mridula Jayaraman, Nikki Appleby, Rebecca Parsons, Sarah Taraporewalla, Stefanie Tinder, and Suzi Edwards-Alexander commented on drafts of this article.



SacrificialArchitecture

tags: process theory · evolutionary design · application architecture

You're sitting in a meeting, contemplating the code that your team has been working on for the last couple of years. You've come to the decision that the best thing you can do now is to throw away all that code, and rebuild on a totally new architecture. How does that make you feel about that doomed code, about the time you spent working on it, about the decisions you made all that time ago?

For many people throwing away a code base is a sign of failure, perhaps understandable given the inherent exploratory nature of software development, but still failure.

But often the best code you can write now is code you'll discard in a couple of years time.

Often we think of great code as long-lived software. I'm writing this article in an editor which dates back to the 1980's. Much thinking on software architecture is about how to facilitate that kind of longevity. Yet success can also be built on top of code long since sent to /dev/null.

Consider the story of eBay, one of the web's most successful large businesses. It started as a set of Perl scripts built over a weekend in 1995. In 1997 it was all torn down and replaced with a system written in C++ on top of the Windows tools of the time. Then in 2002 the application was rewritten again in Java. Were these early versions an error because they were replaced? Hardly. eBay is one of the great successes of the web so far, but much of that success was built on the discarded software of the 90's. Like many successful websites, eBay has seen exponential growth - and exponential growth isn't kind to architectural decisions. The right architecture to support 1996-eBay isn't going to be the right architecture for 2006-eBay. The 1996 one won't handle 2006's load, but the 2006 version is too complex to build, maintain, and evolve for the needs of 1996.

Indeed this guideline can be baked into an organization's way of working. At Google, the explicit rule is to design a system for ten times its current needs, with the implication that if the needs grow beyond an order of magnitude it's often better to throw the system away and replace it from scratch [1]. It's common for subsystems to be redesigned and thrown away every few years.

Indeed it's a common pattern to see people coming into a maturing code base and denigrating its lack of performance or scalability. But often in the early period of a software system you're less sure of what it really needs to do, so it's important to put more focus on flexibility for changing features than on performance or availability. Later on you need to switch priorities as you get more users, but getting too many users on an underperforming code base is usually a better problem to have than its inverse. Jeff Atwood coined the phrase "performance is a feature", which some people read as saying that performance is always priority number one. But any feature is something you have to weigh against other features. That's not to say you should ignore things like performance - software can get sufficiently slow and unreliable to kill a business - but the team has to make the difficult trade-offs with other needs. Often these are more business decisions than technology ones.

So what does it mean to deliberately choose a sacrificial architecture? Essentially it means accepting now that in a few years time you'll (hopefully) need to throw away what you're currently building. This can mean accepting limits to the cross-functional needs of what you're putting together. It can mean thinking now about things that can make it easier to replace when the time comes - software designers rarely think about how to design their creation to support its graceful replacement. It also means recognizing that software that's thrown away in a relatively short time can still deliver plenty of value.

Knowing your architecture is sacrificial doesn't mean abandoning the internal quality of the software. Usually sacrificing internal quality will bite you more rapidly than the replacement time, unless you're already working on retiring the code base. Good modularity is a vital part of a healthy code base, and modularity is usually a big help when replacing a system. Indeed one of the best things to do with an early version of a system is to explore what the best modular structure should be so that you can build on that knowledge for the replacement. While it can be reasonable to sacrifice an entire system in its early days, as a system grows it's more effective to sacrifice individual modules - which you can only do if you have good module boundaries.

One thing that's easily missed when it comes to handling this problem is accounting. Yes, really — we've run into situations where people have been reluctant to replace a clearly unviable system because of the way they were amortizing the codebase. This is more likely to be an issue for big enterprises, but don't forget to check it if you live in that world.

You can also apply this principle to features within an existing system. If you're building a new feature it's often wise to make it available to only a subset of your users, so you can get feedback on whether it's a good idea. To do that you may initially build it in a sacrificial way, so that you don't invest the full effort on a feature that you find isn't worth full deployment.
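To make that concrete, here's a minimal sketch of my own (the percentage rollout and names are hypothetical, not from any particular system) of gating such a feature to a small, stable cohort of users so that it can be cheaply discarded if it doesn't earn its keep:

  import java.util.UUID;

  // A sketch of gating a sacrificial feature: only a small, stable cohort of users
  // sees it, so it can be evaluated and then kept, reworked, or thrown away.
  public class SacrificialFeatureGate {
    private final int rolloutPercent; // e.g. 5 means 5% of users see the feature

    public SacrificialFeatureGate(int rolloutPercent) {
      this.rolloutPercent = rolloutPercent;
    }

    // Deterministic per user: the same user always gets the same answer,
    // so feedback comes from a stable cohort.
    public boolean isEnabledFor(UUID userId) {
      return Math.floorMod(userId.hashCode(), 100) < rolloutPercent;
    }

    public static void main(String[] args) {
      SacrificialFeatureGate gate = new SacrificialFeatureGate(5);
      System.out.println("show experimental feature? " + gate.isEnabledFor(UUID.randomUUID()));
    }
  }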

Modular replaceability is a principal argument in favor of a microservices architecture, but I'm wary of recommending it for a sacrificial architecture. Microservices imply distribution and asynchrony, which are both complexity boosters. I've already run into a couple of projects that took the microservice path without really needing to — seriously slowing down their feature pipeline as a result. So a monolith is often a good sacrificial architecture, with microservices introduced later to gradually pull it apart.

The team that writes the sacrificial architecture is the team that decides it's time to sacrifice it. This is a different case from a new team coming in, hating the existing code, and wanting to rewrite it. It's easy to hate code you didn't write, without an understanding of the context in which it was written. Knowingly sacrificing your own code is a very different dynamic, and knowing you're going to be sacrificing the code you're about to write is a useful variant on that.

Acknowledgements

Conversations with Randy Shoup encouraged and helped me formulate this post, in particular describing the history of eBay (and some similar stories from Google). Jonny Leroy pointed out the accounting issue. Kief Morris, Jason Yip, Mahendra Kariya, Jessica Kerr, Rahul Jain, Andrew Kiellor, Fabio Pereira, Pramod Sadalage, Jen Smith, Charles Haynes, Scott Robinson and Paul Hammant provided useful comments.

Notes

1: As Jeff Dean puts it "design for ~10X growth, but plan to rewrite before ~100X"



MicroservicePrerequisites

tags: microservices

As I talk to people about using a microservices architectural style I hear a lot of optimism. Developers enjoy working with smaller units and have expectations of better modularity than with monoliths. But as with any architectural decision there are trade-offs. In particular with microservices there are serious consequences for operations, who now have to handle an ecosystem of small services rather than a single, well-defined monolith. Consequently if you don't have certain baseline competencies, you shouldn't consider using the microservice style.

Rapid provisioning: you should be able to fire up a new server in a matter of hours. Naturally this fits in with CloudComputing, but it's also something that can be done without a full cloud service. To be able to do such rapid provisioning, you'll need a lot of automation - it may not have to be fully automated to start with, but to do serious microservices later it will need to get that way.

Basic Monitoring: with many loosely-coupled services collaborating in production, things are bound to go wrong in ways that are difficult to detect in test environments. As a result it's essential that a monitoring regime is in place to detect serious problems quickly. The baseline here is detecting technical issues (counting errors, service availability, etc.) but it's also worth monitoring business issues (such as detecting a drop in orders). If a sudden problem appears then you need to ensure you can quickly roll back, hence…

Rapid application deployment: with many services to manage, you need to be able to quickly deploy them, both to test environments and to production. Usually this will involve a DeploymentPipeline that can execute in no more than a couple of hours. Some manual intervention is alright in the early stages, but you'll be looking to fully automate it soon.
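To make the monitoring baseline above a little more concrete, here's a rough sketch of my own (plain JDK HttpServer; the endpoint name and the 5% error threshold are arbitrary) of a minimal starting point: counting errors and exposing availability for a monitor to poll:

  import com.sun.net.httpserver.HttpServer;
  import java.io.OutputStream;
  import java.net.InetSocketAddress;
  import java.nio.charset.StandardCharsets;
  import java.util.concurrent.atomic.AtomicLong;

  // A sketch of baseline monitoring for one service: count requests and errors,
  // and report availability so a monitor can detect serious problems quickly.
  public class ServiceMetrics {
    static final AtomicLong requests = new AtomicLong();
    static final AtomicLong errors = new AtomicLong();

    public static void main(String[] args) throws Exception {
      HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
      server.createContext("/health", exchange -> {
        long total = requests.get();
        long failed = errors.get();
        // crude availability signal: unhealthy if more than 5% of requests fail
        boolean healthy = total == 0 || (double) failed / total < 0.05;
        String body = String.format("{\"healthy\":%b,\"requests\":%d,\"errors\":%d}",
            healthy, total, failed);
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(healthy ? 200 : 503, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
          out.write(bytes);
        }
      });
      server.start();
      System.out.println("health endpoint on http://localhost:8081/health");
    }
  }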

These capabilities imply an important organizational shift - close collaboration between developers and operations: the DevOps culture. This collaboration is needed to ensure that provisioning and deployment can be done rapidly; it's also important to ensure you can react quickly when your monitoring indicates a problem. In particular any incident management needs to involve the development team and operations, both in fixing the immediate problem and in the root-cause analysis to ensure the underlying problems are fixed.

With this kind of setup in place, you're ready for a first system using a handful of microservices. Deploy this system and use it in production, and expect to learn a lot about keeping it healthy and ensuring the devops collaboration is working well. Give yourself time to do this, learn from it, and grow more capability before you ramp up your number of services.

If you don't have these capabilities now, you should ensure you develop them so they are ready by the time you put a microservice system into production. Indeed these are capabilities that you really ought to have for monolithic systems too. While they aren't universally present across software organizations, there are very few places where they shouldn't be a high priority.

Going beyond a handful of services requires more. You'll need to trace business transactions through multiple services and automate your provisioning and deployment by fully embracing ContinuousDelivery. There's also the shift to product-centered teams that needs to be started. You'll need to organize your development environment so developers can easily swap between multiple repositories, libraries, and languages. Some of my contacts are sensing that there could be a useful MaturityModel here that can help organizations as they take on more microservice implementations - we should see more conversation on that in the next few years.

Acknowledgements

This list originated in discussions with my ThoughtWorks colleagues, particularly those who attended the microservice summit earlier this year. I then structured and finalized the list in discussion with Evan Bottcher, Thiyagu Palanisamy, Sam Newman, and James Lewis.

And as usual there were valuable comments from our internal mailing list from Chris Ford, Kief Morris, Premanand Chandrasekaran, Rebecca Parsons, Sarah Taraporewalla, and Ian Cartwright.



MaturityModel

tags: certification · agile adoption · process theory

A maturity model is a tool that helps people assess the current effectiveness of a person or group and supports figuring out what capabilities they need to acquire next in order to improve their performance. In many circles maturity models have gained a bad reputation, but although they can easily be misused, in proper hands they can be helpful.

Maturity models are structured as a series of levels of effectiveness. It's assumed that anyone in the field will pass through the levels in sequence as they become more capable.

So a whimsical example might be that of mixology (a fancy term for the craft of making cocktails). We might define levels like this:

  1. Knows how to make a dozen basic drinks (eg "make me a Manhattan")
  2. Knows at least 100 recipes, can substitute ingredients (eg "make me a Vieux Carre in a bar that lacks Peychaud's")
  3. Able to come up with cocktails (either invented or recalled) with a few simple constraints on ingredients and styles (eg "make me something with sherry and tequila that's moderately sweet").

Working with a maturity model begins with assessment, determining which level the subject is currently performing at. Once you've carried out an assessment to determine your level, you use the level above your own to prioritize what capabilities you need to learn next. This prioritization of learning is really the big benefit of using a maturity model. It's founded on the notion that if you are at level 2 in something, it's much more important to learn the things at level 3 than those at level 4. The model thus acts as a guide to what to learn, putting some structure on what would otherwise be a more complex process.

The vital point here is that the true outcome of a maturity model assessment isn't what level you are but the list of things you need to work on to improve. Your current level is merely a piece of intermediate work in order to determine that list of skills to acquire next.
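To make those mechanics concrete, here's a small sketch of mine (reusing the whimsical mixology levels above; this isn't any real assessment tool), where the output that matters is the list of capabilities at the next level:

  import java.util.List;
  import java.util.Map;

  // A sketch of the mechanics: a model is an ordered set of levels, each with
  // capabilities, and the useful output of an assessment is the list of
  // capabilities at the level above the one you're assessed at.
  public class MaturityModelSketch {
    static final Map<Integer, List<String>> MIXOLOGY = Map.of(
        1, List.of("make a dozen basic drinks"),
        2, List.of("know at least 100 recipes", "substitute ingredients"),
        3, List.of("invent cocktails from loose constraints"));

    // the assessed level is merely intermediate work; this list is the real outcome
    static List<String> whatToLearnNext(int assessedLevel) {
      return MIXOLOGY.getOrDefault(assessedLevel + 1, List.of());
    }

    public static void main(String[] args) {
      System.out.println("Assessed at level 2, so work on: " + whatToLearnNext(2));
    }
  }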

Any maturity model, like any model, is a simplification: wrong but hopefully useful. Sometimes even a crude model can help you figure out what the next step is to take, but if your needed mix of capabilities varies too much in different contexts, then this form of simplification isn't likely to be worthwhile.

A maturity model may have only a single dimension, or may have multiple dimensions. In this way you might be level 2 in 19th century cocktails but level 3 in tiki drinks. Adding dimensions makes the model more nuanced, but also makes it more complex - and much of the value of a model comes from simplification, even if it's a bit of an over-simplification.

As well as using a maturity model for prioritizing learning, it can also be helpful in the investment decisions involved. A maturity model can contain generalized estimates of progress, such as "to get from level 4 to 5 usually takes around 6 months and a 25% productivity reduction". Such estimates are, of course, as crude as the model, and like any estimation you should only use it when you have a clear PurposeOfEstimation. Timing estimates can also be helpful in dealing with impatience, particularly with level changes that take many months. The model can help structure such generalizations by being applied to past work ("we've done 7 level 2-3 shifts and they took 3-7 months").

Most people I know in the software world treat maturity models with an inherent feeling of disdain, most of which you can understand by looking at the Capability Maturity Model (CMM) - the best known maturity model in the software world. The disdain for the CMM sprang from two main roots. The first problem was that the CMM was strongly associated with a document-heavy, plan-driven culture, which put it very much in opposition to the agile software community.

But the more serious problem with the CMM was the corruption of its core value by certification. Software development companies realized that they could gain a competitive advantage by having themselves certified at a higher level than their competitors - this led to a whole world of often-bogus certification levels, levels that lacked a CertificationCompetenceCorrelation. Using a maturity model to say one group is better than another is a classic example of ruining an informational metric by incentivizing it. My feeling is that anyone doing an assessment should never publicize the current level outside of the group they are working with.

It may be that this tendency to compare levels to judge worth is a fundamentally destructive feature of a maturity model, one that will always undermine any positive value that comes from it. Certainly it feels too easy to see maturity models as catnip for consultants looking to sell performance improvement efforts - which is why there's always lots of pushback on our internal mailing list whenever someone suggests a maturity model to add some structure to our consulting work.

In an email discussion over a draft of this article, Jason Yip observed a more fundamental problem with maturity models:

"One of my main annoyances with most maturity models is not so much that they're simplified and linear, but more that they're suggesting a poor learning order, usually reflecting what's easier to what's harder rather than you should typically learn following this path, which may start with some difficult things.

In other words, the maturity model conflates level of effectiveness with learning path"

Jason's observation doesn't mean maturity models are never a good idea, but they do raise extra questions when assessing their fitness. Whenever you use any kind of model to understand a situation and draw inferences, you need to first ensure that the model is a good fit to the circumstances. If the model doesn't fit, that doesn't mean it's a bad model, but it does mean it's inappropriate for this situation. Too often, people don't put enough care in evaluating the fitness of a model for a situation before they leap to using it.

Acknowledgements

Jeff Xiong reminded me that a model can be helpful for investment decisions. Sriram Narayan and Jason Yip contributed some helpful feedback.



CanaryRelease

tags: delivery · lean

Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.

Similar to a BlueGreenDeployment, you start by deploying the new version of your software to a subset of your infrastructure, to which no users are routed.

When you are happy with the new version, you can start routing a few selected users to it. There are different strategies to choose which users will see the new version: a simple strategy is to use a random sample; some companies choose to release the new version to their internal users and employees before releasing to the world; another more sophisticated approach is to choose users based on their profile and other demographics.
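As an illustration of those strategies (a sketch of mine; the class names are hypothetical, and a real router would live in a load balancer or proxy rather than application code), the choice of who sees the new version can be a pluggable policy:

  import java.util.Set;

  // A sketch of pluggable canary routing policies: internal employees first,
  // then a deterministic random sample that can be widened gradually.
  interface CanaryStrategy {
    boolean routeToNewVersion(String userId);
  }

  class InternalUsersFirst implements CanaryStrategy {
    private final Set<String> employees;
    InternalUsersFirst(Set<String> employees) { this.employees = employees; }
    public boolean routeToNewVersion(String userId) { return employees.contains(userId); }
  }

  class RandomSample implements CanaryStrategy {
    private final int percent;
    RandomSample(int percent) { this.percent = percent; }
    public boolean routeToNewVersion(String userId) {
      // hashing the user id keeps each user on the same version between requests
      return Math.floorMod(userId.hashCode(), 100) < percent;
    }
  }

  public class CanaryRouter {
    public static void main(String[] args) {
      CanaryStrategy strategy = new RandomSample(10); // widen as confidence grows: 1%, 10%, 50%...
      String user = "user-42";
      System.out.println(user + " -> " + (strategy.routeToNewVersion(user) ? "new version" : "old version"));
    }
  }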

As you gain more confidence in the new version, you can start releasing it to more servers in your infrastructure and routing more users to it. A good practice for rolling out the new version is to repurpose your existing infrastructure using PhoenixServers or to provision new infrastructure and decommission the old one using ImmutableServers.

Canary release is an application of ParallelChange, where the migrate phase lasts until all the users have been routed to the new version. At that point, you can decommission the old infrastructure. If you find any problems with the new version, the rollback strategy is simply to reroute users back to the old version until you have fixed the problem.

A benefit of using canary releases is the ability to do capacity testing of the new version in a production environment with a safe rollback strategy if issues are found. By slowly ramping up the load, you can monitor and capture metrics about how the new version impacts the production environment. This is an alternative approach to creating an entirely separate capacity testing environment, because the environment will be as production-like as it can be.

Although the name for this technique might not be familiar [1], the practice of canary releasing has been adopted for some time. Sometimes it is referred to as a phased rollout or an incremental rollout.

In large, distributed scenarios, instead of using a router to decide which users will be redirected to the new version, it is also common to use different partitioning strategies. For example: if you have geographically distributed users, you can roll out the new version to a region or a specific location first; if you have multiple brands you can roll out to a single brand first, etc. Facebook chooses to use a strategy with multiple canaries, the first one being visible only to their internal employees and having all the FeatureToggles turned on so they can detect problems with new features early.

Canary releases can be used as a way to implement A/B testing due to similarities in the technical implementation. However, it is preferable to avoid conflating these two concerns: while canary releases are a good way to detect problems and regressions, A/B testing is a way to test a hypothesis using variant implementations. If you monitor business metrics to detect regressions with a canary [2], also using it for A/B testing could interfere with the results. On a more practical note, it can take days to gather enough data to demonstrate statistical significance from an A/B test, while you would want a canary rollout to complete in minutes or hours.

One drawback of using canary releases is that you have to manage multiple versions of your software at once. You can even decide to have more than two versions running in production at the same time; however, it is best to keep the number of concurrent versions to a minimum.

Another scenario where using canary releases is hard is when you distribute software that is installed on users' computers or mobile devices. In this case, you have less control over when the upgrade to the new version happens. If the distributed software communicates with a backend, you can use ParallelChange to support both versions and monitor which client versions are being used. Once the usage numbers fall to a certain level, you can then contract the backend to only support the new version.

Managing database changes also requires attention when doing canary releases. Again, using ParallelChange is a technique to mitigate this problem. It allows the database to support both versions of the application during the rollout phase.

Further Reading

Canary release is described by Jez Humble and Dave Farley in the book Continuous Delivery.

In this talk, Chuck Rossi describes Facebook's release process and their use of canary releases in more detail.

Acknowledgements

Thanks to many ThoughtWorks colleagues for their feedback: Jez Humble, Rohith Rajagopal, Charles Haynes, Andrew Maddison, Mark Taylor, Sunit Parekh, and Sam Newman.

Notes

1: The name for this technique originates from miners who would carry a canary in a cage down the coal mines. If toxic gases leaked into the mine, it would kill the canary before killing the miners. A canary release provides a similar form of early warning for potential problems before impacting your entire production infrastructure or user base.

2: The technique of monitoring business metrics and automatically rolling back a release on a statistically significant regression is known as a cluster immune system and was pioneered by IMVU. They describe this and other practices in their Continuous Deployment approach in this blog post.



ParallelChange

tags: evolutionary design · API design · refactoring

Making a change to an interface that impacts all its consumers requires two thinking modes: implementing the change itself, and then updating all its usages. This can be hard when you try to do both at the same time, especially if the change is on a PublishedInterface with multiple or external clients.

Parallel change, also known as expand and contract, is a pattern to implement backward-incompatible changes to an interface in a safe manner, by breaking the change into three distinct phases: expand, migrate, and contract.

To understand the pattern, let's use an example of a simple Grid class that stores and provides information about its cells using a pair of x and y integer coordinates. Cells are stored internally in a two-dimensional array and clients can use the addCell(), fetchCell() and isEmpty() methods to interact with the grid.

  class Grid {
    private Cell[][] cells;
    …

    public void addCell(int x, int y, Cell cell) {
      cells[x][y] = cell;
    }

    public Cell fetchCell(int x, int y) {
      return cells[x][y];
    }

    public boolean isEmpty(int x, int y) {
      return cells[x][y] == null;
    }
  }
  

As part of refactoring, we detect that x and y are a DataClump and decide to introduce a new Coordinate class. However, this will be a backwards-incompatible change to clients of the Grid class. Instead of changing all the methods and the internal data structure at once, we decide to apply the parallel change pattern.

In the expand phase you augment the interface to support both the old and the new versions. In our example, we introduce a new Map<Coordinate, Cell> data structure and the new methods that can receive Coordinate instances without changing the existing code.

  class Grid {
    private Cell[][] cells;
    private Map<Coordinate, Cell> newCells;
    …

    public void addCell(int x, int y, Cell cell) {
      cells[x][y] = cell;
    }

    public void addCell(Coordinate coordinate, Cell cell) {
      newCells.put(coordinate, cell);
    }

    public Cell fetchCell(int x, int y) {
      return cells[x][y];
    }

    public Cell fetchCell(Coordinate coordinate) {
      return newCells.get(coordinate);
    }

    public boolean isEmpty(int x, int y) {
      return cells[x][y] == null;
    }

    public boolean isEmpty(Coordinate coordinate) {
      return !newCells.containsKey(coordinate);
    }
  }
  

Existing clients will continue to consume the old version, and the new changes can be introduced incrementally without affecting them.

During the migrate phase you update all clients using the old version to the new version. This can be done incrementally and, in the case of external clients, this will be the longest phase.

Once all usages have been migrated to the new version, you perform the contract phase to remove the old version and change the interface so that it only supports the new version.

In our example, since the internal two-dimensional array is not used anymore after the old methods have been deleted, we can safely remove that data structure and rename newCells back to cells.

  class Grid {
    private Map<Coordinate, Cell> cells;
    …

    public void addCell(Coordinate coordinate, Cell cell) {
      cells.put(coordinate, cell);
    }

    public Cell fetchCell(Coordinate coordinate) {
      return cells.get(coordinate);
    }

    public boolean isEmpty(Coordinate coordinate) {
      return !cells.containsKey(coordinate);
    }
  }
  

This pattern is particularly useful when practicing ContinuousDelivery because it allows your code to be released in any of these three phases. It also lowers the risk of change by allowing you to migrate clients and to test the new version incrementally.

Even when you have control over all usages of the interface, following this pattern is still useful because it prevents you from spreading breakage across the entire codebase all at once. The migrate phase can be short, but it is an alternative to leaning on the compiler to find all the usages that need to be fixed.

Some example applications of this pattern are:

During the migrate phase, a FeatureToggle can be used to control which version of the interface is used (there's a sketch of this after these examples). A feature toggle on the client side allows it to be forward-compatible with the new version of the supplier, which decouples the release of the supplier from the client.

When implementing BranchByAbstraction, parallel change is a good way to introduce the abstraction layer between the clients and the supplier. It is also an alternative way to perform a large-scale change without introducing the abstraction layer as a seam for replacement on the supplier side. However, when you have a large number of clients, using branch by abstraction is a better strategy to narrow the surface of change and reduce confusion during the migrate phase.
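Here's the toggle-controlled client as a small sketch of mine, assuming the expand-phase Grid shown earlier plus a Coordinate class with an (x, y) constructor; the useCoordinates toggle itself is hypothetical:

  // A sketch of a client during the migrate phase: a toggle chooses between the
  // old and new Grid methods, so the client can be released independently of the
  // supplier's contract phase.
  class GridClient {
    private final Grid grid;
    private final boolean useCoordinates; // feature toggle, e.g. read from configuration

    GridClient(Grid grid, boolean useCoordinates) {
      this.grid = grid;
      this.useCoordinates = useCoordinates;
    }

    Cell cellAt(int x, int y) {
      if (useCoordinates) {
        return grid.fetchCell(new Coordinate(x, y)); // new version
      }
      return grid.fetchCell(x, y); // old version, removed in the contract phase
    }
  }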

The downside of using parallel change is that during the migrate phase the supplier has to support two different versions, and clients could get confused about which version is new versus old. If the contract phase is not executed you might end up in a worse state than you started, therefore you need discipline to finish the transition successfully. Adding deprecation notes, documentation or TODO notes might help inform clients and other developers working on the same codebase about which version is in the process of being replaced.

Further Reading

Industrial Logic's refactoring album documents and demonstrates an example of performing a parallel change.

Acknowledgements

This technique was first documented as a refactoring strategy by Joshua Kerievsky in 2006 and presented in his talk The Limited Red Society at the Lean Software and Systems Conference in 2010.

Thanks to Joshua Kerievsky for giving feedback on the first draft of this post. Also thanks to many ThoughtWorks colleagues for their feedback: Greg Dutcher, Badrinath Janakiraman, Praful Todkar, Rick Carragher, Filipe Esperandio, Jason Yip, Tushar Madhukar, Pete Hodgson, and Kief Morris.



UnitTest

tags: testing · extreme programming

Unit testing is often talked about in software development, and is a term that I've been familiar with during my whole time writing programs. Like most software development terminology, however, it's very ill-defined, and confusion often occurs when people think that it's more tightly defined than it actually is.

Although I'd done plenty of unit testing before, my definitive exposure was when I started working with Kent Beck and used the Xunit family of unit testing tools. (Indeed I sometimes think a good term for this style of testing might be "xunit testing.") Unit testing also became a signature activity of ExtremeProgramming (XP), and led quickly to TestDrivenDevelopment.

There were definitional concerns about XP's use of unit testing right from the early days. I have a distinct memory of a discussion on a usenet group where we XPers were berated by a testing expert for misusing the term "unit test." We asked him for his definition and he replied with something like "in the morning of my training course I cover 24 different definitions of unit test."

Despite the variations, there are some common elements. Firstly there is a notion that unit tests are low-level, focusing on a small part of the software system. Secondly unit tests are usually written these days by the programmers themselves using their regular tools - the only difference being the use of some sort of unit testing framework [1]. Thirdly unit tests are expected to be significantly faster than other kinds of tests.

So there are some common elements, but there are also differences. One difference is what people consider to be a unit. Object-oriented design tends to treat a class as the unit, while procedural or functional approaches might consider a single function as a unit. But really it's a situational thing - the team decides what makes sense to be a unit for the purposes of their understanding of the system and its testing. Although I start with the notion of the unit being a class, I often take a bunch of closely related classes and treat them as a single unit. Rarely I might take a subset of methods in a class as a unit. However you define it, it doesn't really matter much.


Collaborator Isolation

A more important distinction is whether the unit you're testing should be isolated from its collaborators. Imagine you're testing an order class's price method. The price method needs to invoke some functions on the product and customer classes. If you follow the principle of collaborator isolation you don't want to use the real product or customer classes here, because a fault in the customer class would cause the order class's tests to fail. Instead you use TestDoubles for the collaborators.
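To make that concrete, here's a minimal sketch (JUnit 4, with hand-rolled doubles rather than a mocking framework; the Order, Product, and Customer types are invented for illustration) of a solitary test of the price method:

  import static org.junit.Assert.assertEquals;
  import org.junit.Test;

  // A sketch of collaborator isolation: hand-rolled doubles stand in for the
  // real Product and Customer, so a fault in those classes can't fail this test.
  interface Product { double unitPrice(); }
  interface Customer { double discountRate(); }

  class Order {
    private final Product product;
    private final Customer customer;
    private final int quantity;

    Order(Product product, Customer customer, int quantity) {
      this.product = product;
      this.customer = customer;
      this.quantity = quantity;
    }

    double price() {
      return product.unitPrice() * quantity * (1 - customer.discountRate());
    }
  }

  public class OrderPriceTest {
    @Test
    public void priceAppliesCustomerDiscount() {
      Product tenDollarProduct = () -> 10.0;    // test double
      Customer tenPercentDiscount = () -> 0.10; // test double

      Order order = new Order(tenDollarProduct, tenPercentDiscount, 3);

      assertEquals(27.0, order.price(), 0.001);
    }
  }

A sociable version of the same test would simply construct real Product and Customer objects instead of the doubles.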

But not all unit testers use this isolation. Indeed when xunit testing began in the 90's we made no attempt to isolate unless communicating with the collaborators was awkward (such as a remote credit card verification system). We didn't find it difficult to track down the actual fault, even if it caused neighboring tests to fail. So we felt isolation wasn't an issue in practice.

Indeed this lack of isolation was one of the reasons we were criticized for our use of the term "unit testing". I think that the term "unit testing" is appropriate because these tests are tests of the behavior of a single unit. We write the tests assuming everything other than that unit is working correctly.

As xunit testing became more popular in the 2000's the notion of isolation came back, at least for some people. We saw the rise of Mock Objects and frameworks to support mocking. Two schools of xunit testing developed, which I call the classic and mockist styles. Classic xunit testers don't worry about isolation but mockists do. Today I know and respect xunit testers of both styles (personally I've stayed with classic style).

Even a classic tester like myself uses test doubles when there's an awkward collaboration. They are invaluable to remove non-determinism when talking to remote services. Indeed some classicist xunit testers also argue that any collaboration with external resources, such as a database or filesystem, should use doubles. Partly this is due to non-determinism risk, partly due to speed. While I think this is a useful guideline, I don't treat using doubles for external resources as an absolute rule. If talking to the resource is stable and fast enough for you then there's no reason not to do it in your unit tests.

Recently Jay Fields came up with some useful terminology here: he defines solitary tests as those that use collaborator isolation and sociable tests as those that don't. While these are related to the classicist/mockist distinction, in that classic TDDers prefer sociable tests and mockist TDDers prefer solitary tests, there is more to the classicist/mockist difference than their attitude to collaborator isolation.


Speed

The common properties of unit tests — small scope, done by the programmer herself, and fast — mean that they can be run very frequently when programming. Indeed this is one of the key characteristics of SelfTestingCode. In this situation programmers run unit tests after any change to the code. I may run unit tests several times a minute, any time I have code that's worth compiling. I do this because should I accidentally break something, I want to know right away. If I've introduced the defect with my last change it's much easier for me to spot the bug because I don't have far to look.

When you run unit tests so frequently, you may not run all the unit tests. Usually you only need to run those tests that are operating over the part of the code you're currently working on. As usual, you trade off the depth of testing with how long it takes to run the test suite. I'll call this suite the compile suite, since it's what I run whenever I think of compiling - even in an interpreted language like Ruby.

If you are using Continuous Integration you should run a test suite as part of it. It's common for this suite, which I call the commit suite, to include all the unit tests. It may also include a few BroadStackTests. As a programmer you should run this commit suite several times a day, certainly before any shared commit to version control, but also at any other time you have the opportunity - when you take a break, or have to go to a meeting. The faster the commit suite is, the more often you can run it. [2]

Different people have different standards for the speed of unit tests and of their test suites. David Heinemeier Hansson is happy with a compile suite that takes a few seconds and a commit suite that takes a few minutes. Gary Bernhardt finds that unbearably slow, insisting on a compile suite of around 300ms, and Dan Bodart doesn't want his commit suite to be more than ten seconds.

I don't think there's an absolute answer here. Personally I don't notice a difference between a compile suite that's sub-second or a few seconds. I like Kent Beck's rule of thumb that the commit suite should run in no more than ten minutes. But the real point is that your test suites should run fast enough that you're not discouraged from running them frequently enough. And frequently enough is so that when they detect a bug there's a sufficiently small amount of work to look through that you can find it quickly.

Notes

1: I say "these days" because this is certainly something that has changed due to XP. In the turn-of-the-century debates, XPers were strongly criticized for this as the common view was that programmers should never test their own code. Some shops had specialized unit testers whose entire job would be to write unit tests for code written earlier by developers. The reasons for this included: people having a conceptual blindness to testing their own code, programmers not being good testers, and the belief that it was good to have an adversarial relationship between developers and testers. The XPer view was that programmers could learn to be effective testers, at least at the unit level, and that if you involved a separate group the feedback loop that tests gave you would be hopelessly slow. Xunit played an essential role here; it was designed specifically to minimize the friction for programmers writing tests.

2: If you have tests that are useful, but take longer than you want the commit suite to run, then you should build a DeploymentPipeline and put the slower tests in a later stage of the pipeline.

Updated on Oct 24 2014 to include the mention of Fields' sociable/solitary vocabulary.
