
by Martin Fowler: column for Distributed Computing November/December 1997
I was chatting with a client about an object model review they wanted me to do. “We can send some documentation in advance, would that be useful?” they asked. I replied in the affirmative, hoping that I was not lying. Two days later the UPS man dropped the package outside my door, where it landed with a loud noise. It was a good inch and a half of documentation.
I opened it up and found a print-out from a CASE tool. It showed a few diagrams and gave exhaustive descriptions of every class, with every attribute and every operation. All of these had definitions. The Contract class was defined as “a contract between many parties”; its dateSigned attribute was defined as “the date the contract was signed”. I read through the inch and a half of documentation, but at the end I was little wiser. There was much on what the objects were, but little explanation of what they were meant to do. It wasn’t the first time this had happened, and I’ll be surprised if it is the last.
Why do we bother with models or documentation? They don’t execute, and our customers pay us for working code, not pretty pictures. We bother with models to communicate. The idea is that a graphical object model can show how objects fit together more clearly than the source does, and an interaction diagram can show a collaboration better than tracing the call path through several class definitions. But all too often design documentation fails at this and leaves me puzzled on my sofa.
Part of the problem is the CASE tools that people use for this kind of work. (CASE tools have two purposes, documentation and code generation; I’m only talking about the former here.) CASE tools encourage a dictionary mentality. You make an entry for every class, you show every class and every attribute on the diagrams, and you draw an interaction diagram for every use case. They encourage completism by helping you answer the question “have we documented everything?”
But that is the wrong question. If you document everything, you give everything equal weight. Do that for a complex system, and you are buried in detail. In any system some aspects are more important than others: key aspects that, once understood, will help someone learn the rest. The art of documentation is to find how to document these aspects as clearly as possible. In doing so, you emphasize these areas and leave the details to the code.
Above all, this documentation must be brief. Only if it is brief will people read it and understand it. Only if it is brief will you bother to keep it up to date. You won’t be able to talk about everything, nor should you. A friend of mine told me about one project where they were reluctant to change class names, not because the code took too long to change, but because the documentation did. When documentation becomes a problem, deal with it. Throw at least half of it away.
If your system is of any reasonable size, divide it into packages (à la UML or Java). Each package consists of a group of classes that work together for a particular purpose. Document the overall structure of your system with a diagram that shows packages and their dependencies. (In UML this is a specific use of a class diagram; I use it so often that I like to call it a package diagram. See my book UML Distilled.) Work with your design to minimize these dependencies; this is the key to minimizing the coupling in your system. (There’s not much written on how to do this; the best source I know is Robert Martin’s Designing Object-Oriented C++ Applications Using the Booch Method.)
For each package, write a brief document. The basis of the document is some narrative text that describes the key things the package does, and how it does it. UML diagrams can be used to help support this. Draw a class diagram that shows the important classes in the package but not necessarily all of them. For each class show only the key attributes and operations, definitely don’t show all of them. Concentrate on interface rather than implementation. For each important collaboration in the package, show an interaction diagram. If any class has interesting lifecycle behavior, then show it with a state diagram. The document should be small enough that you don’t find it a problem keeping it up to date. I usually try to keep it to no more than a dozen pages.
As well as documenting each package, it is useful to show how collaborations extend across packages. To do this, identify the key use cases in the system and document them with interaction diagrams and narrative. A class diagram that highlights the key classes involved is also useful. Many people advocate drawing interaction diagrams for every use case in the system. I feel this can lead to too much documentation, but if you find it useful, and it isn’t a problem to keep up to date, then go ahead and do it. Even so, you should single out no more than a dozen key use cases as the ones that everyone needs to understand.
As I started to write this article I was overwhelmed by the things I could talk about. Lots of anecdotes and tips came to mind. But I know that to get you to read and remember this article I could only talk about a few of them. I had to select the key things to mention. Communication is all about that. The key to good communication is to highlight the important things to say. Saying everything is not communication. That just passes the selection of the important things on to your readers, and discourages them with a heavy document. That selection of information is one of the most important parts of communication, and it is the responsibility of every designer.