Last year I read Brian Button’s wonderful article “Double Duty” in Better Software magazine (the February 2005 issue). One of the things I learned is that Brian is the world’s best namer of unit tests. I visited Brian’s web site for more of his ideas and found an article called “TDD Defeats Programmer’s Block—Film at 11.” In this article, Brian describes using the Test-Driven Development process to write a “continuous integration system” (a tool that automatically rebuilds a software system whenever programmers change its source code). Here are some examples of his unit test names:
Starting Build With No Previous State Only Starts Build For Last Change
Previous Build Number Is Incremented After Successful Started Build
Last Build Failing Leaves Last Build Set To Previous Build
What makes these names so good? I analyzed a few dozen of Brian’s test names and found this pattern: stimulus and result in context. Let’s examine these names to identify the parts.
Starting Build With No Previous State Only Starts Build For Last Change:
Context: There is no previous state (i.e., no previous builds have been done).
Stimulus: Start a build.
Result: A build was started for only the last change.
Previous Build Number Is Incremented After Successful Started Build:
Context: There were zero or more previous builds.
Stimulus: Request a build that will succeed.
Result: The build number is one more than before the build.
Last Build Failing Leaves Last Build Set To Previous Build:
Context: There were previous builds, the most recent of which is recorded in the system as the last build.
Stimulus: Request a build that will fail.
Result: The previously identified last build is still identified as the last build.
One of Brian’s tests from a different system—an “animal factory” (a concept better left unexplained)—is called Default Animal Is Cow.
Context: No animal type has been identified as the desired type of animal for the system to manufacture.
Stimulus: Request that the system manufacture an animal.
Result: A new cow exists.
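To make the three parts visible in code, here is a minimal sketch using Python’s unittest. It is not Brian’s code: BuildTracker, AnimalFactory, Cow, and their methods are hypothetical, and the build tracker is simplified so that the test decides whether a build succeeds. Each test sets up the context, applies the stimulus, and checks the result, and each method name states all three.

```python
import unittest


class BuildTracker:
    """Hypothetical build bookkeeping for a CI tool (illustration only)."""

    def __init__(self, last_build=None, build_number=0):
        self.last_build = last_build      # most recent successful build
        self.build_number = build_number  # number of that build

    def start_build(self, change, succeeds):
        # Simplification: the caller decides whether the build succeeds.
        if succeeds:
            self.build_number += 1
            self.last_build = change


class Cow:
    pass


class AnimalFactory:
    """Hypothetical animal factory whose default type is Cow."""

    def __init__(self, desired_type=None):
        self.desired_type = desired_type

    def manufacture(self):
        return (self.desired_type or Cow)()


class NamingPatternTests(unittest.TestCase):
    # Each method name states the context, the stimulus, and the result.

    def test_previous_build_number_is_incremented_after_successful_started_build(self):
        tracker = BuildTracker(last_build="change-41", build_number=7)  # context
        tracker.start_build("change-42", succeeds=True)                 # stimulus
        self.assertEqual(8, tracker.build_number)                       # result

    def test_last_build_failing_leaves_last_build_set_to_previous_build(self):
        tracker = BuildTracker(last_build="change-41", build_number=7)  # context
        tracker.start_build("change-42", succeeds=False)                # stimulus
        self.assertEqual("change-41", tracker.last_build)               # result

    def test_default_animal_is_cow(self):
        factory = AnimalFactory()                          # context: no desired type
        animal = factory.manufacture()                     # stimulus
        self.assertIsInstance(animal, Cow)                 # result: a new cow exists
```

Run it with `python -m unittest <file>`; the verbose runner output (`-v`) then reads like a list of specifications.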
Now that I’ve learned the pattern that makes Brian’s test names so useful, I can use it deliberately. Using the context-stimulus-result scheme increases the value of tests as documentation. The resulting names make clear what specifically is being tested and under what specific conditions. This helps the reader to understand quickly what each test does, and what is covered by each set of tests.
Another benefit is that the context-stimulus-result naming scheme encourages you to clarify your thinking about each test. Each unit test establishes some set of starting conditions, or context. Each stimulates the system. Each compares the result to a desired result. In order to name these elements you will have to think about the specifics of each and clarify them well enough that you can describe each in a few words.
If you’re having difficulty naming a test using this scheme, that may indicate a problem in your test. Perhaps the test is doing too much work, or your test suite is doing too little. For example, suppose you’re testing software to manage bank accounts, and one test is called Withdrawal Test. We can tell from this name that the test tests the withdrawal feature in some way. But we don’t know what specific aspects of withdrawals this test is testing.
Does Withdrawal Test test only that a withdrawal of less than the account balance reduces the balance by the proper amount? If so, calling this test “Withdrawal Test” may indicate that your suite of tests for the withdrawal feature is missing many important test cases. The name of the test gives readers an overly broad sense of what the test actually tests.
Does Withdrawal Test test a score of different stimuli under a dozen different conditions? If so, it’s probably doing too much work. The name of the test does not quickly tell readers what is being tested.
Whether Withdrawal Test is doing too much work or too little, we can improve the test by applying the context-stimulus-result scheme. If Withdrawal Test is doing too much, we can use the scheme to identify how to break the test into smaller, more focused tests with more descriptive names. If Withdrawal Test tests only one tiny aspect of withdrawals and leaves other aspects untested, we can use the scheme to create a better name for the test and to identify other tests to write.
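As a sketch of what that split might look like, here is a vague Withdrawal Test replaced by several focused tests, each named by context, stimulus, and result. The Account class, its withdraw method, and the InsufficientFunds exception are hypothetical, and the particular rules (reject non-positive amounts, reject overdrafts) are assumptions chosen only to show how the scheme surfaces test cases that a single broad name hides.

```python
import unittest


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical bank account, used only to illustrate test naming."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self.balance -= amount


class WithdrawalTests(unittest.TestCase):
    # Four focused tests in place of one vague "test_withdrawal".

    def test_withdrawing_less_than_balance_reduces_balance_by_amount(self):
        account = Account(balance=100)
        account.withdraw(30)
        self.assertEqual(70, account.balance)

    def test_withdrawing_entire_balance_leaves_zero_balance(self):
        account = Account(balance=100)
        account.withdraw(100)
        self.assertEqual(0, account.balance)

    def test_withdrawing_more_than_balance_is_refused_and_balance_unchanged(self):
        account = Account(balance=100)
        with self.assertRaises(InsufficientFunds):
            account.withdraw(101)
        self.assertEqual(100, account.balance)

    def test_withdrawing_zero_is_rejected_and_balance_unchanged(self):
        account = Account(balance=100)
        with self.assertRaises(ValueError):
            account.withdraw(0)
        self.assertEqual(100, account.balance)
```

Writing the names first made the gaps obvious: the boundary case (withdrawing exactly the balance) and the refusal cases each got their own test.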
If you want to test a class in isolation, but the class works with a collaborator, you may need to provide a fake collaborator for the class to work with. A fake collaborator provides useful isolation in two directions:
It isolates the test from the quirks of the real collaborators. This makes failures more informative: If the test fails, the fault is likely in the test subject, and not in the collaborator.
It isolates the real collaborators from the test. This is important if the real collaborator is, say, the corporate accounts receivable database. You don’t want your tests messing with that.
Fake collaborators often provide other benefits over real collaborators. One benefit is that fake collaborators increase testability by increasing your control over the test subject’s environment. It’s usually easier to set up a fake collaborator to feed your test subject a particular data value than to set up the real collaborator to do the same thing. And if the real collaborator takes a long time to do its work, you can gain control over the speed of the test by writing a fake collaborator that takes essentially no time at all.
Fake collaborators also increase testability in another way: They give you greater visibility into the results produced by the test subject. Sometimes it’s difficult or time-consuming to observe what data the test subject delivered to a real collaborator. If you write a fake collaborator, it’s easy to instruct it to remember the data that the test subject delivered. And it’s easy to gain access to that information so that you can compare it to your expectations.
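Here is a minimal Python sketch of both benefits. ReminderService, FakeLedger, and RecordingNotifier are hypothetical names: the fake ledger stands in for something like the accounts receivable database and hands the subject exactly the data the test chooses, with no delay (control), while the recording notifier remembers what the subject sent so the test can compare it to expectations (visibility).

```python
import unittest


class ReminderService:
    """Hypothetical test subject: sends a reminder for each overdue account."""

    def __init__(self, ledger, notifier):
        self.ledger = ledger      # in production, e.g. the receivables database
        self.notifier = notifier  # in production, e.g. an email gateway

    def send_reminders(self):
        for customer, amount in self.ledger.overdue_accounts():
            self.notifier.send(customer, f"You owe ${amount}")


class FakeLedger:
    """Control: returns canned data instantly instead of querying a database."""

    def __init__(self, overdue):
        self._overdue = overdue

    def overdue_accounts(self):
        return list(self._overdue)


class RecordingNotifier:
    """Visibility: remembers every message the subject delivers."""

    def __init__(self):
        self.sent = []

    def send(self, customer, message):
        self.sent.append((customer, message))


class ReminderServiceTests(unittest.TestCase):
    def test_one_overdue_account_sends_one_reminder_with_amount_owed(self):
        notifier = RecordingNotifier()
        service = ReminderService(FakeLedger([("acme", 120)]), notifier)
        service.send_reminders()
        self.assertEqual([("acme", "You owe $120")], notifier.sent)
```

Neither fake touches the real database or sends real mail, so the test is also isolated in the second direction: it cannot mess with production data.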
I’ve identified a number of jobs that I often want fake collaborators to do for me when I’m writing tests. Each of these jobs helps me to gain control over the test environment or visibility into the test results.
Fill in an argument to a method call. Suppose the test subject requires me to pass an argument to it—either through the constructor or through the method I’m testing—but the argument is never used during the test. In this case, all I need the “collaborator” to do is to fill in a value in the method call. If that’s all I need, I can pass null.
Accept calls from the test subject. If the test subject calls the collaborator’s methods, but the test doesn’t care what the collaborator does, I can write a fake collaborator with dummy methods. If the interface specifies that a method doesn’t need to return anything, I can simply write a dummy method with an empty body. If the method must return a value, I can write the dummy method to return a simple default value, such as 0, null, or false. Objects like this, and similar objects with very simple default behavior, are often called Null Objects.
Provide inputs to the test subject. Sometimes the test subject requires a value other than 0, null, or false in order to run. And sometimes I’m writing a test to determine whether the test subject responds appropriately when it receives specific interesting values from its collaborators. In either case, I enhance the fake collaborator to store an appropriate value and deliver it to the test subject when called.
Record outputs from the test subject. Sometimes I want to know whether the test subject sent the right information to the collaborator. I can write the fake collaborator’s methods to store the inputs it receives from the test subject. And I can write accessor methods in the fake collaborator, if necessary, so that the test method can retrieve them.
Verify outputs from the test subject. Sometimes it’s useful to have the collaborator do the verification itself, rather than having the test retrieve values from the collaborator and verify them. When I want this, I can create a mock object, an object that has expectations and can verify them. I can either write my own mock objects, including the verification methods, or I can use one of the numerous mock object libraries that make mocking easier. I use the simple mock features that come with NUnit.
Verify what methods the test subject calls. Sometimes I want to verify not only whether the collaborator received the right values, but also whether the test subject called all of the right methods. And sometimes I want to make sure the test subject does not call certain methods. Mock object libraries typically provide ways to verify method calls.
Verify the sequence in which the test subject calls methods. Every now and then, I want to verify that the test subject not only called the right methods on the collaborator, but also called them in a specific order. This can be useful for testing protocols. Some mock libraries provide a way to verify the order of method calls. The NUnit mock library does not. When I need this feature, I often write a logging collaborator that simply writes each expected method call to a string and each actual call to another string. To verify whether the actual calls matched expectations, my test can direct the logging collaborator to compare the two strings.
Collaborate fully. If the test somehow requires the full behavior of a real collaborator, I can use a real collaborator. So far, I haven’t found a need for this when I’m trying to test classes in isolation. I do use real collaborators when my intention is to test the collaboration, and not just one class or another.
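The three verification jobs can be sketched in Python with the standard library’s unittest.mock plus one hand-written logging collaborator. This is not NUnit’s mock API; Checkout, its gateway methods, and LoggingGateway are hypothetical names. The logging collaborator follows the two-string approach described above: expected calls go into one string, actual calls into another, and the collaborator compares them when the test asks it to.

```python
import unittest
from unittest import mock


class Checkout:
    """Hypothetical test subject that follows a connect/charge/disconnect protocol."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, amount):
        if amount <= 0:
            return False  # rejected before the gateway is touched
        self.gateway.connect()
        approved = self.gateway.charge(amount)
        self.gateway.disconnect()
        return approved


class LoggingGateway:
    """Logging collaborator: records expected and actual calls as two strings."""

    def __init__(self):
        self.expected = ""
        self.actual = ""

    def expect(self, call_text):
        self.expected += call_text + ";"

    def connect(self):
        self.actual += "connect;"

    def charge(self, amount):
        self.actual += f"charge({amount});"
        return True

    def disconnect(self):
        self.actual += "disconnect;"

    def verify(self):
        if self.expected != self.actual:
            raise AssertionError(f"expected {self.expected!r}, got {self.actual!r}")


class CheckoutTests(unittest.TestCase):
    def test_payment_charges_gateway_exact_amount(self):
        # Verify outputs: the mock records calls and checks them on request.
        gateway = mock.Mock()
        Checkout(gateway).pay(25)
        gateway.charge.assert_called_once_with(25)

    def test_rejected_amount_never_charges_gateway(self):
        # Verify which methods are (and are not) called.
        gateway = mock.Mock()
        Checkout(gateway).pay(-5)
        gateway.charge.assert_not_called()

    def test_payment_connects_then_charges_then_disconnects(self):
        # Verify sequence with the hand-written logging collaborator.
        gateway = LoggingGateway()
        for step in ("connect", "charge(25)", "disconnect"):
            gateway.expect(step)
        Checkout(gateway).pay(25)
        gateway.verify()
```

Unlike the NUnit mocks described above, unittest.mock can also check order on its own, by comparing `gateway.mock_calls` with `[mock.call.connect(), mock.call.charge(25), mock.call.disconnect()]`; the logging collaborator is shown because it works with any test framework.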
I’ve listed these jobs in order of lightness. The lighter collaborators are easier to create; the heavier ones take more work. null is the lightest collaborator of all, and the real collaborator is the heaviest.
My preference when writing tests is to use the lightest fake collaborator that gives me the visibility and control that I need for the purposes of my test. This keeps my tests as light and flexible as they can be.
Often I start by passing the lightest collaborator of all, null, to the test subject, and then wait for the test to tell me when I need to add more behavior to the collaborator. If the test subject needs something other than null, I’ll find out when I try to run the test and get a null reference exception. Then I’ll move to a Null Object. If the default values returned from the Null Object don’t satisfy the test subject, the test usually signals that with an exception or failure of some kind, and I’ll move to a heavier collaborator.
I call this approach The Unbearable Lightness of Faking: start with the lightest possible collaborator, and use it until the lightness becomes unbearable and I absolutely must switch to something heavier.
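Here is a rough Python sketch of that progression, in which None plays the role of null and AttributeError plays the role of the null reference exception. Greeter, NullClock, and StubClock are hypothetical names. The first test gets by with None; the second needs the clock, so it moves up to a Null Object; the third needs a value the Null Object’s default can’t supply, so it moves up to a stub that provides a specific input.

```python
import unittest


class Greeter:
    """Hypothetical test subject that asks a clock collaborator for the hour."""

    def __init__(self, clock):
        self.clock = clock

    def greeting(self, name):
        if not name:
            return "Hello!"  # returns before touching the clock
        hour = self.clock.current_hour()
        return f"Good {'morning' if hour < 12 else 'afternoon'}, {name}"


class NullClock:
    """Null Object: answers with a simple default."""

    def current_hour(self):
        return 0


class StubClock:
    """Heavier fake: supplies the specific hour a test needs."""

    def __init__(self, hour):
        self.hour = hour

    def current_hour(self):
        return self.hour


class GreeterTests(unittest.TestCase):
    def test_missing_name_gives_generic_greeting(self):
        # None is light enough: the clock is never called.
        self.assertEqual("Hello!", Greeter(None).greeting(""))

    def test_name_before_noon_gives_morning_greeting(self):
        # With None this raised AttributeError, so it moved up to a Null Object.
        self.assertEqual("Good morning, Ann", Greeter(NullClock()).greeting("Ann"))

    def test_name_after_noon_gives_afternoon_greeting(self):
        # Hour 0 can't express "afternoon", so the lightness became unbearable.
        self.assertEqual("Good afternoon, Ann", Greeter(StubClock(15)).greeting("Ann"))
```

Each step up is driven by a failing run rather than by guessing in advance how much behavior the fake will need.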
When I’m talking to programmers about writing tests for their own code, one of the questions that comes up often is: Should we test classes in isolation from each other, or in collaboration with each other?
I like both kinds of tests. Here’s why.
I like tests that isolate classes. When a failure occurs, the tests tell me specifically what class failed, and what method failed. That guides me more directly to the fault—the specific code that is broken—and saves a ton of debugging.
I like tests that exercise collaborations. When a failure occurs, the tests tell me that:
one class or the other is not fulfilling its responsibilities, or
the collaborators disagree about each other’s responsibilities, or
some other class (the “electrician” class that connects the collaborators with each other) has wired the collaborators together improperly.
If the individual classes are well tested, I can focus my collaboration testing specifically on wiring and agreements, and a collaboration test failure then points to a disagreement or improper wiring rather than to a fault inside one class.
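A collaboration test might look like the following Python sketch. TaxTable, PriceCalculator, and build_calculator (the “electrician” that wires them together) are hypothetical, and the tax rates are illustrative only. The tests use real objects throughout and check an end-to-end result, which exercises both the wiring and the agreement that rates are fractions (0.065) rather than percentages (6.5). If TaxTable started returning percentages, each class could still pass its own isolated tests, but these collaboration tests would fail.

```python
import unittest


class TaxTable:
    """Hypothetical collaborator: returns rates as fractions (0.065 means 6.5%)."""

    RATES = {"OR": 0.0, "WA": 0.065}  # illustrative values only

    def rate_for(self, state):
        return self.RATES[state]


class PriceCalculator:
    """Hypothetical collaborator: assumes rates arrive as fractions."""

    def __init__(self, tax_table):
        self.tax_table = tax_table

    def total(self, subtotal, state):
        return round(subtotal * (1 + self.tax_table.rate_for(state)), 2)


def build_calculator():
    """Hypothetical "electrician": connects the real collaborators."""
    return PriceCalculator(TaxTable())


class PricingCollaborationTests(unittest.TestCase):
    # Real objects only; these tests target wiring and agreements,
    # on the assumption that each class is well tested in isolation.

    def test_assembled_calculator_applies_state_tax_rate(self):
        self.assertEqual(106.5, build_calculator().total(100, "WA"))

    def test_assembled_calculator_adds_nothing_in_tax_free_state(self):
        self.assertEqual(100.0, build_calculator().total(100, "OR"))
```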
When I test classes in isolation, failures guide me quickly to faults.
When I test classes in collaboration, failures tell me where the classes disagree about each other’s responsibilities.