In a thoughtful comment on my blog post about writing maintainable automated acceptance tests, Chris Falter suggested a different way to name the variables in my test cases. He mentions that our two naming styles present a tradeoff, and that set me on a long trail of thought.
I’m fascinated by tradeoffs, and I often drive myself nuts making them — I go back and forth and back and forth and… And at some point, I’ll identify the qualities that I’m trying to trade off (expressiveness, test speed, number of places I’d have to change if the code or the requirements change) and move on. Until the next time I visit that file. Then I’ll go back and forth and back and forth… I am very good at revisiting decisions, and not so good at sticking with a decision I made in the past — even minutes in the past.
Chris’s suggestion points out that there are two pieces of information we’re trying to encode into the name: The idea that passwords have a minimum and maximum valid length, and the specific minimum and maximum (6 and 16 characters). I went with one of those pieces of information; Chris went with the other.
As Chris points out, each style leaves readers to infer something important. With Chris’s style, readers must infer what’s special about any given length. With my style, readers must infer what specific lengths form the boundaries. Neither style expresses both pieces of information explicitly — e.g. that the maximum legal length is 16 characters. There’s a tradeoff here: Which piece of information to express? And by making that tradeoff differently, each style not only expresses one piece of information, but also emphasizes it. My style emphasizes the idea of minimum and maximum lengths; Chris’s style emphasizes the specific lengths themselves.
Also, Chris points out (and I agree) that my style requires readers to count string lengths, a tedious, error-prone chore.
Given all of that: Which style do you prefer? More importantly: What do you prefer about it?
Sometimes I can easily see how to trade off possibilities. Other times I can’t see a clear winner. For those times, I recommend experimenting. Try each possibility. Then pay attention to what happens.
In this case, there’s another criterion I can apply: With Chris’s tests, the length of each password appears three times: Once implicitly in the password itself, once in the declaration of the variable, and once in the test that references each variable. Expressing that specific datum three times is potentially troublesome: If we increase the maximum length of a password to 20, we’d have to change six places (three places for a max length password, three places for a password that’s too long). With my style, we’d have to change only two places: the passwords themselves. The variable names would remain the same.
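To make the comparison concrete, here is a minimal Java sketch of the two naming styles. It assumes only the rule from the post (6 to 16 characters); the hasValidLength method and every variable name are hypothetical stand-ins, not the actual code from the article or from Chris’s comment.

```java
// Hypothetical sketch of two naming styles for password-length test data.
// The rule (6 to 16 characters) comes from the post; all identifiers are invented.
public class PasswordLengthNaming {
    // The rule under test: a password must contain 6 to 16 characters.
    static boolean hasValidLength(String password) {
        return password.length() >= 6 && password.length() <= 16;
    }

    // Style 1: names say which boundary each password sits on.
    // Readers must count characters to learn the actual limits.
    static boolean boundaryNamedChecks() {
        String minimumLengthPassword = "abcdef";
        String maximumLengthPassword = "abcdefghijklmnop";
        String tooShortPassword = "abcde";
        String tooLongPassword = "abcdefghijklmnopq";
        return hasValidLength(minimumLengthPassword)
                && hasValidLength(maximumLengthPassword)
                && !hasValidLength(tooShortPassword)
                && !hasValidLength(tooLongPassword);
    }

    // Style 2: names state each length, so each length appears three times
    // (in the literal, in the declaration, and in the reference).
    // Readers must infer which lengths are the boundaries.
    static boolean lengthNamedChecks() {
        String sixCharacterPassword = "abcdef";
        String sixteenCharacterPassword = "abcdefghijklmnop";
        String fiveCharacterPassword = "abcde";
        String seventeenCharacterPassword = "abcdefghijklmnopq";
        return hasValidLength(sixCharacterPassword)
                && hasValidLength(sixteenCharacterPassword)
                && !hasValidLength(fiveCharacterPassword)
                && !hasValidLength(seventeenCharacterPassword);
    }
}
```

Raising the maximum to 20 changes two literals in boundaryNamedChecks. In lengthNamedChecks it also means renaming sixteenCharacterPassword and seventeenCharacterPassword at their declarations and their uses: six places in all.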
Though I’m not entirely sure which style emphasizes the more important bit of information, the criterion of “how many places I’d have to change” leaves me preferring the style I used in the article. Chris might still reasonably prefer his style, if the extra expressiveness he perceives is valuable enough to outweigh the extra cost of change.
So far, I still prefer my original variable names to Chris’s. And yet his suggestion, his thoughtful explanation of why he prefers it, and especially the contrast it provides with my original tests, make me wonder: Now that we know what we’re trading off, can we find a way to eliminate the tradeoff altogether? Is there another style that allows us to express all of the information we want to express, and without increasing the cost of change?
Uncle Bob, in his video, offers a third style: Instead of conveying information through variable names, express it through comments. His comments express something similar to my variable names: maximumness and minimumness. It would be easy enough to add the information that Chris’s variable names express and that mine lack: “16 characters is just short enough” and “6 characters is just long enough”. I’ve unfortunately trained myself to feel queasy whenever I start to type a comment into code. I’m going to have to get over that. Writing a comment is not necessarily evil; it’s just a tradeoff.
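Here is a sketch of what the comment style might look like when the comments carry both pieces of information. As before, hasValidLength and the test data are hypothetical, invented for illustration.

```java
// Hypothetical sketch of the comment style: each comment names the boundary
// and states the specific length, so readers need not count characters.
public class PasswordLengthComments {
    // The rule under test: a password must contain 6 to 16 characters.
    static boolean hasValidLength(String password) {
        return password.length() >= 6 && password.length() <= 16;
    }

    static boolean lengthChecks() {
        return hasValidLength("abcdef")                    // minimum: 6 characters is just long enough
                && hasValidLength("abcdefghijklmnop")      // maximum: 16 characters is just short enough
                && !hasValidLength("abcde")                // 5 characters is one too few
                && !hasValidLength("abcdefghijklmnopq");   // 17 characters is one too many
    }
}
```

The numbers in the comments are one more thing to update if the rule changes, which is part of this style’s tradeoff.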
Contrasting my tests to Uncle Bob’s, I notice yet another tradeoff: How to organize tests into suites? I organized my tests around specific validity criteria: One set of tests for character content requirements, another for length requirements. Uncle Bob organized the same tests differently: One set of tests for valid passwords, another for invalid passwords. And each way of organizing requires us to name our groupings, which offers an opportunity to subtly highlight one piece of information or the other. My organization emphasizes that there are two classes of validity criteria, content and length. Uncle Bob’s emphasizes that passwords may be valid or invalid.
Which emphasis do you prefer? More importantly: What do you prefer about it?
A few final points about tradeoffs. If you want to get better at making tradeoffs such as these, step one is to notice what tradeoffs you’re making. And a great way to do that is to pair with someone. I wrote the tests on my own, and in the article I mentioned the tradeoffs I was aware of making. But I made other tradeoffs implicitly, without noticing I was making them. It was only when Chris and Bob offered alternatives that I noticed I was making those tradeoffs at all.
Thanks Chris and Bob for inviting me to explore the tradeoffs I make, and how I make them!
Because the concept of system responsibility is so foundational to how I develop and test software, I want to expand on my earlier description. Recall that I defined a system responsibility as a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.
A system responsibility includes three parts:
A stimulus that triggers the system to respond to an event.
A context in which the system is required to respond to the stimulus.
A set of results that the system is obligated to realize in response to that stimulus in that context.
Stimulus. A stimulus is a message, sent by someone or something outside the boundary of the system, that informs the system of an event to which it is obligated to respond. The stimulus has a name, which may identify either the event that it represents or the planned response that the system must carry out. The stimulus may include additional information about the event.
Stimuli are delivered to a system through its interfaces. An interface defines a set of messages to which a system responds, and the mechanisms by which those messages are delivered. For GUI systems, the interface includes a suite of windows, forms, buttons, text fields, and other mechanisms that translate user gestures (mouse clicks, key presses) into messages. Web-based systems receive stimuli through HTTP requests and other interfaces. Smaller scale systems, such as objects inside a software application, expose Application Programming Interfaces (APIs) that define the set of methods to which internal objects and subsystems respond.
Result. A result is an effect that the system realizes in response to a specified stimulus in a specified context. A result may be either a message delivered to someone or something outside the boundary of the system or a change in the system’s internal state.
GUI systems deliver messages through forms, windows, screens, audio devices, and other output devices. Web-based systems deliver messages through HTTP responses and requests. An application’s internal objects and subsystems deliver messages through method calls and method return values.
In addition to delivering messages to external entities, systems also respond to events by recording information internally, and by making changes to that internal information. The information may be stored inside the running application, in a database, in files on the computer’s file system, or other storage mechanisms. The information that a system stores in order to guide its responses to future events makes up the system state.
Context. Sometimes a system’s planned response depends not just on information delivered through the stimulus, but on other information as well. The context for a given responsibility is all of the information other than that delivered in the stimulus that influences the results that the system is obligated to realize in response to an event. The context may include information about the state of the system itself–that is, information that the system previously recorded in its internal memory about prior events. The context may also include information that the system can observe across its boundary–information that the system must request from external entities in order to fulfill the responsibility.
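As an illustration, here is a hypothetical Java sketch of one responsibility, using an ATM cash withdrawal as the example. The types WithdrawCash, Context, and the Result records are invented for this sketch; the point is only to show the stimulus, the context, and the results as separate pieces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one system responsibility split into its three parts.
public class WithdrawalResponsibility {
    // Stimulus: a message from outside the system boundary that names the event
    // and carries additional information about it.
    record WithdrawCash(String accountId, int amount) {}

    // Context: information other than the stimulus that influences the results.
    // Here it is a single piece of previously recorded system state.
    record Context(int currentBalance) {}

    // Results: messages delivered across the boundary, or changes to internal state.
    interface Result {}
    record DispenseCash(int amount) implements Result {}
    record DisplayMessage(String text) implements Result {}
    record SetBalance(String accountId, int newBalance) implements Result {}

    // The responsibility: given this kind of stimulus in this context,
    // return the set of results the system is obligated to realize.
    static List<Result> respond(WithdrawCash stimulus, Context context) {
        List<Result> results = new ArrayList<>();
        if (stimulus.amount() <= context.currentBalance()) {
            results.add(new DispenseCash(stimulus.amount()));
            results.add(new SetBalance(stimulus.accountId(),
                    context.currentBalance() - stimulus.amount()));
        } else {
            results.add(new DisplayMessage("Insufficient funds"));
        }
        return results;
    }
}
```

In this sketch, respond() does not change any state itself. It returns the planned state change (SetBalance) alongside the outgoing messages, which keeps all three parts visible to a test.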
I first learned about the idea of planned response systems from III, a colleague and friend of mine. I later read about the idea in depth in McMenamin and Palmer’s profound book Essential Systems Analysis.
The idea of planned response systems is fundamental to how I think about programming and testing. I’m posting my thoughts here so that I can refer to these terms and ideas in later blog posts. Until I write those posts, I encourage you to notice what happens when you think about software systems as planned response systems.
A planned response system is a system that responds in planned ways to events in its environment.
For example, a software system is a planned response system—it responds in planned ways to users’ actions.
In an object-oriented software system, each object is a planned response system—it responds in planned ways to messages sent by other objects.
Planned response systems produce two general kinds of results: They send messages to entities outside of the system boundary, and they make changes to the essential memory of the system.
An event is a significant change in the system’s environment. A change is significant to the system if the system is obligated to respond to the change in a planned way.
Events fall into two broad categories: Changes initiated by entities in the system’s environment (e.g. users or other systems), and temporal events caused by the passage of time.
For example, an ATM is obligated to respond in a planned way to a user’s request to withdraw cash. The user’s request is an event.
A system responsibility is a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.
The specification of a system responsibility consists of three parts: A specification of a kind of event, a specification of a set of circumstances, and a specification of the set of planned results that the system is obligated to produce in response to being notified of an event of that kind under those circumstances.
A system becomes obligated to respond to an event when a system designer allocates that responsibility to the system.
The essence of a planned response system is the set of responsibilities allocated to the system, independent of the choice of technology used to implement the system.
The definition of a system’s essence makes no mention whatever of technology inside the system, because the system’s essential responsibilities would be the same whether it were implemented using software, magical fairies, a horde of trained monkeys, or my brothers Glenn and Gregg wielding pencils and stacks of index cards.
One way to identify the essence of a system is to indulge in The Fantasy of Perfect Technology. Imagine a system implemented using perfect technology. Then ask yourself some questions about the quality attributes of the system.
How fast would it respond? If it were made of perfect technology, of course it would respond instantly, with zero delay. How many users could use it at once? An infinite number of users. How much information could it store? An infinite amount. How often would it break? It would never break. How long does it take to start up? None, because it’s always on and always available. How much energy would it use? It would use no energy; heck, it might even generate energy for free.
The one glaring flaw of perfect technology is that it does not exist. Real-world technology is imperfect. That’s what makes this exercise a fantasy. But it’s a useful fantasy, because it helps us to separate the system’s essential responsibilities from the temporary constraints of current technology.
Note that we apply the Fantasy of Perfect Technology only inside the boundary of the system. Even in our fantasy, the world outside of the system is made of real, imperfect stuff, with which the system will have to interact.
Now apply the fantasy to your own system. What responsibilities would your system have even if you could implement it using perfect technology? That set of responsibilities is your system’s essence.
The essential memory of a system is the set of data that the system must remember in order to fulfill its obligations—that is, in order to respond as planned to future events.
For example, an ATM must remember users’ account balances in order to determine whether to satisfy users’ requests to withdraw money.
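Here is a hypothetical Java sketch of the ATM example. The Atm interface states the essence (the obligation to respond to a withdrawal request) with no mention of technology, and the balances map in the one implementation plays the role of essential memory. All names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an essential responsibility, plus the essential memory it needs.
public class AtmEssenceSketch {
    // The essence, stated without any implementation technology.
    interface Atm {
        /** Responds to a request to withdraw cash; returns the amount dispensed (0 means refused). */
        int withdraw(String accountId, int amount);
    }

    // One possible implementation. The balances map is the essential memory:
    // without it, the system could not respond as planned to future withdrawal requests.
    static class RememberingAtm implements Atm {
        private final Map<String, Integer> balances = new HashMap<>();

        RememberingAtm(Map<String, Integer> openingBalances) {
            balances.putAll(openingBalances);
        }

        @Override
        public int withdraw(String accountId, int amount) {
            int balance = balances.getOrDefault(accountId, 0);
            if (amount <= 0 || amount > balance) {
                return 0; // refuse: the remembered balance cannot cover the request
            }
            balances.put(accountId, balance - amount); // update the essential memory
            return amount;
        }
    }
}
```

A second request for the same amount may be refused only because the system remembered the first one. Any other implementation of Atm, whether a database or Glenn and Gregg with index cards, would have to remember the same thing.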
A highly visible part of TDD is that developers write more tests than they used to. It is tempting to conclude from this that programming takes longer with TDD than without. After all, we are writing tests that we didn’t write before, and this is additional work on top of all the work we used to do. Right?
That’s a perfectly reasonable conclusion. But it is based on a flawed assumption, an accounting error that leads us to overlook some important costs. The assumption is that in addition to the new work of writing tests, developers will continue to do all of the same work they were doing before.
This assumption turns out to be unwarranted in practice. For example, before TDD, developers typically spend a great deal of time debugging. Why are they debugging? Because they have identified a failure, but they don’t know which specific piece of code is broken and producing the failure. They step through the code in the debugger in order to trace the failure to the fault. Only when they identify the broken code can they fix it.
TDD reduces the amount of debugging in two ways. First, developers write fewer defects. Second, when a developer introduces a defect, the code immediately fails one or more tests. These tests point directly toward the broken code. If the tests tell me what specific code is broken, I don’t need to run the debugger to find the fault.
Also, if developers write fewer defects, they now spend less time fixing defects.
If we add up all of the effects of TDD, including the cost of writing tests and the cost savings we gain by writing tests, we find that the first few features cost slightly more than without TDD, but that the quality is noticeably higher. And we find that later features take less time than without TDD, because TDD keeps code changeable.
An additional wrinkle leads to another accounting error: not everyone sees debugging as a cost. In my pre-TDD days, I usually enjoyed debugging. I loved having a meaty puzzle to solve. It was often the most fun and interesting and engaging part of programming. And in one job, my managers (falling prey to yet another accounting error) rewarded fixing bugs far more visibly and vocally than preventing them.
If you’re a manager, learn to attend not just to the obvious costs and benefits of a given set of programming practices, but also the costs and benefits that require a little effort and insight to detect.
If you’re a developer, I can tell you that I don’t miss debugging. The puzzle of how to choose the next test that will drive my code toward completion is as fun and interesting and challenging as debugging ever was. I hope it’s the same for you.
I very recently started to write my first application based on the eclipse Rich Client Platform. RCP is a wicked neato platform that automagically implements all kinds of UI features that you would otherwise have to either program manually or leave out of your app. RCP FTW!
My app is dalewriter, a tool to help me write and organize stories, articles, and books. (I’m a geek, which means that instead of writing stories, articles, and books, I spend my time creating tools to help me write stories, articles, and books.) dalewriter will allow me to edit text, store it in a tree form (chunks of text in a tree of folders), shuffle and reorder the bits, and build a manuscript from the tree or a subtree.
Six days ago I didn’t know nothin’ ’bout no RCP, so I’ve been using McAffer and Lemieux’s RCP book, following along with their running example and adjusting what I learned to fit my app instead of theirs.
Development was going swimmingly until I tried to establish my first key binding–the connection that makes a given keystroke cause a given action. I wanted CTRL-N to create a new text item in the currently selected folder and open it for editing. I’d already written and tested the action itself, and now I wanted to connect CTRL-N to the action.
THE PROBLEM
I followed the example in the book, and it didn’t work right. I wish I could tell you the exact failure, but I wasn’t expecting that I would (have to) learn so danged much that I’d want to write about it. I spent a great deal of the next few days (and long, long nights) tracking down everything I could find on the Interwebs about key bindings.
Part of the problem is that eclipse’s key-binding mechanism has changed over the past few releases, and much of the information I found referred to old-style bindings. I tried each idea I found. Each one led to “hey, dood, cut that out, that’s deprecated” warnings. (I’m paraphrasing slightly.)
And not only that, they didn’t work. I observed three general “not what I wanted it to do” responses from my app when I typed CTRL-N:
Ding at me (the standard Windows warning ding) without creating a new item. I came to interpret this as a sign that CTRL-N was not bound to any action.
Open the “Select a wizard” dialog, a standard eclipse dialog that allows the user to select among a list of things to create. This appears to be the default eclipse action for CTRL-N.
Open a little yellow table in the bottom right corner of the window, asking me to select which action I wanted to take. The choices were labeled “New” and “New Item”. The “New” choice invoked the “Select a wizard” dialog. The “New Item” choice was the one I had created. Hey! That’s progress! It still wasn’t right–I wanted CTRL-N to directly invoke my action instead of first asking the user (i.e. me) to choose, but still, that was progress.
Finally, I somehow pieced together disparate examples and descriptions of bindings and related mechanisms, and found a solution.
Given how many other people have stumbled over eclipse/RCP key-binding quirks, I thought I’d describe the solution I found, my understanding of why this solution works, and ideas for what to do if you’re having trouble. You’re welcome.
THE SOLUTION
Before I describe what worked, A CAVEAT: I still don’t know hardly nothin’ ’bout no RCP. As I said, I’ve been working with RCP for less than six days, so take everything I say here with a bag of salt. I’ve done my best to explain my understanding of how and why this works, but I’m no authority. Yet.
The key ingredients of the whole enchilada are:
An action to invoke.
A command that invokes the action.
A key binding that invokes the command (that invokes the action).
A plug-in configuration file that enables your key bindings.
Action. First, create an action that the keystroke will invoke. I’ll assume that you know how to create an Action in RCP. If you need help with this, see the RCP book.
The secret sauce here is that you have to make the action available for invocation by commands. To do that:
Make up a unique identifier string that will represent the command. I chose com.dhemery.dalewriter.command.AddItem.
In the constructor of your action, call the setActionDefinitionId() method, passing it the identifier you created. When the command is invoked, RCP will look for a registered action that has this action definition ID, and invoke it.
In ApplicationActionBarAdvisor.makeActions(), register your action in the usual way (see the RCP book).
Command. There is no way to connect a key binding directly to an action. Apparently there used to be, but the old mechanism is either gone or deprecated. To connect a key binding to an action, you create an intermediary called a command. You do this not with Java code, but by declaring a command extension:
Add the org.eclipse.ui.commands extension point to your plugin.xml file (on the Extensions tab of your plugin.xml file).
Add a new command: Right click on the commands extension point and select New > command.
In the id* field of the new command, enter the command identifier that you used to register your action (e.g. com.dhemery.dalewriter.command.AddItem). This connects the command to the action. When the command is invoked–such as in response to a user keystroke–RCP will invoke the action that you have registered with this ID. The id* that you enter into this field must be identical to the one with which you registered the action.
Use whatever name* and description you like. Those fields don’t affect key bindings.
Key Binding. Next, you need a key binding and a key binding scheme. The key binding connects a specified keystroke to your command. A key binding scheme is a named bundle of key bindings that can be enabled all at once.
Add the org.eclipse.ui.bindings extension point to your plugin.xml file.
Create a key binding scheme: Right click on the bindings extension point and select New > scheme.
Give your scheme a unique identifier in the id* field. I used com.dhemery.dalewriter.bindings.
Use whatever name* and description you like. These fields don’t affect key bindings.
You can leave the parentId field blank. Or if you want to make a ten-pound bag of default actions available to your app, see the “Bonus: Default Bindings and Actions” section below.
Create a key binding: Right click on the bindings extension point and select New > key.
In the sequence* field, enter the key (or key chord) that will invoke your action (e.g. M1+N or CTRL+N).
In the schemeId* field, enter the identifier you assigned to your new binding scheme. This tells RCP to activate this binding whenever the scheme is active.
In the commandId field, enter the command identifier that you made up way back in step one. This is the command id that you assigned to the command extension, and that you used to register the action. For me, the commandId was com.dhemery.dalewriter.command.AddItem.
For the love of all that is good and holy, don’t mess with contextId, platform, or locale. If you’re indifferent to all that is good and holy, make up your own reason not to mess with those fields.
Plug-In Configuration File. To activate your binding scheme (and thus all of its bindings), you use a property in a plug-in configuration file:
Create a new plug-in configuration file in your project. This is just a plain text file. I added mine directly under the project and called it plugin_customization.ini (which seems to be the standard name).
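Add a KEY_CONFIGURATION_ID property in the file to tell RCP to activate your binding scheme whenever your plug-in is active. To do that, add a line like this, using your own binding scheme’s identifier:
org.eclipse.ui/KEY_CONFIGURATION_ID=com.dhemery.dalewriter.bindings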
Add a preferenceCustomization property to your product extension. This tells RCP to load your configuration file and apply its preferences when it activates your plug-in.
Find your product extension under the org.eclipse.core.runtime.products extension point in the plugin.xml file. If your application doesn’t yet have a product extension, see the RCP book for instructions on how to create one.
Right click your product extension and select New > property.
In the name* field, enter preferenceCustomization. When RCP starts up your application, it will look up this property to identify the plug-in configuration file to apply.
In the value* field, enter the name (and path, if appropriate) of your plug-in configuration file.
Yep, it’s as simple as that. (Yep, that was irony–a subtle form of what we Mainers call hyoomah.)
If you know what you’re doing, you can define the command, the binding scheme, the key binding, and the customization property directly in raw XML in the plugin.xml file. If you don’t know what you’re doing, your plugin.xml file will become contumaciously hosed (hosed being a technical term that means (roughly): hosed). I will leave it to the reader to infer how I know this.
Before I move on to troubleshooting, here’s one more bit of useful information.
Bonus: Default Bindings and Actions. eclipse/RCP includes a default key binding scheme that defines key bindings for a passel of groovy built-in actions (e.g. CTRL-S to save). Now, it turns out that the default scheme is the bastard that eructs that annoying “Select a wizard” dialog when I press CTRL-N in dalewriter. Mumble grumble.
But wait, here’s the wickedest awesomest part: There’s a way for your key binding scheme to inherit all of the cool defaults, use the ones you want, override the ones you want to override, and defenestrate the ones you don’t want. Here’s the straight dope:
To make the default bindings and actions available in your own binding scheme, make the default scheme the parent of yours: In the parentId field of your scheme, enter org.eclipse.ui.defaultAcceleratorConfiguration.
If you want a keystroke to trigger an action of your own rather than the default action, simply add a key binding of your own into your scheme. Yeah, okay, so that’s not so simple: You still have to do all of the crap in the previous sections. But still.
If you want to disconnect a default binding from your app without overriding it, simply add a binding to your scheme, enter the appropriate key chord in the sequence* field, and leave the commandId field empty. See? This time when I said simply, I really meant simply. Leaving the commandId field blank tells RCP “when the user presses these keys, do nothing.”
TROUBLESHOOTING RCP KEY BINDING PROBLEMS
Here are the primary failures I’ve observed, and possible solutions.
The “Dreaded Ding” Problem: Your app emits a warning ding instead of executing your action. This is a sign that your app has no active key binding for that keystroke. In order for a keystroke to find the right action, there’s a long chain of links that must be established correctly.
Solution. Here are some things to check:
Did you make up a command ID that is absolutely unique within your app? I don’t know what happens if there are duplicate command IDs. I suspect it would destroy Western civilization.
Did you call setActionDefinitionId() in your action’s constructor? If not, your action is not available to be invoked by commands.
Did you pass the correct command ID to setActionDefinitionId()? The string that you pass must be identical to the string you entered into your command extension’s id* field. If you passed an incorrect string, RCP won’t find your action (and therefore won’t execute it) when the command fires.
Did you register your action by calling register(action) within ApplicationActionBarAdvisor.makeActions()? If you omit this call, your action won’t be registered with RCP, so RCP won’t be able to find and execute your action when the command fires.
Did you define a command extension? If you omit this, RCP will not know to associate your key binding with your action.
Did you assign your command the correct id*? If the id* is incorrect, RCP will not find your action when the command fires.
Did you define a binding scheme for your key bindings? If you omit the binding scheme, RCP will not know to activate your binding.
Did you assign the binding scheme a unique id*? I don’t know what happens if there are duplicate binding scheme IDs. Again, Western civilization hangs in the balance, and even Obama won’t be able to fix this.
Did you define a key binding? If you omit the key binding, RCP won’t know what to do when the user types the keystroke. (Could I say “when the user strokes the keys,” or would that just be weird?)
Did you specify the correct key sequence* in the key binding? If the sequence* is incorrect, your action may fire when the user strokes some other keys. (Hmmm. Yes, I guess that does sound weird.)
Did you assign your key binding to your binding scheme? If you assign the key binding to an incorrect binding scheme, then even if your binding scheme is active, your key binding will not be.
Did you create a plug-in customization file? If you omit the customization file, RCP will not know that you have a binding scheme (or other preferences) for it to load when it loads your plug-in.
Did you include your plug-in customization file in the build? If not, RCP won’t find your customizations when it looks for them.
Did you add a preferenceCustomization property to your product? If you omit this property, RCP will not know about your preferences file, and so will not load your preferences.
Did you assign the preferenceCustomization the right value*? This value* must be the name of your plug-in customization file, along with the file path if you put the file somewhere other than directly under the project. If you assign an incorrect value*, RCP will load (or attempt to load) the wrong set of preferences when it loads your plug-in.
Did you add the org.eclipse.ui/KEY_CONFIGURATION_ID property to your plug-in customization file? If you omit this property, then RCP won’t know that you want it to activate your binding scheme.
Did you assign the KEY_CONFIGURATION_ID property the right value? This must be the unique identifier for your binding scheme. If the value of this property is incorrect, RCP will activate (or try to activate) a binding scheme other than the one you intend.
The “Multiple Choice” Problem: Your app displays a little yellow table (like the picture below) that displays a number of actions that you can choose to execute. This is a sign that the active binding scheme has two or more bindings for the same keystroke. I stumbled over this when (in my willingness to try anything) I assigned my key binding to eclipse/RCP’s default binding scheme (org.eclipse.ui.defaultAcceleratorConfiguration). This made all of the cool default bindings available to my app, but it created a conflict between my CTRL-N binding and the “Select a wizard” CTRL-N binding already in the default scheme. When you execute a keystroke for which the active scheme has two bindings, eclipse/RCP requires the user to take the extra step of choosing between the several possible actions.
Solution. Assign your key binding to your own binding scheme, and not to the default one. If you also want your app to inherit all of the other key bindings from the default scheme, follow the instructions in the “Bonus: Default Bindings and Actions” section above.
The “Wrong Action” Problem: Your app executes some other action (e.g. the default “Select a wizard” dialog) instead of your action. The primary trouble here is the same as in The “Dreaded Ding” Problem: Your key binding is not active. The difference here is that some other scheme has activated a key binding for the relevant keystroke. Your key binding is not active, but some other key binding is active.
Solution. First activate your key binding by working through the checklist for The “Dreaded Ding” Problem. If this causes you to lose all of the useful actions from the default scheme, make sure to declare the default scheme as a parent of your scheme, as described in the “Bonus: Default Bindings and Actions” section above.
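To see how the links in the checklist fit together, here is a toy Java model of the keystroke-to-action chain as I’ve described it in this post. It is hypothetical: it is not Eclipse’s implementation and uses none of its API. Schemes, bindings, and registered actions are plain records and maps, and the result strings (“ding”, “choose”, “nothing”, “run” plus an action name) stand in for the behaviors described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, NOT the Eclipse API) of keystroke -> binding -> command -> action.
public class KeyBindingChain {
    record Scheme(String id, String parentId) {}
    // A null commandId models a binding whose commandId field is left empty: "do nothing".
    record KeyBinding(String sequence, String schemeId, String commandId) {}

    private final Map<String, Scheme> schemes = new HashMap<>();
    private final List<KeyBinding> bindings = new ArrayList<>();
    private final Map<String, String> actionsByDefinitionId = new HashMap<>();
    private String activeSchemeId; // in this model, what KEY_CONFIGURATION_ID selects

    void defineScheme(String id, String parentId) { schemes.put(id, new Scheme(id, parentId)); }

    void bind(String sequence, String schemeId, String commandId) {
        bindings.add(new KeyBinding(sequence, schemeId, commandId));
    }

    // Models calling setActionDefinitionId(commandId) and then register(action).
    void registerAction(String commandId, String actionName) { actionsByDefinitionId.put(commandId, actionName); }

    void activateScheme(String schemeId) { activeSchemeId = schemeId; }

    /** Describes what happens when the user presses the given key sequence. */
    String press(String sequence) {
        // Search the active scheme, then its ancestors; the nearest scheme that binds the keys wins.
        for (Scheme scheme = schemes.get(activeSchemeId); scheme != null; scheme = schemes.get(scheme.parentId())) {
            List<KeyBinding> matches = new ArrayList<>();
            for (KeyBinding b : bindings) {
                if (b.schemeId().equals(scheme.id()) && b.sequence().equals(sequence)) {
                    matches.add(b);
                }
            }
            if (matches.isEmpty()) continue;          // not bound here: try the parent scheme
            if (matches.size() > 1) return "choose";  // the "Multiple Choice" problem
            String commandId = matches.get(0).commandId();
            if (commandId == null) return "nothing";  // deliberately unbound
            String action = actionsByDefinitionId.get(commandId);
            // A command with no registered action is a broken link; this model lumps it in with the ding.
            return action != null ? "run " + action : "ding";
        }
        return "ding"; // the "Dreaded Ding": no active binding for these keys
    }
}
```

For example, binding M1+N only in the default scheme and activating that scheme yields “run Select a wizard” (the Wrong Action problem). Binding M1+N to your command in your own scheme, whose parent is the default scheme, and activating your scheme yields your action while still inheriting the other defaults.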
EXPLORING ON YOUR OWN
If none of my ideas solve the problem for you, well, that’s all the ideas I have right now, so you’ll have to gather more information either from your program or from the innertoobs.
To help you explore your program, I’ll toss you one last cookie. Here’s a programmatic way to print the active binding scheme and a list of all active key bindings:
A code coverage tool watches your program executing and reports which lines of code were executed and which were not. Testers are sometimes tempted to use code coverage tools to assess test coverage. And some testers are tempted to set code coverage goals. If you feel these temptations, be careful how you interpret the code coverage tool’s reports.
You can be sure that if a line of code was not executed during a test run, then it certainly was not tested by that run.
But what of a line of code that was executed by the tests? Unfortunately, you can’t tell, just from the fact that it was executed, whether the line was tested.
Elisabeth Hendrickson and I developed a workshop on unit testing. The work of the workshop centered on a small application we had written, a rudimentary HTTP server. Our initial code had exactly thirteen tests, just enough to illustrate a few basic tools and techniques that we’d be teaching in the workshop.
When we ran a code coverage tool called NCover to watch our test suite, it reported that our thirteen tests executed 65 percent of the server’s code. Does that mean that we achieved 65 percent test coverage? Not on your life. Our thirteen tests barely scratched the surface of the responsibilities of even our very simple HTTP server.
If our tests tested so little, why was code coverage so high? Because though our suite tested little of the code, it executed a lot of the code.
For example, one of our tests sent a GET request to the server and evaluated the response. As the server executed the request, it called a logging function to log information about the request and its response to a file. The logging function was minimal, and did not deal with any of the zillions of possible file system errors it might encounter. It expected the happy path, and nothing but the happy path. So this one test, which did not in any way assess the logging feature, executed all of the logging code. The logging code was 100 percent executed and zero percent tested.
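To make “executed but not tested” concrete, here’s a minimal Java sketch. It is not our workshop code; TinyServer, handleGet, logRequest, logContents, and getRootReturnsOk are hypothetical names invented for this illustration. The logger contains a deliberate bug, and the test never looks at the log, so the bug survives even though every line of the logger runs.

```java
// Hypothetical sketch: not the workshop's HTTP server, just a stand-in.
public class CoverageDemo {

    // Minimal stand-in for a server that logs every request it handles.
    static class TinyServer {
        private final StringBuilder log = new StringBuilder();

        String handleGet(String path) {
            String response = path.equals("/") ? "200 OK" : "404 Not Found";
            logRequest(path, response);
            return response;
        }

        // Happy-path-only logging. Deliberate bug: it records the path
        // where the response should go.
        private void logRequest(String path, String response) {
            log.append("GET ").append(path).append(" -> ").append(path).append('\n');
        }

        String logContents() {
            return log.toString();
        }
    }

    // A test that assesses only the response. Running it executes every
    // line of logRequest(), so a coverage tool would count it as covered.
    static boolean getRootReturnsOk() {
        TinyServer server = new TinyServer();
        return server.handleGet("/").equals("200 OK");
    }

    public static void main(String[] args) {
        System.out.println("getRootReturnsOk passed: " + getRootReturnsOk());
    }
}
```

Running main prints `getRootReturnsOk passed: true`. A coverage tool watching that run would report logRequest() as fully executed, yet nothing checked what it wrote.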
Code coverage does not imply test coverage. If you use code coverage tools to help assess your test coverage, keep that in mind.
Last year I read Brian Button’s wonderful article “Double Duty” in Better Software magazine (the February, 2005 issue). One of the things I learned is that Brian is the world’s best namer of unit tests. I visited Brian’s web site for more of his ideas and found an article called “TDD Defeats Programmer’s Block—Film at 11.” In this article, Brian describes using the Test Driven Development process to write a “continuous integration system” (a tool that automatically (re)builds software systems when programmers change the source code). Here are some examples of his unit test names:
Starting Build With No Previous State Only Starts Build For Last Change
Previous Build Number Is Incremented After Successful Started Build
Last Build Failing Leaves Last Build Set To Previous Build
What makes these names so good? I analyzed a few dozen of Brian’s test names and found this pattern: stimulus and result in context. Let’s examine these names to identify the parts.
Starting Build With No Previous State Only Starts Build For Last Change:
Context: There is no previous state (i.e. no previous builds were done).
Stimulus: Start a build.
Result: A build was started for only the last change.
Previous Build Number Is Incremented After Successful Started Build:
Context: There were zero or more previous builds.
Stimulus: Request a build that will succeed.
Result: The build number is one more than before the build.
Last Build Failing Leaves Last Build Set To Previous Build:
Context: There were previous builds, the most recent of which is recorded in the system as the last build.
Stimulus: Request a build that will fail.
Result: The previously identified last build is still identified as the last build.
One of Brian’s tests from a different system—an “animal factory” (a concept better left unexplained)—is called Default Animal Is Cow.
Context: No animal type has been identified as the desired type of animal for the system to manufacture.
Stimulus: Request that the system manufacture an animal.
Result: A new cow exists.
Now that I’ve learned the pattern that makes Brian’s test names so useful, I can use it deliberately. Using the context-stimulus-result scheme increases the value of tests as documentation. The resulting names make clear what specifically is being tested and under what specific conditions. This helps the reader to understand quickly what each test does, and what is covered by each set of tests.
Another benefit is that the context-stimulus-result naming scheme encourages you to clarify your thinking about each test. Each unit test establishes some set of starting conditions, or context. Each stimulates the system. Each compares the result to a desired result. In order to name these elements you will have to think about the specifics of each and clarify them well enough that you can describe each in a few words.
If you’re having difficulty naming a test using this scheme, that may indicate a problem in your test. Perhaps the test is doing too much work, or your test suite is doing too little. For example, suppose you’re testing software to manage bank accounts, and one test is called Withdrawal Test. We can tell from this name that the test tests the withdrawal feature in some way. But we don’t know what specific aspects of withdrawals this test is testing.
Does Withdrawal Test test only that a withdrawal of less than the account balance reduces the balance by the proper amount? If so, calling this test “Withdrawal Test” may indicate that your suite of tests for the withdrawal feature is missing many important test cases. The name of the test gives readers an overly broad sense of what the test actually tests.
Does Withdrawal Test test a score of different stimuli under a dozen different conditions? If so, it’s probably doing too much work. The name of the test does not quickly tell readers what is being tested.
Whether Withdrawal Test is doing too much work or too little, we can improve the test by applying the context-stimulus-result scheme. If Withdrawal Test is doing too much, we can use the scheme to identify how to break the test into smaller, more focused tests with more descriptive names. If Withdrawal Test tests only one tiny aspect of withdrawals and leaves other aspects untested, we can use the scheme to create a better name for the test and to identify other tests to write.
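Here’s a sketch, in Java, of what splitting a broad Withdrawal Test might look like. The Account class, its refuse-overdrafts rule, and the three test names are hypothetical, invented for this example. A real suite would use a test framework such as JUnit; the bare runner below only keeps the sketch self-contained. Each name states a context, a stimulus, and a result, and each method’s comments mark those parts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical Account and test names, for illustration only.
public class WithdrawalTests {

    // Minimal account: withdrawals larger than the balance are refused.
    static class Account {
        private long balance;
        Account(long openingBalance) { balance = openingBalance; }
        long balance() { return balance; }
        boolean withdraw(long amount) {
            if (amount > balance) return false;
            balance -= amount;
            return true;
        }
    }

    static boolean withdrawingLessThanBalanceReducesBalanceByAmount() {
        Account account = new Account(100);    // context: balance of 100
        account.withdraw(30);                  // stimulus: withdraw 30
        return account.balance() == 70;        // result: balance is 70
    }

    static boolean withdrawingEntireBalanceLeavesZeroBalance() {
        Account account = new Account(100);    // context
        account.withdraw(100);                 // stimulus
        return account.balance() == 0;         // result
    }

    static boolean withdrawingMoreThanBalanceIsRefusedAndLeavesBalanceUnchanged() {
        Account account = new Account(100);        // context
        boolean accepted = account.withdraw(150);  // stimulus
        return !accepted && account.balance() == 100;  // result
    }

    // Runs each test and maps its readable name to pass/fail.
    static Map<String, Boolean> runAll() {
        Map<String, BooleanSupplier> tests = new LinkedHashMap<>();
        tests.put("Withdrawing Less Than Balance Reduces Balance By Amount",
                WithdrawalTests::withdrawingLessThanBalanceReducesBalanceByAmount);
        tests.put("Withdrawing Entire Balance Leaves Zero Balance",
                WithdrawalTests::withdrawingEntireBalanceLeavesZeroBalance);
        tests.put("Withdrawing More Than Balance Is Refused And Leaves Balance Unchanged",
                WithdrawalTests::withdrawingMoreThanBalanceIsRefusedAndLeavesBalanceUnchanged);
        Map<String, Boolean> results = new LinkedHashMap<>();
        tests.forEach((name, test) -> results.put(name, test.getAsBoolean()));
        return results;
    }

    public static void main(String[] args) {
        runAll().forEach((name, passed) ->
                System.out.println((passed ? "PASS " : "FAIL ") + name));
    }
}
```

Running main prints one PASS line per test. Writing the names this way also exposes gaps: there is no test yet for, say, withdrawing a negative amount.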
If you want to test a class in isolation, but the class works with a collaborator, you may need to provide a fake collaborator for the class to work with. A fake collaborator provides useful isolation in two directions:
It isolates the test from the quirks of the real collaborators. This makes failures more informative: If the test fails, the fault is likely in the test subject, and not in the collaborator.
It isolates the real collaborators from the test. This is important if the real collaborator is, say, the corporate accounts receivable database. You don’t want your tests messing with that.
Fake collaborators often provide other benefits over real collaborators. One benefit is that fake collaborators increase testability by increasing your control over the test subject’s environment. It’s usually easier to set up a fake collaborator to feed your test subject a particular data value than to set up the real collaborator to do the same thing. And if the real collaborator takes a long time to do its work, you can gain control over the speed of the test by writing a fake collaborator that takes essentially no time at all.
Fake collaborators also increase testability in another way: They give you greater visibility into the results produced by the test subject. Sometimes it’s difficult or time consuming to observe what data the test subject delivered to a real collaborator. If you write a fake collaborator, it’s easy to instruct it to remember the data that the test subject delivered. And it’s easy to gain access to that information so that you can compare it to your expectations.
I’ve identified a number of jobs that I often want fake collaborators to do for me when I’m writing tests. Each of these jobs helps me to gain control over the test environment or visibility into the test results.
1. Fill in an argument to a method call. Suppose the test subject requires me to pass an argument to it, either through the constructor or through the method I’m testing, but the argument is never used during the test. In this case, all I need the “collaborator” to do is to fill in a value in the method call. If that’s all I need, I can pass null.
2. Accept calls from the test subject. If the test subject calls the collaborator’s methods, but the test doesn’t care what the collaborator does, I can write a fake collaborator with dummy methods. If the interface specifies that a method doesn’t need to return anything, I can simply write a dummy method with an empty body. If the method must return a value, I can write the dummy method to return a simple default value, such as 0, null, or false. Objects like this, and similar objects with very simple default behavior, are often called Null Objects.
3. Provide inputs to the test subject. Sometimes the test subject requires a value other than 0, null, or false in order to run. And sometimes I’m writing a test to determine whether the test subject responds appropriately when it receives specific interesting values from its collaborators. In either case, I enhance the fake collaborator to store an appropriate value and deliver it to the test subject when called.
4. Record outputs from the test subject. Sometimes I want to know whether the test subject sent the right information to the collaborator. I can write the fake collaborator’s methods to store the inputs it receives from the test subject. And I can write accessor methods in the fake collaborator, if necessary, so that the test method can retrieve them.
5. Verify outputs from the test subject. Sometimes it’s useful to have the collaborator do the verification itself, rather than having the test retrieve values from the collaborator and verify them. When I want this, I can create a mock object, an object that has expectations and can verify them. I can either write my own mock objects, including the verification methods, or I can use one of the numerous mock object libraries that make mocking easier. I use the simple mock features that come with NUnit.
6. Verify what methods the test subject calls. Sometimes I want to verify not only whether the collaborator received the right values, but also whether the test subject called all of the right methods. And sometimes I want to make sure the test subject does not call certain methods. Mock object libraries typically provide ways to verify method calls.
7. Verify the sequence in which the test subject calls methods. Every now and then, I want to verify that the test subject not only called the right methods on the collaborator, but also called them in a specific order. This can be useful for testing protocols. Some mock libraries provide a way to verify the order of method calls. The NUnit mock library does not. When I need this feature, I often write a logging collaborator that simply writes each expected method call to a string and each actual call to another string. To verify whether the actual calls matched expectations, my test can direct the logging collaborator to compare the two strings.
8. Collaborate fully. If the test somehow requires the full behavior of a real collaborator, I can use a real collaborator. So far, I haven’t found a need for this when I’m trying to test classes in isolation. I do use real collaborators when my intention is to test the collaboration, and not just one class or another.
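Here’s a minimal Java sketch of three of these jobs done with hand-written fakes: providing inputs, recording outputs, and verifying call sequence. The Channel interface, the Announcer test subject, and the fake class names are all hypothetical. The logging fake follows the two-string approach described above: expected calls go into one string, actual calls into another, and the test asks the fake to compare them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator, subject, and fakes, invented for this sketch.
public class FakeCollaborators {

    interface Channel {
        boolean isReady();
        void open();
        void send(String message);
        void close();
    }

    // Test subject: sends a message only when the channel is ready.
    static class Announcer {
        private final Channel channel;
        Announcer(Channel channel) { this.channel = channel; }

        boolean announce(String message) {
            if (!channel.isReady()) return false;
            channel.open();
            channel.send(message);
            channel.close();
            return true;
        }
    }

    // Provides inputs: the test chooses what isReady() reports.
    static class StubChannel implements Channel {
        private final boolean ready;
        StubChannel(boolean ready) { this.ready = ready; }
        public boolean isReady() { return ready; }
        public void open() {}
        public void send(String message) {}
        public void close() {}
    }

    // Records outputs: remembers every message the subject sends.
    static class RecordingChannel extends StubChannel {
        final List<String> sent = new ArrayList<>();
        RecordingChannel() { super(true); }
        @Override public void send(String message) { sent.add(message); }
    }

    // Verifies sequence: expected and actual calls are logged to two strings.
    static class LoggingChannel implements Channel {
        private final StringBuilder expected = new StringBuilder();
        private final StringBuilder actual = new StringBuilder();

        void expect(String call) { expected.append(call).append(';'); }
        boolean callsMatchExpectations() {
            return expected.toString().equals(actual.toString());
        }

        public boolean isReady() { actual.append("isReady;"); return true; }
        public void open() { actual.append("open;"); }
        public void send(String message) { actual.append("send(").append(message).append(");"); }
        public void close() { actual.append("close;"); }
    }

    public static void main(String[] args) {
        // Control: a stub reporting "not ready" should suppress the announcement.
        System.out.println(!new Announcer(new StubChannel(false)).announce("hi"));

        // Visibility: the recording fake lets the test see what was sent.
        RecordingChannel recorder = new RecordingChannel();
        new Announcer(recorder).announce("hi");
        System.out.println(recorder.sent.equals(List.of("hi")));

        // Sequence: the logging fake compares expected and actual call order.
        LoggingChannel logger = new LoggingChannel();
        logger.expect("isReady");
        logger.expect("open");
        logger.expect("send(hi)");
        logger.expect("close");
        new Announcer(logger).announce("hi");
        System.out.println(logger.callsMatchExpectations());
    }
}
```

Each check in main prints true. RecordingChannel reuses the stub and adds only the one behavior its test needs, which keeps it about as light as it can be.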
I’ve numbered these features in order of lightness. The lighter features are easier to create; the heavier features take more work. null is the lightest collaborator of all, and the real collaborator is the heaviest.
My preference when writing tests is to use the lightest fake collaborator that gives me the visibility and control that I need for the purposes of my test. This keeps my tests as light and flexible as they can be.
Often I start by passing the lightest collaborator of all, null, to the test subject, and then wait for the test to tell me when I need to add more behavior to the collaborator. If the test subject needs something other than null, I’ll find out when I try to run the test and get a null reference exception. Then I’ll move to a Null Object. If the default values returned from the Null Object don’t satisfy the test subject, the test usually signals that with an exception or failure of some kind, and I’ll move to a heavier collaborator.
I call this approach The Unbearable Lightness of Faking: start with the lightest possible collaborator, and use it until the lightness becomes unbearable and I absolutely must switch to something heavier.
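Here’s a minimal Java sketch of the first two steps of that progression. AuditLog, Transfer, and NullAuditLog are hypothetical names. In Java, the failure that says “null is too light” is a NullPointerException (the counterpart of .NET’s NullReferenceException); the sketch catches it only so that both steps can run side by side.

```java
// Hypothetical collaborator and test subject, invented for this sketch.
public class LightnessDemo {

    interface AuditLog {
        void record(String event);
        int entryCount();
    }

    // Test subject: a debit updates the balance and notifies the audit log.
    static class Transfer {
        private final AuditLog audit;
        private long balance;
        Transfer(long balance, AuditLog audit) { this.balance = balance; this.audit = audit; }

        long debit(long amount) {
            balance -= amount;
            audit.record("debit " + amount);
            return balance;
        }
    }

    // Null Object: dummy methods with empty bodies or simple default values.
    static class NullAuditLog implements AuditLog {
        public void record(String event) {}
        public int entryCount() { return 0; }
    }

    // Step 1: pass null. The subject calls the collaborator, so this fails.
    static boolean debitWithNullCollaboratorFails() {
        try {
            new Transfer(100, null).debit(30);
            return false;
        } catch (NullPointerException e) {
            return true;   // the test says null is too light
        }
    }

    // Step 2: the next-lightest collaborator, a Null Object, is enough.
    static boolean debitReducesBalanceByAmount() {
        return new Transfer(100, new NullAuditLog()).debit(30) == 70;
    }

    public static void main(String[] args) {
        System.out.println("null was too light: " + debitWithNullCollaboratorFails());
        System.out.println("Null Object test passed: " + debitReducesBalanceByAmount());
    }
}
```

In a real suite, step 1 isn’t a test you keep; it’s the failure that tells you to move to step 2. Nothing here needs a heavier fake, so the sketch stops at the Null Object.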
When I’m talking to programmers about writing tests for their own code, one of the questions that comes up often is: Should we test classes in isolation from each other, or in collaboration with each other?
I like both kinds of tests. Here’s why.
I like tests that isolate classes. When a failure occurs, the tests tell me specifically what class failed, and what method failed. That guides me more directly to the fault—the specific code that is broken—and saves a ton of debugging.
I like tests that exercise collaborations. When a failure occurs, the tests tell me that:
one class or the other is not fulfilling its responsibilities, or
the collaborators disagree about each other’s responsibilities, or
some other class (the “electrician” class that connects the collaborators with each other) has wired the collaborators together improperly.
If the individual classes are well tested, I can focus my collaboration testing specifically on wiring and agreements. And if the individual classes are tested well, collaboration test failures tell me about disagreements and improper wiring.
When I test classes in isolation, failures guide me quickly to faults.
When I test classes in collaboration, failures tell me where the classes disagree about each other’s responsibilities.
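To illustrate, here’s a small Java sketch in which each class passes its isolation test, yet the collaboration test fails because the collaborators disagree about units. Sensor, Report, and Station are hypothetical classes; Station plays the “electrician” role that wires the other two together.

```java
// Hypothetical classes, invented for illustration.
public class CollaborationDemo {

    // Reports temperature in degrees Celsius.
    static class Sensor {
        private final double celsius;
        Sensor(double celsius) { this.celsius = celsius; }
        double reading() { return celsius; }
    }

    // Classifies a temperature, but assumes degrees Fahrenheit.
    static class Report {
        String describe(double fahrenheit) { return fahrenheit > 86 ? "hot" : "mild"; }
    }

    // The "electrician": wires the sensor to the report. It passes Celsius
    // straight through, so the collaborators disagree about units.
    static class Station {
        private final Sensor sensor;
        private final Report report;
        Station(Sensor sensor, Report report) { this.sensor = sensor; this.report = report; }
        String status() { return report.describe(sensor.reading()); }
    }

    // Isolation tests: each class fulfills its own responsibilities.
    static boolean sensorReportsCelsius() { return new Sensor(35).reading() == 35; }
    static boolean reportCalls95FahrenheitHot() { return new Report().describe(95).equals("hot"); }

    // Collaboration test: 35 C is 95 F, so the station should say "hot".
    static boolean stationCalls35CelsiusHot() {
        return new Station(new Sensor(35), new Report()).status().equals("hot");
    }

    public static void main(String[] args) {
        System.out.println("sensor in isolation: " + sensorReportsCelsius());
        System.out.println("report in isolation: " + reportCalls95FahrenheitHot());
        System.out.println("collaboration:       " + stationCalls35CelsiusHot());
    }
}
```

main prints true, true, false. Because the isolation tests pass, the failing collaboration test points at the wiring: converting the reading to Fahrenheit in Station (reading * 9 / 5 + 32) would resolve the disagreement.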
Automated tests are software. At first glance, this seems like a non-blinding non-flash of non-insight. But I’m learning a lot about testing by applying this non-insight mindfully.
One thing I’m learning is how often I forget that automated tests are software. When I’m writing tests, I often neglect to apply all of the principles that help me to write software well. What if I were to apply some of those principles mindfully?
A key principle is that we write software in order to serve some specific set of needs for some specific set of people. When I’m trying to understand what software to write, I apply this principle in the form of a few questions: Whose needs will the software serve? What needs will trigger those people to interact with the software? What roles will the software play in satisfying those needs?
Let’s apply this principle to the tests we write: Whose needs will these tests serve? What needs would trigger those people to interact with the tests? What roles will the tests play in satisfying those needs?
These days, I write software mostly for my own needs. And mostly I write the software alone. So the “whose needs” question is an easy one: When I write tests, I’m writing them mostly for me, for my own needs.
More enlightening for me—as a solo software developer writing tests solely for my own needs—are other questions. What needs trigger me to interact with the tests, either by running them or by reading test code? What roles do the tests play in satisfying those needs? Here’s a partial list of answers:
I want to know whether my software is ready to deliver.
I want test code to help me understand which parts of the system are tested and which are not.
I want to know whether there are defects in the software I’m writing.
I want tests to expose defects.
I want to know how to correct defects.
I want tests to direct me to the defective part of the software.
I want to understand the meaning of the test results.
I want each test’s code to indicate clearly how the test stimulates the software, and in what conditions.
I want test reports to describe the test stimulus, the relevant test conditions, and the software’s response.
When I’m adding a feature, I want to know when I’m done.
I want tests to tell me which of the feature’s responsibilities the software fulfills, and which it does not.
When I’m editing software, I want to know whether my edits are having unintended effects.
I want tests to detect changes in the behavior of the surrounding software.
When I’m preparing to edit software, I want to know what the existing code does, so that I don’t inadvertently break it.
I want test code to describe clearly what the existing software does.
That’s a partial list of needs for a single stakeholder. I’m sure you can think of additional needs that you have when you run tests or read test code, and additional ways that you want tests to help you satisfy those needs. And if we were to consider other people who might interact with our tests, we would discover even more needs. And then there are all of the people who do not interact with the tests and yet are affected by them.
That’s a lot of stakeholders, and a lot of needs. I’m more likely to satisfy all of these people’s needs (including my own) when I’m aware of what the needs are. And I’m more likely to be aware of the needs when I ask questions like the ones I’ve used here. And I’m more likely to ask these questions when I remember that tests are software.