Conversations with Dale

about leading software development

Four Layers in Automated Tests

I’ve known for a while that when I automate tests, layers emerge in the automation. Each chunk of automation code relies on lower-level chunks. In Robot Framework, for example, tests invoke “keywords” that themselves invoke lower-level keywords.

The layering per se wasn’t a surprise, because automated tests are software, and software tends to organize into layers. But lately I’ve noticed a pattern. The layers in my automated tests center around four themes:

  • Test intentions
  • System responsibilities
  • Essential system interface
  • System implementation interface

Test intentions. Test names and suite names are the top layer in my automation. If I’ve named each test and suite well, the names express my test intentions. Reading through the test names, and seeing how they’re organized into suites, will give useful information about what I tested and why.

For example, in my article on “Writing Maintainable Automated Acceptance Tests” (PDF), I was writing tests for a system’s account creation feature, and specifically for the account creation’s responsibility to validate passwords. I ended up with these test names (see Listing 7):

  • Rejects passwords that omit required character types
  • Rejects passwords with bad lengths
  • Accepts minimum and maximum length passwords

In an excellent video followup to my article, Bob Martin organized his tests differently, using FitNesse. He grouped tests into two well-named suites, “Valid passwords” and “Invalid passwords.” Each suite includes a number of relevant example passwords, each described with a comment that expresses what makes the example interesting.

Every test tool that I’ve used offers at least one excellent way to express the intentions of each test. However expressed, those intentions become the top layer of my automated tests.

System responsibilities. A core reason for testing is to learn whether the system meets its responsibilities. As I refine my automation, refactoring it to express my intentions with precision, I end up naming specific system responsibilities directly.

In my article, I’m testing a specific pair of responsibilities: The account creation command must accept valid passwords and reject invalid ones. As I refactored the duplication out of my initial awkward tests, these responsibilities emerged clearly, expressed in the names of two new keywords: Accepts Password and Rejects Password. Listing 7 shows how my top-level tests build on these two keywords.

Essential system interface. By system interface, I mean the set of messages that the system sends and receives, whether initiated by users (e.g. commands sent to the system) or by the system (e.g. notifications sent to users).

By essential I mean independent of the technology used to implement the system. For example, the account creation feature must offer some way for a user to command the system to create an account, and it must include some way for the system to notify the user of the result of the command. This is true regardless of whether the system is implemented as a command line app, a web app, or a GUI app.

As I write and refine automated tests, I end up naming each of these essential messages somewhere in my code. In my article, Listing 2 defines two keywords. “Create Account” clearly identifies one message in the essential system interface. Though the other keyword, “Status Should Be,” is slightly less clear, it still suggests that the system emits a status in response to a command to create an account. (Perhaps there’s a better name that I haven’t thought of yet.) Listing 4 shows how the higher-level system responsibility keywords build upon these essential system interface keywords.

System implementation interface. The bottom layer (from the point of view of automating tests) is the system implementation interface. This is the interface through which our tests interact most directly with the system. Sometimes this interaction is direct, e.g. when Java code in our low-level test fixtures invokes Java methods in the system under test. Other times the interaction is indirect, through an intermediary tool, e.g. when we use Selenium to interact with a web app or FEST-Swing to interact with a Java Swing app.

In my article, I tested two different implementations of the account creation feature. The first was a command line application, which the tests invoked through the “Run” keyword, an intermediary built into Robot Framework. Listing 2 shows how the Create Account keyword builds on top of the Run keyword (though you’ll have to parse through the syntax junk to find it).

The second implementation was a web app, which the tests invoked through Robot Framework’s Selenium Library, an intermediary which itself interacts through Selenium, yet another intermediary. Listing 8 shows how the revised Create Account keyword builds on various keywords in the Selenium Library.

Translating Between Layers

Each chunk of test automation code translates an idea from one layer to the next lower layer. Listing 7 shows test ideas invoking system responsibilities. Listing 4 shows responsibilities invoking messages in the essential system interface. Listings 2 and 8 show how the essential system interface invokes two different system implementation interfaces.
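Robot Framework is only one way to express these layers. As a rough analog, here is a minimal Python sketch of the four layers for the password example, with one function per keyword. Everything below the test names is an assumption for illustration. `FakeAccountSystem` is a hypothetical in-process stand-in for the real implementation interface. The status strings `SUCCESS` and `INVALID PASSWORD` are invented. The rule that a password needs at least one letter and one digit is a guess at the article's character-type requirements. Only the 6-to-16-character length range comes from these posts (see Tradeoffs).

```python
"""Four layers of test automation, sketched in plain Python."""


# --- Layer 4: system implementation interface ---------------------------
# Hypothetical stand-in for the system under test. A real suite would
# drive a command-line app, a web app (e.g. through Selenium), and so on.
class FakeAccountSystem:
    def __init__(self) -> None:
        self.last_status = ""

    def submit_create_account(self, user_id: str, password: str) -> None:
        # Assumed rules: 6-16 characters, at least one letter and one digit.
        has_letter = any(c.isalpha() for c in password)
        has_digit = any(c.isdigit() for c in password)
        if 6 <= len(password) <= 16 and has_letter and has_digit:
            self.last_status = "SUCCESS"
        else:
            self.last_status = "INVALID PASSWORD"


SYSTEM = FakeAccountSystem()


# --- Layer 3: essential system interface --------------------------------
# Named for the messages, not for the technology that carries them.
def create_account(user_id: str, password: str) -> None:
    SYSTEM.submit_create_account(user_id, password)


def status_should_be(expected: str) -> None:
    actual = SYSTEM.last_status
    assert actual == expected, f"expected status {expected!r}, got {actual!r}"


# --- Layer 2: system responsibilities ------------------------------------
def accepts_password(password: str) -> None:
    create_account("fred", password)
    status_should_be("SUCCESS")


def rejects_password(password: str) -> None:
    create_account("fred", password)
    status_should_be("INVALID PASSWORD")


# --- Layer 1: test intentions ---------------------------------------------
def test_rejects_passwords_that_omit_required_character_types() -> None:
    rejects_password("abcdefgh")  # no digit
    rejects_password("12345678")  # no letter


def test_rejects_passwords_with_bad_lengths() -> None:
    rejects_password("abc12")              # 5 characters
    rejects_password("abcdefghijklmn123")  # 17 characters


def test_accepts_minimum_and_maximum_length_passwords() -> None:
    accepts_password("abcd12")            # 6 characters
    accepts_password("abcdefghijklmn12")  # 16 characters
```

Each layer calls only the layer directly beneath it. Retargeting these tests at a web app would mean rewriting `create_account` and `status_should_be` (and replacing the fake). The responsibility and intention layers would stay as they are.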

Each of the acceptance test tools I use allows you to build layers like this. In FitNesse, top-level tests expressed in test tables may invoke “scenarios,” which are themselves written in FitNesse tables. And scenarios may invoke lower-level scenarios. In Cucumber, top-level “scenarios” invoke “test steps,” which may themselves invoke lower-level test steps. In Twist, “test scenarios” invoke lower-level “concepts” and “contexts.” Each tool offers ways to build higher layers on top of lower layers, which build upon yet lower layers, until we reach the layer that interacts directly with the system we’re testing.

In the examples in my article, I chose to write all of my code in Robot Framework’s keyword-based test language. I defined each keyword entirely in terms of lower-level keywords. I could have chosen otherwise. At any layer, I could have translated from the keyword-based language to a more general purpose programming language such as Java, Ruby, or Python. The other tools I use offer a similar choice.
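At the lowest layer, switching to a general purpose language might look like the following sketch. It is a Python class that Robot Framework could load as a keyword library. Robot Framework exposes a library's public methods as keywords, so a method named `create_account` can be called from a test as `Create Account`. Several details are hypothetical: the class name `AccountKeywords`, the `create-account` command, and the convention that the app prints its status on the first line of standard output.

```python
"""A hypothetical Robot Framework keyword library written in Python."""
import subprocess


class AccountKeywords:
    """Translates essential-interface keywords into calls on a command-line app.

    Robot Framework exposes public methods as keywords, so `create_account`
    can be used in a test as `Create Account`.
    """

    def __init__(self, *command: str) -> None:
        # Command that launches the (hypothetical) account-creation app.
        # The user ID and password are appended as command-line arguments.
        self._command = list(command) or ["create-account"]
        self._status = ""

    def create_account(self, user_id: str, password: str) -> None:
        """Essential message: ask the system to create an account."""
        result = subprocess.run(
            self._command + [user_id, password],
            capture_output=True,
            text=True,
            check=False,
        )
        # Assumption: the app prints its status on the first line of stdout.
        lines = result.stdout.splitlines()
        self._status = lines[0].strip() if lines else ""

    def status_should_be(self, expected: str) -> None:
        """Essential message: check the status the system reported."""
        if self._status != expected:
            raise AssertionError(
                f"Expected status '{expected}' but was '{self._status}'"
            )
```

The layers above this class can stay in the tool's keyword language. Only this one class knows that the system happens to be a command-line program.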

But I, like many users, find these tools’ test languages easier for non-technical people to understand, and sufficiently flexible to allow users to write tests in a variety of ways. In general, I want as many of these layers as possible to be meaningful not just to technical people, but to anyone who has knowledge of the application domain. So I like to stay with the tool’s test language for all of these layers, switching to a general purpose programming language only at the lowest layer, and then only when the system’s implementation interface forces me to.

A Lens, Not a Straitjacket

When I write automated tests for more complex applications, there are often more layers than these. Yet these four jump out at me, perhaps because each represents a layer of meaning that I always care about. Every automated test suite involves test ideas, system responsibilities, the essential system interface, and the system’s implementation interface. Though other layers arise, I haven’t yet identified additional layers that are so universally meaningful to me.

These layers were a discovery for me. They offer an interesting way to look at my test code to see whether I’ve expressed important ideas directly and clearly. I don’t see them as a standard to apply, or a procrustean template to wedge my tests into. They are a useful lens.

Tradeoffs

In a thoughtful comment on my blog post about writing maintainable automated acceptance tests, Chris Falter suggested a different way to name the variables in my test cases. He mentions that our two naming styles present a tradeoff, and that set me on a long trail of thought.

I’m fascinated by tradeoffs, and I often drive myself nuts making them – I go back and forth and back and forth and… And at some point, I’ll identify the qualities that I’m trying to trade off (expressiveness, test speed, number of places I’d have to change if the code or the requirements change) and move on. Until the next time I visit that file. Then I’ll go back and forth and back and forth… I am very good at revisiting decisions, and not so good at sticking with a decision I made in the past – even minutes in the past.

Chris’s suggestion points out that there are two pieces of information we’re trying to encode into the name: The idea that passwords have a minimum and maximum valid length, and the specific minimum and maximum (6 and 16 characters). I went with one of those pieces of information, Chris went with the other.

As Chris points out, each style leaves readers to infer something important. With Chris’s style, readers must infer what’s special about any given length. With my style, readers must infer what specific lengths form the boundaries. Neither style expresses both pieces of information explicitly – e.g. that the maximum legal length is 16 characters. There’s a tradeoff here: Which piece of information to express? And by making that tradeoff differently, each style not only expresses one piece of information, but also emphasizes it. My style emphasizes the idea of minimum and maximum lengths; Chris’s style emphasizes the specific lengths themselves.

Also, Chris points out (and I agree) that my style requires readers to count string lengths, a tedious, error-prone chore.

Given all of that: Which style do you prefer? More importantly: What do you prefer about it?

Sometimes I can easily see how to trade off possibilities. Other times I can’t see a clear winner. For those times, I recommend experimenting. Try each possibility. Then pay attention to what happens.

In this case, there’s another criterion I can apply: With Chris’s tests, the length of each password appears three times: Once implicitly in the password itself, once in the declaration of the variable, and once in the test that references each variable. Expressing that specific datum three times is potentially troublesome: If we increase the maximum length of a password to 20, we’d have to change six places (three places for a max length password, three places for a password that’s too long). With my style, we’d have to change only two places: the passwords themselves. The variable names would remain the same.
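To make the count concrete, here is the same idea in Python rather than Robot Framework. The literals and helper names are invented for illustration. So are the style-2 variable names: they mirror Chris's style but are not copied from his comment or from the article. The hypothetical `is_valid_password` enforces only the 6-to-16-character length rule.

```python
"""Two naming styles for boundary-value passwords (illustrative sketch)."""


def is_valid_password(password: str) -> bool:
    # Hypothetical system rule: only the length matters in this sketch.
    return 6 <= len(password) <= 16


def accepts_password(password: str) -> None:
    assert is_valid_password(password), f"{password!r} was rejected"


def rejects_password(password: str) -> None:
    assert not is_valid_password(password), f"{password!r} was accepted"


# Style 1: names state the idea (minimum, maximum, too long).
# Each boundary length appears only once, implicitly, in the literal.
minimum_length_password = "abcd12"
maximum_length_password = "abcdefghijklmn12"
too_long_password = "abcdefghijklmn123"


def test_boundaries_named_by_idea() -> None:
    accepts_password(minimum_length_password)
    accepts_password(maximum_length_password)
    rejects_password(too_long_password)


# Style 2: names state the specific length. Each length now appears three
# times: in the literal, in the variable's name, and in the test that uses it.
password_with_6_characters = "abcd12"
password_with_16_characters = "abcdefghijklmn12"
password_with_17_characters = "abcdefghijklmn123"


def test_boundaries_named_by_length() -> None:
    accepts_password(password_with_6_characters)
    accepts_password(password_with_16_characters)
    rejects_password(password_with_17_characters)
```

Suppose the maximum rises to 20. Leaving aside the rule itself, style 1 needs edits to only the two long literals. Style 2 needs those edits too, plus renaming `password_with_16_characters` and `password_with_17_characters` and updating the test lines that refer to them.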

Though I’m not entirely sure which style emphasizes the more important bit of information, the criterion of “how many places I’d have to change” leaves me preferring the style I used in the article. Chris might still reasonably prefer his style, if the extra expressiveness he perceives is valuable enough to outweigh the extra cost of change.

So far, I still prefer my original variable names to Chris’s. And yet his suggestion, his thoughtful explanation of why he prefers it, and especially the contrast it provides with my original tests, make me wonder: Now that we know what we’re trading off, can we find a way to eliminate the tradeoff altogether? Is there another style that allows us to express all of the information we want to express, without increasing the cost of change?

Uncle Bob, in his video, offers a third style: Instead of conveying information through variable names, express it through comments. His comments express something similar to my variable names: maximumness and minimumness. It would be easy enough to add the information that Chris’s variable names express and that mine lack: “16 characters is just short enough” and “6 characters is just long enough”. I’ve unfortunately trained myself to feel queasy whenever I start to type a comment into code. I’m going to have to get over that. Writing a comment is not necessarily evil; it’s just a tradeoff.

Contrasting my tests to Uncle Bob’s, I notice yet another tradeoff: How to organize tests into suites? I organized my tests around specific validity criteria: One set of tests for character content requirements, another for length requirements. Uncle Bob organized the same tests differently: One set of tests for valid passwords, another for invalid passwords. And each way of organizing requires us to name our groupings, which offers an opportunity to subtly highlight one piece of information or the other. My organization emphasizes that there are two classes of validity criteria, content and length. Uncle Bob’s emphasizes that passwords may be valid or invalid.
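Here is a rough Python `unittest` analog of the two organizations. The example passwords and the rules they check (6 to 16 characters, at least one letter and one digit) are assumptions for illustration. The class and method names in the second grouping are invented and are not taken from Uncle Bob's FitNesse suites. The same six example passwords appear in both groupings. Only the grouping names change, and with them the emphasis.

```python
"""Two ways to group the same password examples (illustrative sketch)."""
import unittest


def is_valid_password(password: str) -> bool:
    # Hypothetical rules: 6 to 16 characters, at least one letter and one digit.
    return (
        6 <= len(password) <= 16
        and any(c.isalpha() for c in password)
        and any(c.isdigit() for c in password)
    )


# Grouping 1: by validity criterion (content requirements, length requirements).
class PasswordContentRequirements(unittest.TestCase):
    def test_rejects_passwords_that_omit_required_character_types(self):
        self.assertFalse(is_valid_password("abcdefgh"))  # no digit
        self.assertFalse(is_valid_password("12345678"))  # no letter


class PasswordLengthRequirements(unittest.TestCase):
    def test_rejects_passwords_with_bad_lengths(self):
        self.assertFalse(is_valid_password("abc12"))              # 5 characters
        self.assertFalse(is_valid_password("abcdefghijklmn123"))  # 17 characters

    def test_accepts_minimum_and_maximum_length_passwords(self):
        self.assertTrue(is_valid_password("abcd12"))            # 6 characters
        self.assertTrue(is_valid_password("abcdefghijklmn12"))  # 16 characters


# Grouping 2: by outcome (valid, invalid), with a comment on each example
# saying what makes it interesting.
class ValidPasswords(unittest.TestCase):
    def test_valid_passwords(self):
        for password in (
            "abcd12",            # 6 characters is just long enough
            "abcdefghijklmn12",  # 16 characters is just short enough
        ):
            with self.subTest(password=password):
                self.assertTrue(is_valid_password(password))


class InvalidPasswords(unittest.TestCase):
    def test_invalid_passwords(self):
        for password in (
            "abcdefgh",           # no digit
            "12345678",           # no letter
            "abc12",              # 5 characters is too short
            "abcdefghijklmn123",  # 17 characters is too long
        ):
            with self.subTest(password=password):
                self.assertFalse(is_valid_password(password))
```

Running the file with `python -m unittest` exercises both groupings. A failure report names the grouping first, so each organization also shapes what a reader sees when a test fails.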

Which emphasis do you prefer? More importantly: What do you prefer about it?

A few final points about tradeoffs. If you want to get better at making tradeoffs such as these, step one is to notice what tradeoffs you’re making. And a great way to do that is to pair with someone. I wrote the tests on my own, and in the article I mentioned the tradeoffs I was aware of making. But I made other tradeoffs implicitly, without noticing I was making them. It was only when Chris and Bob offered alternatives that I noticed I was making those tradeoffs at all.

Thanks Chris and Bob for inviting me to explore the tradeoffs I make, and how I make them!

Writing Maintainable Automated Acceptance Tests

I’ve posted “Writing Maintainable Automated Acceptance Tests” on my articles page.

The article demonstrates how to make automated acceptance tests more maintainable by:

  • Hiding incidental details
  • Eliminating duplication
  • Naming essential ideas

Though the examples in the article use a very nice testing framework called Robot Framework, the ideas work just as well with other popular open-source testing frameworks, such as FitNesse and Cucumber.

You will be able to follow the article even if you don’t know Robot Framework. But don’t be surprised if it inspires you to give Robot Framework a try.

Too Much to Ask

Is it too much to ask that people show up to meetings on time? Is it too much to ask that software developers care about craft? Is it too much to ask for an honest politician? Is it too much to ask that drivers drive as if they were sharing the road? Is it too much to ask that immigrants learn how to speak English?

Here’s how to tell whether you’re asking too much: How do you feel when you don’t get it?

Enthusiasm as a Human Resource

Every so often someone rants about the term “human resources.” A person is not a resource, they say. A person is a person. True enough.

I suppose some managers see people as resources, replaceable, fungible, interchangeable cogs in the corporate machine. My sense is that few managers who use the term “human resources” use it in that way. And still the term rankles.

I don’t often use the term myself. And when I hear it, I interpret it to mean resources that originate in people, in much the same way that natural resources refers not to nature per se as a resource, but to resources that originate in nature. The human resource is not the person, but something that the person offers to the organization.

Several years ago, Jerry Weinberg offered a nice idea for what that resource is. I don’t remember Jerry’s exact words, so I’ll paraphrase from my perhaps faulty memory: The resource is the person’s agreement with the organization. The resource is the person’s agreement to contribute to organizational ends.

I liked that idea, and I’ve kept it in mind ever since, whenever the “human resource” complaints crop up.

Today I found another idea. On Twitter, Brian Marick quoted a New York Times article by Jon Mooallem: “[Sandpoint, Idaho council member John] Reuter seemed to argue that enthusiasm is an actual asset, a resource our society is already suffering a scarcity of.”

And my synapses made a connection: Enthusiasm is the human resource. Especially in knowledge work, the primary human resource is people’s enthusiasm for the organization’s purposes and for the work that serves those purposes.

What I like about this notion is that it is more dynamic than “human resource” or even “the person’s agreement.” A person’s enthusiasm can wax and wane. We can nurture it, squander it, squash it. We as leaders have a great deal of influence over how much enthusiasm exists in our organizations, and how much is available for the organization. And it’s not only renewable, but potentially non-diminishable: We can use people’s enthusiasm in ways that leave them even more enthusiastic than they were before. And enthusiasm is catching.

By the way, though I’m convinced that distinguishing between management and leadership is an utter waste of time, I’m gonna do it anyway: A manager deploys people’s enthusiasm toward organizational purposes. A leader (in an organization) nurtures, cultivates, grows, invites, coaxes, inspires people’s enthusiasm for organizational purposes.

(Yes, I know that enthusiasm is only one thing people offer their organizations, so the human resource doesn’t tell the whole story. Of course skills and knowledge matter, too, and a host of other things. But today I’m enthusiastic about enthusiasm, so I’m taking a little blogistic license.)

The Anatomy of a Responsibility

Because the concept of system responsibility is so foundational to how I develop and test software, I want to expand on my earlier description. Recall that I defined a system responsibility as a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.

A system responsibility includes three parts:

  • A stimulus that triggers the system to respond to an event.
  • A context in which the system is required to respond to the stimulus.
  • A set of results that the system is obligated to realize in response to that stimulus in that context.

Stimulus. A stimulus is a message, sent by someone or something outside the boundary of the system, that informs the system of an event to which it is obligated to respond. The stimulus has a name, which may identify either the event that it represents or the planned response that the system must carry out. The stimulus may include additional information about the event.

Stimuli are delivered to a system through its interfaces. An interface defines a set of messages to which a system responds, and the mechanisms by which those messages are delivered. For GUI systems, the interface includes a suite of windows, forms, buttons, text fields, and other mechanisms that translate user gestures (mouse clicks, key presses) into messages. Web-based systems receive stimuli through HTTP requests and other interfaces. Smaller scale systems, such as objects inside a software application, expose Application Programming Interfaces (APIs) that define the set of methods to which internal objects and subsystems respond.

Result. A result is an effect that the system realizes in response to a specified stimulus in a specified context. A result may be either a message delivered to someone or something outside the boundary of the system or a change in the system’s internal state.

GUI systems deliver messages through forms, windows, screens, audio devices, and other output devices. Web-based systems deliver messages through HTTP responses and requests. An application’s internal objects and subsystems deliver messages through method calls and method return values.

In addition to delivering messages to external entities, systems also respond to events by recording information internally, and by making changes to that internal information. The information may be stored inside the running application, in a database, in files on the computer’s file system, or other storage mechanisms. The information that a system stores in order to guide its responses to future events makes up the system state.

Context. Sometimes a system’s planned response depends not just on information delivered through the stimulus, but also on other information. The context for a given responsibility is all of the information, other than that delivered in the stimulus, that influences the results that the system is obligated to realize in response to an event. The context may include information about the state of the system itself: information that the system previously recorded in its internal memory about prior events. The context may also include information that the system can observe across its boundary: information that the system must request from external entities in order to fulfill the responsibility.
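A small Python sketch can show how the three parts fit together, and how the same kind of stimulus leads to different results in different contexts. The `CreateAccount` message, the `AccountSystem` class, the rule that user IDs must be unique, and the reply strings are all hypothetical. The sketch covers only context drawn from the system's own state. It does not cover information requested from external entities.

```python
"""Stimulus, context, and results for one hypothetical responsibility."""
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class CreateAccount:
    """Stimulus: a named message from outside the system boundary.

    Its fields carry additional information about the event.
    """
    user_id: str
    password: str


@dataclass
class AccountSystem:
    # System state: information recorded about prior events. Here it is
    # just the set of user IDs already in use (password handling is omitted).
    user_ids: Set[str] = field(default_factory=set)

    def handle(self, stimulus: CreateAccount) -> str:
        """Respond to the stimulus; the return value is the outgoing message."""
        if stimulus.user_id in self.user_ids:
            # Context: the ID is taken. Planned result: a rejection message,
            # and no change to the system state.
            return "USER ID TAKEN"
        # Context: the ID is free. Planned results: a change to the system
        # state plus a confirmation message.
        self.user_ids.add(stimulus.user_id)
        return "ACCOUNT CREATED"
```

For example, the first `CreateAccount("fred", "abcd12")` sent to a fresh `AccountSystem()` returns `"ACCOUNT CREATED"` and records `fred`. A second request for `fred` arrives in a different context, so it returns `"USER ID TAKEN"` and leaves the state unchanged.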

Planned Response Systems

I first learned about the idea of planned response systems from III, a colleague and friend of mine. I later read about the idea in depth in McMenamin and Palmer’s profound book Essential Systems Analysis.

The idea of planned response systems is fundamental to how I think about programming and testing. I’m posting my thoughts here so that I can refer to these terms and ideas in later blog posts. Until I write those posts, I encourage you to notice what happens when you think about software systems as planned response systems.

A planned response system is a system that responds in planned ways to events in its environment.

For example, a software system is a planned response system: it responds in planned ways to users’ actions.

In an object-oriented software system, each object is a planned response system: it responds in planned ways to messages sent by other objects.

Planned response systems produce two general kinds of results: They send messages to entities outside of the system boundary, and they make changes to the essential memory of the system.

An event is a significant change in the system’s environment. A change is significant to the system if the system is obligated to respond to the change in a planned way.

Events fall into two broad categories: Changes initiated by entities in the system’s environment (e.g. users or other systems), and temporal events caused by the passage of time.

For example, an ATM is obligated to respond in a planned way to a user’s request to withdraw cash. The user’s request is an event.

A system responsibility is a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.

The specification of a system responsibility consists of three parts: A specification of a kind of event, a specification of a set of circumstances, and a specification of the set of planned results that the system is obligated to produce in response to being notified of an event of that kind under those circumstances.

A system becomes obligated to respond to an event when a system designer allocates that responsibility to the system.

The essence of a planned response system is the set of responsibilities allocated to the system, independent of the choice of technology used to implement the system.

The definition of a system’s essence makes no mention whatever of technology inside the system, because the system’s essential responsibilities would be the same whether it were implemented using software, magical fairies, a horde of trained monkeys, or my brothers Glenn and Gregg wielding pencils and stacks of index cards.

One way to identify the essence of a system is to indulge in The Fantasy of Perfect Technology. Imagine a system implemented using perfect technology. Then ask yourself some questions about the quality attributes of the system.

How fast would it respond? If it were made of perfect technology, of course it would respond instantly, with zero delay. How many users could use it at once? An infinite number of users. How much information could it store? An infinite amount. How often would it break? It would never break. How long would it take to start up? No time at all, because it’s always on and always available. How much energy would it use? It would use no energy; heck, it might even generate energy for free.

The one glaring flaw of perfect technology is that it does not exist. Real-world technology is imperfect. That’s what makes this exercise a fantasy. But it’s a useful fantasy, because it helps us to separate the system’s essential responsibilities from the temporary constraints of current technology.

Note that we apply the Fantasy of Perfect Technology only inside the boundary of the system. Even in our fantasy, the world outside of the system is made of real, imperfect stuff, with which the system will have to interact.

Now apply the fantasy to your own system. What responsibilities would your system have even if you could implement it using perfect technology? That set of responsibilities is your system’s essence.

The essential memory of a system is the set of data that the system must remember in order to fulfill its obligations—that is, in order to respond as planned to future events.

For example, an ATM must remember users’ account balances in order to determine whether to satisfy users’ requests to withdraw money.
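The ideas in this post can be pulled together in a short Python sketch of an ATM as a planned response system. It is an illustration under stated assumptions, not a model of any real ATM. The method names, the message strings, and the whole-unit integer amounts are hypothetical. So is the end-of-month statement, which stands in for a temporal event. In the spirit of the Fantasy of Perfect Technology, essential memory is just a dictionary. Nothing says how balances are actually stored or how cash is dispensed.

```python
"""A hypothetical ATM, sketched as a planned response system."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Atm:
    # Essential memory: what the system must remember in order to respond
    # as planned to future events (amounts in whole currency units).
    balances: Dict[str, int] = field(default_factory=dict)
    # Messages sent to entities outside the system boundary.
    outbox: List[str] = field(default_factory=list)

    def withdrawal_requested(self, account: str, amount: int) -> None:
        """Event initiated by an entity in the environment: a user's request."""
        balance = self.balances.get(account, 0)
        if amount <= balance:
            # Two kinds of result: a change to essential memory, and a message.
            self.balances[account] = balance - amount
            self.outbox.append(f"dispense {amount} to {account}")
        else:
            # Same kind of event, different circumstances: a different result.
            self.outbox.append(f"insufficient funds for {account}")

    def end_of_month(self) -> None:
        """Temporal event (hypothetical): the passage of time triggers statements."""
        for account, balance in sorted(self.balances.items()):
            self.outbox.append(f"statement for {account}: balance {balance}")
```

For example, start from `Atm(balances={"alice": 100})`. A request to withdraw 30, a request to withdraw 500, and an end-of-month event leave `alice` with a balance of 70. They also put three messages in `outbox`: `dispense 30 to alice`, `insufficient funds for alice`, and `statement for alice: balance 70`.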

Beware Accounting Errors

Beware accounting errors.

A highly visible part of TDD is that developers write more tests than they used to. It is tempting to conclude from this that programming takes longer with TDD than without. After all, we are writing tests that we didn’t write before, and this is additional work on top of all the work we used to do. Right?

That’s a perfectly reasonable conclusion. But it is based on a flawed assumption, an accounting error that leads us to overlook some important costs. The assumption is that in addition to the new work of writing tests, developers will continue to do all of the same work they were doing before.

This assumption turns out to be unwarranted in practice. For example, before TDD, developers typically spend a great deal of time debugging. Why are they debugging? Because they have identified a failure, but they don’t know which specific piece of broken code produces the failure. They step through the code in the debugger in order to trace the failure to the fault. Only when they identify the broken code can they fix it.

TDD reduces the amount of debugging in two ways. First, developers write fewer defects. Second, when a developer introduces a defect, the code immediately fails one or more tests. These tests point directly toward the broken code. If the tests tell me what specific code is broken, I don’t need to run the debugger to find the fault.

Also, if developers write fewer defects, they now spend less time fixing defects.

If we add up all of the effects of TDD, including the cost of writing tests and the cost savings we gain by writing tests, we find that the first few features cost slightly more than without TDD, but that the quality is noticeably higher. And we find that later features take less time than without TDD, because TDD keeps code changeable.

An additional wrinkle leads to another accounting error: not everyone sees debugging as a cost. In my pre-TDD days, I usually enjoyed debugging. I loved having a meaty puzzle to solve. It was often the most fun and interesting and engaging part of programming. And in one job, my managers (falling prey to yet another accounting error) rewarded fixing bugs far more visibly and vocally than preventing them.

If you’re a manager, learn to attend not just to the obvious costs and benefits of a given set of programming practices, but also the costs and benefits that require a little effort and insight to detect.

If you’re a developer, I can tell you that I don’t miss debugging. The puzzle of how to choose the next test that will drive my code toward completion is as fun and interesting and challenging as debugging ever was. I hope it’s the same for you.

Testing Is an Information Service

Testing is an information service. The point of testing is to inform stakeholders about the system. This is not a new sentiment, nor does it originate with me. But I’ve found that many testers have not considered their role from this perspective.

I teach classes about how to test software. Early in each class I describe testing as an information service. Even in classes filled with experienced testers, there are always a few people for whom this is a new idea.

In one class, just as I finished saying that testing is an information service, a man in the back of the room said, “Oh, no!”

“You disagree?” I asked.

“No, no, I agree,” he said. “It’s just that I’ve never thought of it that way before.” He paused and frowned. “And I think I’ve been doing it all wrong.”

I thought it was unlikely that he’d been doing it all wrong, so I asked, “How have you been doing it?”

“I just try to break stuff. When I can break it, it’s like I win. And if I can’t break it, I feel like I’m failing.”

“Trying to break stuff,” I said, “is an important part of testing.” I mentioned James A. Whittaker’s excellent book How to Break Software, which teaches testers how to find the kinds of defects that arise from common programming errors.

“I know,” he said, “but that’s all I’ve been doing. And when I find a nice, nasty bug, I run over to the developers and rub it in their faces.”

“Oh, no,” I said.

He laughed and nodded. “Now you understand.”

“How does that work out?” I asked. (I know what you’re thinking, but you’ve got it backwards. Doctor Phil channels me.)

“They hate it. And hate to see me coming. They keep telling me to bring them some good news once in a while.”

“But if your job is only to break stuff…”

“Then I never tell them what’s working. But that’s information, too, and that’s what I just realized. And that’s what they’ve been asking for.”

I’ve had numerous similar conversations with testers who had found themselves mired in unproductive relationships with developers. Shifting your focus from breaking stuff to informing stakeholders (including developers) can help with that.

I’ll say more later about testing as an information service. In the meantime, I’d love to hear your questions and comments about it.

Interviewing Characters: Follow the Energy

On November 13, 2007 I ran out of plot for the NaNoWriMo novel I was writing. I had no idea what to write next. That’s not uncommon for NaNo novelists, but I hadda do something to jiggle myself loose. In NaNoWriMo, word count is everything, and I couldn’t afford to fall behind.

So I tried something I hadn’t tried before: I interviewed my characters.

Well, that turned out to be more interesting than I’d anticipated. And it boosted my word count to boot. And on top of that, it offered some plot ideas.

I didn’t use any pre-planned questionnaire. There are zillions of character questionnaires on the web, and none of them ever seemed to get at the heart of the character.

Instead, I did what I do in many real-life interviews: Follow the energy. The idea is to:

  1. Ask a question that invites the character to tell me something new.
  2. Listen for emotional intensity in the answer. Sometimes the emotion is subtle, and other times it’s big and obvious.
  3. Ask my next question based on that emotion.

Rather than describing this process in detail, I’ll let you read the interviews as I conducted them, unedited. I offer these interviews not necessarily as exemplary, but merely as examples. The thing to notice is how I followed the characters’ energy.

Some background: The novel involves a time loop. Every 29 hours, the characters (and everyone else in the story world) loop back in time. The story follows two main plots.

In the first plot, Dan Roberge murders his wife Faith and her lover Zorem. Then time loops and he murders them again. And again. Police detectives Ray Andollo and Patty Yonce investigate.

The interviews:

In the second plot, Amy Anderson saves her son from drowning in a pond on the family farm. Then time loops and her son drowns. Then time loops again. After the first incident (before the first time loop), Amy’s husband Frank becomes enraged when he discovers that Amy had been drinking while their sons played at the pond.

The interviews:

  • Amy Anderson. This was my favorite interview, because it so significantly affected my understanding of the character.
  • Frank Anderson