March 15, 2010 at 5:16 pm
Coding,Testing — Tags: acceptance testing, automated tests, automation, design, essence, naming, robot framework
I’ve known for a while that when I automate tests, layers emerge in the automation. Each chunk of automation code relies on lower-level chunks. In Robot Framework, for example, tests invoke “keywords” that themselves invoke lower-level keywords.
The layering per se wasn’t a surprise, because automated tests are software, and software tends to organize into layers. But lately I’ve noticed a pattern. The layers in my automated tests center around four themes:
- Test intentions
- System responsibilities
- Essential system interface
- System implementation interface
Test intentions. Test names and suite names are the top layer in my automation. If I’ve named each test and suite well, the names express my test intentions. Reading through the test names, and seeing how they’re organized into suites, will give useful information about what I tested and why.
For example, in my article on “Writing Maintainable Automated Acceptance Tests” (PDF), I was writing tests for a system’s account creation feature, and specifically for the account creation’s responsibility to validate passwords. I ended up with these test names (see Listing 7):
- Rejects passwords that omit required character types
- Rejects passwords with bad lengths
- Accepts minimum and maximum length passwords
In an excellent video followup to my article, Bob Martin organized his tests differently, using FitNesse. He grouped tests into two well-named suites, “Valid passwords” and “Invalid passwords.” Each suite includes a number of relevant example passwords, each described with a comment that expresses what makes the example interesting.
Every test tool that I’ve used offers at least one excellent way to express the intentions of each test. However expressed, those intentions become the top layer of my automated tests.
System responsibilities. A core reason for testing is to learn whether the system meets its responsibilities. As I refine my automation, refactoring it to express my intentions with precision, I end up naming specific system responsibilities directly.
In my article, I’m testing a specific pair of responsibilities: The account creation command must accept valid passwords and reject invalid ones. As I refactored the duplication out of my initial awkward tests, these responsibilities emerged clearly, expressed in the names of two new keywords: Accepts Password and Rejects Password. Listing 7 shows how my top-level tests build on these two keywords.
Essential system interface. By system interface, I mean the set of messages that the system sends and receives, whether initiated by users (e.g. commands sent to the system) or by the system (e.g. notifications sent to users).
By essential I mean independent of the technology used to implement the system. For example, the account creation feature must offer some way for a user to command the system to create an account, and it must include some way for the system to notify the user of the result of the command. This is true regardless of whether the system is implemented as a command line app, a web app, or a GUI app.
As I write and refine automated tests, I end up naming each of these essential messages somewhere in my code. In my article, Listing 2 defines two keywords. “Create Account” clearly identifies one message in the essential system interface. Though the other keyword, “Status Should Be,” is slightly less clear, it still suggests that the system emits a status in response to a command to create an account. (Perhaps there’s a better name that I haven’t thought of yet.) Listing 4 shows how the higher-level system responsibility keywords build upon these essential system interface keywords.
System implementation interface. The bottom layer (from the point of view of automating tests) is the system implementation interface. This is the interface through which our tests interact most directly with the system. Sometimes this interaction is direct, e.g. when Java code in our low-level test fixtures invokes Java methods in the system under test. Other times the interaction is indirect, through an intermediary tool, e.g. when we use Selenium to interact with a web app or FEST-Swing to interact with a Java Swing app.
In my article, I tested two different implementations of the account creation feature. The first was a command line application, which the tests invoked through the “Run” keyword, an intermediary built into Robot Framework. Listing 2 shows how the Create Account keyword builds on top of the Run keyword (though you’ll have to parse through the syntax junk to find it).
The second implementation was a web app, which the tests invoked through Robot Framework’s Selenium Library, an intermediary which itself interacts through Selenium, yet another intermediary. Listing 8 shows how the revised Create Account keyword builds on various keywords in the Selenium Library.
Translating Between Layers
Each chunk of test automation code translates an idea from one layer to the next lower layer. Listing 7 shows test ideas invoking system responsibilities. Listing 4 shows responsibilities invoking messages in the essential system interface. Listings 2 and 8 show how the essential system interface invokes two different system implementation interfaces.
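To make the translation concrete, here is a minimal sketch of the four layers in plain Python rather than Robot Framework keywords. It is not a copy of the article's listings: the password rules (6 to 16 characters, at least one letter and one digit), the status strings, and the InMemoryAccountSystem class are assumptions invented for illustration.

```python
# Layer 4: system implementation interface. The "system" here is a
# hypothetical in-memory stand-in; the article used a command line app
# and later a web app.
class InMemoryAccountSystem:
    MIN_LENGTH, MAX_LENGTH = 6, 16  # assumed rules, for illustration only

    def run(self, command, username, password):
        if command != "create":
            return "Unknown command"
        has_letter = any(c.isalpha() for c in password)
        has_digit = any(c.isdigit() for c in password)
        good_length = self.MIN_LENGTH <= len(password) <= self.MAX_LENGTH
        if has_letter and has_digit and good_length:
            return "SUCCESS"
        return "Invalid Password"


system = InMemoryAccountSystem()


# Layer 3: essential system interface. These names would stay the same
# if the system were reimplemented with a different technology.
def create_account(username, password):
    return system.run("create", username, password)


def status_should_be(actual, expected):
    assert actual == expected, f"expected {expected!r}, got {actual!r}"


# Layer 2: system responsibilities.
def accepts_password(password):
    status_should_be(create_account("fred", password), "SUCCESS")


def rejects_password(password):
    status_should_be(create_account("fred", password), "Invalid Password")


# Layer 1: test intentions, carried by the test names.
def test_rejects_passwords_that_omit_required_character_types():
    rejects_password("abcdefgh")  # no digit
    rejects_password("12345678")  # no letter


def test_rejects_passwords_with_bad_lengths():
    rejects_password("ab123")           # one character too short
    rejects_password("a1" * 8 + "x")    # one character too long


def test_accepts_minimum_and_maximum_length_passwords():
    accepts_password("abc123")   # minimum length
    accepts_password("a1" * 8)   # maximum length
```

Swapping in a different implementation would mean changing only create_account and the class beneath it; the responsibility helpers and the test names would stay as they are.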
Each of the acceptance test tools I use allows you to build layers like this. In FitNesse, top-level tests expressed in test tables may invoke “scenarios,” which are themselves written in FitNesse tables. And scenarios may invoke lower-level scenarios. In Cucumber, top-level “scenarios” invoke “test steps,” which may themselves invoke lower-level test steps. In Twist, “test scenarios” invoke lower-level “concepts” and “contexts.” Each tool offers ways to build higher layers on top of lower layers, which build upon yet lower layers, until we reach the layer that interacts directly with the system we’re testing.
In the examples in my article, I chose to write all of my code in Robot Framework’s keyword-based test language. I defined each keyword entirely in terms of lower-level keywords. I could have chosen otherwise. At any layer, I could have translated from the keyword-based language to a more general purpose programming language such as Java, Ruby, or Python. The other tools I use offer a similar choice.
But I, like many users, find these tools’ test languages easier for non-technical people to understand, and sufficiently flexible to allow users to write tests in a variety of ways. In general, I want as many of these layers as possible to be meaningful not just to technical people, but to anyone who has knowledge of the application domain. So I like to stay with the tool’s test language for all of these layers, switching to a general purpose programming language only at the lowest layer, and then only when the system’s implementation interface forces me to.
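For that lowest layer, a Robot Framework library can be an ordinary Python class: each public method becomes a keyword, so create_account is available to tests as Create Account. The sketch below is only an illustration of that switch. The command line shape (create, then username, then password), the status strings, and the FAKE_APP stand-in are assumptions; the article's real application is not reproduced here.

```python
import subprocess
import sys

# Hypothetical stand-in for the command line app: it prints a status
# based only on password length. A real suite would launch the real app.
FAKE_APP = [
    sys.executable, "-c",
    "import sys; print('SUCCESS' if len(sys.argv[-1]) >= 6 else 'Invalid Password')",
]


class AccountKeywords:
    """Keyword library sketch providing Create Account and Status Should Be."""

    def __init__(self, command=FAKE_APP):
        self._command = list(command)
        self._status = None

    def create_account(self, username, password):
        # Implementation interface: run the app and capture what it prints.
        completed = subprocess.run(
            self._command + ["create", username, password],
            capture_output=True, text=True, check=False,
        )
        self._status = completed.stdout.strip()

    def status_should_be(self, expected):
        # A keyword reports failure by raising an exception.
        if self._status != expected:
            raise AssertionError(
                f"Expected status {expected!r} but got {self._status!r}"
            )
```

Everything above this class, including the responsibility keywords and the test names, could stay in the tool's own test language.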
A Lens, Not a Straitjacket
When I write automated tests for more complex applications, there are often more layers than these. Yet these four jump out at me, perhaps because each represents a layer of meaning that I always care about. Every automated test suite involves test ideas, system responsibilities, the essential system interface, and the system’s implementation interface. Though other layers arise, I haven’t yet identified additional layers that are so universally meaningful to me.
These layers were a discovery for me. They offer an interesting way to look at my test code to see whether I’ve expressed important ideas directly and clearly. I don’t see them as a standard to apply, or a procrustean template to wedge my tests into. They are a useful lens.
March 9, 2009 at 11:06 pm
Coding,Testing — Tags: design
Because the concept of system responsibility is so foundational to how I develop and test software, I want to expand on my earlier description. Recall that I defined a system responsibility as a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.
A system responsibility includes three parts:
- A stimulus that triggers the system to respond to an event.
- A context in which the system is required to respond to the stimulus.
- A set of results that the system is obligated to realize in response to that stimulus in that context.

Stimulus. A stimulus is a message, sent by someone or something outside the boundary of the system, that informs the system of an event to which it is obligated to respond. The stimulus has a name, which may identify either the event that it represents or the planned response that the system must carry out. The stimulus may include additional information about the event.
Stimuli are delivered to a system through its interfaces. An interface defines a set of messages to which a system responds, and the mechanisms by which those messages are delivered. For GUI systems, the interface includes a suite of windows, forms, buttons, text fields, and other mechanisms that translate user gestures (mouse clicks, key presses) into messages. Web-based systems receive stimuli through HTTP requests and other interfaces. Smaller scale systems, such as objects inside a software application, expose Application Programming Interfaces (APIs) that define the set of methods to which internal objects and subsystems respond.
Result. A result is an effect that the system realizes in response to a specified stimulus in a specified context. A result may be either a message delivered to someone or something outside the boundary of the system or a change in the system’s internal state.
GUI systems deliver messages through forms, windows, screens, audio devices, and other output devices. Web-based systems deliver messages through HTTP responses and requests. An application’s internal objects and subsystems deliver messages through method calls and method return values.
In addition to delivering messages to external entities, systems also respond to events by recording information internally, and by making changes to that internal information. The information may be stored inside the running application, in a database, in files on the computer’s file system, or other storage mechanisms. The information that a system stores in order to guide its responses to future events makes up the system state.
Context. Sometimes a system’s planned response depends not just on information delivered through the stimulus, but other information as well. The context for a given responsibility is all of the information other than that delivered in the stimulus that influences the results that the system is obligated to realize in response to an event. The context may include information about the state of the system itself–that is, information that the system previously recorded in its internal memory about prior events. The context may also include information that the system can observe across its boundary–information that the system must request from external entities in order to fulfill the responsibility.
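One way to picture the three parts is as a function from a stimulus and a context to a set of results. The sketch below is a loose illustration, not a definition: the Create Account stimulus, the existing_usernames context, the message texts, and the Result class are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    # Messages delivered to someone or something outside the system boundary.
    messages: list = field(default_factory=list)
    # Changes to the system's internal state.
    state_changes: dict = field(default_factory=dict)


def respond_to_create_account(stimulus, context):
    """Map one stimulus, in one context, to the results the system owes."""
    username = stimulus["username"]  # information carried by the stimulus
    if username in context["existing_usernames"]:  # information from context
        return Result(messages=["Username already taken"])
    return Result(
        messages=["Account created"],
        state_changes={"add_username": username},
    )
```

The same stimulus produces different results in different contexts: with no existing account named fred the system reports success and records the name, while with fred already recorded it only sends a rejection message.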
February 19, 2009 at 1:12 am
Coding,Testing — Tags: design
I first learned about the idea of planned response systems from III, a colleague and friend of mine. I later read about the idea in depth in McMenamin and Palmer’s profound book Essential Systems Analysis.
The idea of planned response systems is fundamental to how I think about programming and testing. I’m posting my thoughts here so that I can refer to these terms and ideas in later blog posts. Until I write those posts, I encourage you to notice what happens when you think about software systems as planned response systems.
A planned response system is a system that responds in planned ways to events in its environment.
For example, a software system is a planned response system—it responds in planned ways to users’ actions.
In an object-oriented software system, each object is a planned response system: it responds in planned ways to messages sent by other objects.
Planned response systems produce two general kinds of results: They send messages to entities outside of the system boundary, and they make changes to the essential memory of the system.
An event is a significant change in the system’s environment. A change is significant to the system if the system is obligated to respond to the change in a planned way.
Events fall into two broad categories: Changes initiated by entities in the system’s environment (e.g. users or other systems), and temporal events caused by the passage of time.
For example, an ATM is obligated to respond in a planned way to a user’s request to withdraw cash. The user’s request is an event.
A system responsibility is a system’s obligation to respond to each notification of a specified kind of event under specified circumstances by producing a specified set of planned results.
The specification of a system responsibility consists of three parts: A specification of a kind of event, a specification of a set of circumstances, and a specification of the set of planned results that the system is obligated to produce in response to being notified of an event of that kind under those circumstances.
A system becomes obligated to respond to an event when a system designer allocates that responsibility to the system.
The essence of a planned response system is the set of responsibilities allocated to the system, independent of the choice of technology used to implement the system.
The definition of a system’s essence makes no mention whatever of technology inside the system, because the system’s essential responsibilities would be the same whether it were implemented using software, magical fairies, a horde of trained monkeys, or my brothers Glenn and Gregg wielding pencils and stacks of index cards.
One way to identify the essence of a system is to indulge in The Fantasy of Perfect Technology. Imagine a system implemented using perfect technology. Then ask yourself some questions about the quality attributes of the system.
How fast would it respond? If it were made of perfect technology, of course it would respond instantly, with zero delay. How many users could use it at once? An infinite number of users. How much information could it store? An infinite amount. How often would it break? It would never break. How long would it take to start up? No time at all, because it would always be on and always available. How much energy would it use? It would use no energy; heck, it might even generate energy for free.
The one glaring flaw of perfect technology is that it does not exist. Real-world technology is imperfect. That’s what makes this exercise a fantasy. But it’s a useful fantasy, because it helps us to separate the system’s essential responsibilities from the temporary constraints of current technology.
Note that we apply the Fantasy of Perfect Technology only inside the boundary of the system. Even in our fantasy, the world outside of the system is made of real, imperfect stuff, with which the system will have to interact.
Now apply the fantasy to your own system. What responsibilities would your system have even if you could implement it using perfect technology? That set of responsibilities is your system’s essence.
The essential memory of a system is the set of data that the system must remember in order to fulfill its obligations—that is, in order to respond as planned to future events.
For example, an ATM must remember users’ account balances in order to determine whether to satisfy users’ requests to withdraw money.
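The ATM example can be sketched as a tiny planned response system. This is only an illustration under made-up assumptions: the account names, the message texts, and the Atm class itself are hypothetical, and a real ATM has many more responsibilities.

```python
class Atm:
    def __init__(self, balances):
        # Essential memory: what the system must remember in order to
        # respond as planned to future events.
        self._balances = dict(balances)

    def withdraw(self, account, amount):
        """Planned response to the event 'user requests to withdraw cash'."""
        if amount <= self._balances.get(account, 0):
            self._balances[account] -= amount  # change to essential memory
            return f"Dispense {amount}"        # message sent across the boundary
        return "Insufficient funds"            # message only, no change

    def balance(self, account):
        return self._balances[account]
```

Both kinds of result appear here: a message sent to the user, and a change to essential memory that shapes the response to the next withdrawal request.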
April 29, 2004 at 12:40 am
Testing — Tags: design
A few days ago I was poking around the web for ideas about how to test software, and I saw Scott Ambler’s article about “Full Life Cycle Object-Oriented Testing (FLOOT).” The article includes a list of common testing techniques. As I looked over the list, I noticed that there is a small set of key dimensions that distinguish one testing technique from another. For example, unit testing and system testing differ in the kind of component they test. Stress testing and usability testing differ in the quality attribute that they test for. Unit testing and acceptance testing differ in the nature of the decisions that are made based on the test results.
I love looking for patterns like that, so I spent an hour analyzing Scott’s list to identify the dimensions. Here are thirteen dimensions I found, and a few examples that show how different testing techniques vary along each.
Unit Under Test. What type of component is being tested?
- In Class Testing or Unit Testing, the unit under test is a class.
- In Method Testing, the unit under test is a method of a class.
- In System Testing, the unit under test is the system.
Test Case Scope. What is the scope of the interaction tested by each test case?
- In Use-Case Scenario Testing, the scope of the interaction tested by each test case is a user goal.
- In Unit Testing, the scope of each test case is a method invocation.
- In Integration Testing, the scope is a transaction.
Unit Coverage. What subset of the unit under test is exercised by the test suite?
- In Coverage Testing, the subset being exercised by the test suite is code statements.
- In Path Testing, the coverage is logic paths.
- In Regression Testing, the coverage is code changes.
- In Boundary-value Testing, the coverage is limits.
Behavioral Scope. What subset of the unit-under-test’s behavior is being tested?
- Installation Testing tests the system’s installation procedure.
- Functional Testing tests the system’s business functionality.
- Integration Testing tests interactions among subsystems.
Unit Relationships. What are the relationships among the units whose interactions are being tested?
- In Inheritance-regression Testing, the relationship between units is inheritance.
- In Integration Testing, the relationship is collaboration or peers.
Quality Attribute. What type of quality attribute is being tested?
- In Stress Testing or Volume Testing, the quality attribute being tested is throughput or latency or capacity.
- In Usability Testing, the quality attribute being tested is usability.
Stakeholder. Whose interests are the focus of the testing?
- Acceptance Testing focuses on the interests of users.
- Operations Testing focuses on the interests of operators.
- Support Testing focuses on the interests of support staff.
Liveness. How closely does the test environment mimic the operational environment? Or perhaps this dimension is better characterized as Safety: To what extent are the testers using the system to do the real work for which the system was intended?
- In a Pilot, the test environment is the actual operational environment, perhaps limited in scope (e.g. a small subset of users, or for a limited time).
- In Beta Testing, the environment is a fully operational environment, but perhaps used only for non-critical functions.
- In Acceptance Testing, the environment is a non-operational environment similar to the operational environment.
- Unit Testing is done in the development environment.
Visibility into Unit Under Test. To what extent does the tester exploit knowledge about the internals of the unit under test?
- In Black-box Testing, the tester exploits no knowledge of the internals of the unit under test.
- In White-box Testing, the tester exploits full knowledge of internals.
- In Grey-box Testing, the tester exploits some knowledge of internals.
Tester. What is the relationship of the tester to the software under test?
- For Acceptance Testing or User Testing, the tester is a user of the software.
- For Unit Testing or Developer Testing, the tester is a developer of the software.
Processor. What type of “processor” will “execute” the “software” during the tests?
- In most kinds of testing, a computer executes the software.
- In Code Inspections and Design Reviews, developers “execute” the software.
- In Prototype Walkthroughs, users “execute” the “software.”
Pre-Test Confidence. How confident are we about the software before we begin the testing?
- Before Alpha Testing, our confidence in the software is lower (compared with Beta Testing).
- Before Beta Testing, our confidence in the software is higher (compared with Alpha Testing).
Decision Scope. What kinds of decisions will we make based on the outcome of the test?
- For Acceptance Testing, the key decision is whether to release the product.
- For Integration Testing, the decision may be whether to begin system testing.
- For Unit Testing, the decision is whether the current coding task is complete.
This list is based on only an hour’s work, and on my analysis of only a single list of testing techniques (Scott’s), so I don’t claim that it is anywhere near complete or correct. It might be useful, though, for people who want to expand their repertoire of testing techniques, or to locate a technique that fits a given purpose or context.
I wonder what would happen if we created a thirteen-dimensional matrix. What parts of the matrix would be crowded with testing techniques? What parts would be empty?
Thirteen dimensions is more than I can handle. So what would happen if we took two or three dimensions at a time and explored all of the values along those dimensions? Would that be interesting? Would it be useful? Would it help us to identify testing techniques that fit our specific situations? Might we notice holes in the matrix for which we want to invent useful techniques?
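Taking two dimensions at a time is easy to try in code. The sketch below is a rough experiment, not a finished catalog: it records only a handful of the example values from the list above, the short labels such as "non-operational" are my own abbreviations, and a technique is simply left out of a dimension when the list gives no example for it.

```python
from collections import defaultdict
from itertools import product

# A few techniques, described along three of the thirteen dimensions.
TECHNIQUES = {
    "Unit Testing": {
        "tester": "developer",
        "environment": "development",
        "decision": "coding task complete",
    },
    "Acceptance Testing": {
        "tester": "user",
        "environment": "non-operational",
        "decision": "release the product",
    },
    "Integration Testing": {"decision": "begin system testing"},
    "Beta Testing": {"environment": "operational"},
    "Pilot": {"environment": "operational"},
}


def cross_tabulate(techniques, dim_a, dim_b):
    """Group technique names by their values along two dimensions."""
    cells = defaultdict(list)
    for name, values in techniques.items():
        a, b = values.get(dim_a), values.get(dim_b)
        if a is not None and b is not None:
            cells[(a, b)].append(name)
    return dict(cells)


def empty_cells(techniques, dim_a, dim_b):
    """List value combinations that no recorded technique occupies."""
    filled = cross_tabulate(techniques, dim_a, dim_b)
    a_values = sorted({v[dim_a] for v in techniques.values() if dim_a in v})
    b_values = sorted({v[dim_b] for v in techniques.values() if dim_b in v})
    return [cell for cell in product(a_values, b_values) if cell not in filled]
```

Calling empty_cells(TECHNIQUES, "tester", "environment") lists combinations such as a developer testing in the operational environment. With so small a sample, an empty cell may mean only that the data is missing, not that no technique exists.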