Four Layers in Automated Tests

March 15, 2010 at 5:16 pm · Coding, Testing

I’ve known for a while that when I automate tests, layers emerge in the automation. Each chunk of automation code relies on lower-level chunks. In Robot Framework, for example, tests invoke “keywords” that themselves invoke lower-level keywords.

The layering per se wasn’t a surprise, because automated tests are software, and software tends to organize into layers. But lately I’ve noticed a pattern. The layers in my automated tests center around four themes:

  • Test intentions
  • System responsibilities
  • Essential system interface
  • System implementation interface

Test intentions. Test names and suite names are the top layer in my automation. If I’ve named each test and suite well, the names express my test intentions. Reading through the test names, and seeing how they’re organized into suites, will give useful information about what I tested and why.

For example, in my article on “Writing Maintainable Automated Acceptance Tests” (PDF), I was writing tests for a system’s account creation feature, and specifically for the account creation’s responsibility to validate passwords. I ended up with these test names (see Listing 7):

  • Rejects passwords that omit required character types
  • Rejects passwords with bad lengths
  • Accepts minimum and maximum length passwords

In an excellent video follow-up to my article, Bob Martin organized his tests differently, using FitNesse. He grouped tests into two well-named suites, “Valid passwords” and “Invalid passwords.” Each suite includes a number of relevant example passwords, each described with a comment that expresses what makes the example interesting.

Every test tool that I’ve used offers at least one excellent way to express the intentions of each test. However expressed, those intentions become the top layer of my automated tests.

System responsibilities. A core reason for testing is to learn whether the system meets its responsibilities. As I refine my automation, refactoring it to express my intentions with precision, I end up naming specific system responsibilities directly.

In my article, I’m testing a specific pair of responsibilities: The account creation command must accept valid passwords and reject invalid ones. As I refactored the duplication out of my initial awkward tests, these responsibilities emerged clearly, expressed in the names of two new keywords: Accepts Password and Rejects Password. Listing 7 shows how my top-level tests build on these two keywords.

Essential system interface. By system interface, I mean the set of messages that the system sends and receives, whether initiated by users (e.g. commands sent to the system) or by the system (e.g. notifications sent to users).

By essential I mean independent of the technology used to implement the system. For example, the account creation feature must offer some way for a user to command the system to create an account, and it must include some way for the system to notify the user of the result of the command. This is true regardless of whether the system is implemented as a command line app, a web app, or a GUI app.

As I write and refine automated tests, I end up naming each of these essential messages somewhere in my code. In my article, Listing 2 defines two keywords. “Create Account” clearly identifies one message in the essential system interface. Though the other keyword, “Status Should Be,” is slightly less clear, it still suggests that the system emits a status in response to a command to create an account. (Perhaps there’s a better name that I haven’t thought of yet.) Listing 4 shows how the higher-level system responsibility keywords build upon these essential system interface keywords.

System implementation interface. The bottom layer (from the point of view of automating tests) is the system implementation interface. This is the interface through which our tests interact most directly with the system. Sometimes this interaction is direct, e.g. when Java code in our low-level test fixtures invokes Java methods in the system under test. Other times the interaction is indirect, through an intermediary tool, e.g. when we use Selenium to interact with a web app or FEST-Swing to interact with a Java Swing app.

In my article, I tested two different implementations of the account creation feature. The first was a command line application, which the tests invoked through the “Run” keyword, an intermediary built into Robot Framework. Listing 2 shows how the Create Account keyword builds on top of the Run keyword (though you’ll have to parse through the syntax junk to find it).

The second implementation was a web app, which the tests invoked through Robot Framework’s Selenium Library, an intermediary which itself interacts through Selenium, yet another intermediary. Listing 8 shows how the revised Create Account keyword builds on various keywords in the Selenium Library.

Translating Between Layers

Each chunk of test automation code translates an idea from one layer to the next lower layer. Listing 7 shows test ideas invoking system responsibilities. Listing 4 shows responsibilities invoking messages in the essential system interface. Listings 2 and 8 show how the essential system interface invokes two different system implementation interfaces.
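To make the translation concrete, here is a minimal sketch of the four layers in plain Python rather than Robot Framework. It is not the code from the article's listings. FakeAccountSystem, AccountDriver, the password rules (6 to 16 characters, with at least one letter, digit, and punctuation mark), and the status strings are all assumptions invented for illustration.

```python
# Layer 4: system implementation interface.
# A hypothetical in-process system stands in for a real command line
# or web implementation. The password rules below are invented.
class FakeAccountSystem:
    MIN_LENGTH = 6
    MAX_LENGTH = 16

    def run(self, command, username, password):
        if command != "create":
            return "Unknown Command"
        if not (self.MIN_LENGTH <= len(password) <= self.MAX_LENGTH):
            return "Invalid Password"
        has_letter = any(c.isalpha() for c in password)
        has_digit = any(c.isdigit() for c in password)
        has_punctuation = any(not c.isalnum() for c in password)
        if not (has_letter and has_digit and has_punctuation):
            return "Invalid Password"
        return "SUCCESS"


# Layer 3: essential system interface.
# Names the messages (a command and a status notification) without
# saying how they travel to and from the system.
class AccountDriver:
    def __init__(self, system):
        self._system = system
        self._status = None

    def create_account(self, username, password):
        self._status = self._system.run("create", username, password)

    def status_should_be(self, expected):
        assert self._status == expected, (
            f"expected status {expected!r}, got {self._status!r}"
        )


# Layer 2: system responsibilities.
def accepts_password(password):
    driver = AccountDriver(FakeAccountSystem())
    driver.create_account("fred", password)
    driver.status_should_be("SUCCESS")


def rejects_password(password):
    driver = AccountDriver(FakeAccountSystem())
    driver.create_account("fred", password)
    driver.status_should_be("Invalid Password")


# Layer 1: test intentions, carried by the test names.
def test_rejects_passwords_that_omit_required_character_types():
    rejects_password("abcdef!!")  # no digit
    rejects_password("123456!!")  # no letter
    rejects_password("abcdef12")  # no punctuation


def test_rejects_passwords_with_bad_lengths():
    rejects_password("abc1!")              # 5 characters, too short
    rejects_password("abcdefghijklmno1!")  # 17 characters, too long


def test_accepts_minimum_and_maximum_length_passwords():
    accepts_password("abcd1!")            # 6 characters
    accepts_password("abcdefghijklmn1!")  # 16 characters
```

Only FakeAccountSystem knows how the system is reached. Replacing it with a class that drives a command line app or a browser would leave the three layers above it untouched.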

Each of the acceptance test tools I use allows you to build layers like this. In FitNesse, top-level tests expressed in test tables may invoke “scenarios,” which are themselves written in FitNesse tables. And scenarios may invoke lower-level scenarios. In Cucumber, top-level “scenarios” invoke “test steps,” which may themselves invoke lower-level test steps. In Twist, “test scenarios” invoke lower-level “concepts” and “contexts.” Each tool offers ways to build higher layers on top of lower layers, which build upon yet lower layers, until we reach the layer that interacts directly with the system we’re testing.

In the examples in my article, I chose to write all of my code in Robot Framework’s keyword-based test language. I defined each keyword entirely in terms of lower-level keywords. I could have chosen otherwise. At any layer, I could have translated from the keyword-based language to a more general purpose programming language such as Java, Ruby, or Python. The other tools I use offer a similar choice.
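As a rough illustration of that choice, here is what the lowest layers might look like if I had dropped into Python. Robot Framework can load a Python class as a keyword library and expose its public methods as keywords, so create_account becomes usable as Create Account. The class name AccountKeywords, the command arguments, and the status strings are assumptions for this sketch, not the article's code.

```python
import subprocess
import sys


class AccountKeywords:
    """Sketch of a keyword library for a command line system under test."""

    def __init__(self, *command):
        # The command that launches the system under test. What goes
        # here is an assumption; the tests supply it when they load
        # the library.
        self._command = list(command)
        self._status = None

    def create_account(self, username, password):
        # Essential message: ask the system to create an account.
        result = subprocess.run(
            self._command + ["create", username, password],
            capture_output=True,
            text=True,
            check=False,
        )
        # Assume the system prints its status on standard output.
        self._status = result.stdout.strip()

    def status_should_be(self, expected):
        # Essential message: the status the system reported.
        if self._status != expected:
            raise AssertionError(
                f"Expected status {expected!r} but was {self._status!r}"
            )


if __name__ == "__main__":
    # Stand-in "system" that always prints SUCCESS, so the sketch can
    # run without a real application.
    keywords = AccountKeywords(sys.executable, "-c", "print('SUCCESS')")
    keywords.create_account("fred", "abcd1!")
    keywords.status_should_be("SUCCESS")
```

The higher layers could still be written in the tool's test language. Only the part that talks to the system moves into Python.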

But I, like many users, find these tools’ test languages easier for non-technical people to understand, and sufficiently flexible to allow users to write tests in a variety of ways. In general, I want as many of these layers as possible to be meaningful not just to technical people, but to anyone who has knowledge of the application domain. So I like to stay with the tool’s test language for all of these layers, switching to a general purpose programming language only at the lowest layer, and then only when the system’s implementation interface forces me to.

A Lens, Not a Straitjacket

When I write automated tests for more complex applications, there are often more layers than these. Yet these four jump out at me, perhaps because each represents a layer of meaning that I always care about. Every automated test suite involves test ideas, system responsibilities, the essential system interface, and the system’s implementation interface. Though other layers arise, I haven’t yet identified additional layers that are so universally meaningful to me.

These layers were a discovery for me. They offer an interesting way to look at my test code to see whether I’ve expressed important ideas directly and clearly. I don’t see them as a standard to apply, or a procrustean template to wedge my tests into. They are a useful lens.


Writing Maintainable Automated Acceptance Tests

November 23, 2009 at 1:33 pm · Testing

I’ve posted “Writing Maintainable Automated Acceptance Tests” on my articles page.

The article demonstrates how to make automated acceptance tests more maintainable by:

  • Hiding incidental details
  • Eliminating duplication
  • Naming essential ideas

Though the examples in the article use a very nice testing framework called Robot Framework, the ideas work just as well with other popular open-source testing frameworks, such as FitNesse and Cucumber.

You will be able to follow the article even if you don’t know Robot Framework. But don’t be surprised if it inspires you to give Robot Framework a try.


Naming Unit Tests

March 27, 2006 at 4:05 pm · Coding, Testing

Last year I read Brian Button’s wonderful article “Double Duty” in Better Software magazine (the February, 2005 issue). One of the things I learned is that Brian is the world’s best namer of unit tests. I visited Brian’s web site for more of his ideas and found an article called “TDD Defeats Programmer’s Block—Film at 11.” In this article, Brian describes using the Test Driven Development process to write a “continuous integration system” (a tool that automatically (re)builds software systems when programmers change the source code). Here are some examples of his unit test names:

  • Starting Build With No Previous State Only Starts Build For Last Change
  • Previous Build Number Is Incremented After Successful Started Build
  • Last Build Failing Leaves Last Build Set To Previous Build

What makes these names so good? I analyzed a few dozen of Brian’s test names and found this pattern: stimulus and result in context. Let’s examine these names to identify the parts.

Starting Build With No Previous State Only Starts Build For Last Change:

  • Context: There is no previous state (i.e. no previous builds were done).
  • Stimulus: Start a build.
  • Result: A build was started for only the last change.

Previous Build Number Is Incremented After Successful Started Build:

  • Context: There were zero or more previous builds.
  • Stimulus: Request a build that will succeed.
  • Result: The build number is one more than before the build.

Last Build Failing Leaves Last Build Set To Previous Build:

  • Context: There were previous builds, the most recent of which is recorded in the system as the last build.
  • Stimulus: Request a build that will fail.
  • Result: The previously identified last build is still identified as the last build.

One of Brian’s tests from a different system—an “animal factory” (a concept better left unexplained)—is called Default Animal Is Cow.

  • Context: No animal type has been identified as the desired type of animal for the system to manufacture.
  • Stimulus: Request that the system manufacture an animal.
  • Result: A new cow exists.

Now that I’ve learned the pattern that makes Brian’s test names so useful, I can use it deliberately. Using the context-stimulus-result scheme increases the value of tests as documentation. The resulting names make clear what specifically is being tested and under what specific conditions. This helps the reader to understand quickly what each test does, and what is covered by each set of tests.

Another benefit is that the context-stimulus-result naming scheme encourages you to clarify your thinking about each test. Each unit test establishes some set of starting conditions, or context. Each stimulates the system. Each compares the result to a desired result. In order to name these elements you will have to think about the specifics of each and clarify them well enough that you can describe each in a few words.
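Here is a small sketch of how the three elements show up inside a test body as well as in its name. BuildTracker is a hypothetical stand-in that I made up to have something to test. It is not Brian's code, and its behavior is an assumption based only on the test names quoted above.

```python
import unittest


class BuildTracker:
    """Hypothetical build system stand-in, invented for this example."""

    def __init__(self):
        self.build_number = 0
        self.last_build = None

    def start_build(self, succeeds):
        # Assumption: only a successful build advances the build
        # number and becomes the recorded last build.
        if succeeds:
            self.build_number += 1
            self.last_build = self.build_number
        return succeeds


class BuildTrackerTest(unittest.TestCase):
    def test_previous_build_number_is_incremented_after_successful_started_build(self):
        # Context: there was a previous build.
        tracker = BuildTracker()
        tracker.start_build(succeeds=True)
        previous_number = tracker.build_number

        # Stimulus: request a build that will succeed.
        tracker.start_build(succeeds=True)

        # Result: the build number is one more than before the build.
        self.assertEqual(tracker.build_number, previous_number + 1)

    def test_last_build_failing_leaves_last_build_set_to_previous_build(self):
        # Context: a previous build is recorded as the last build.
        tracker = BuildTracker()
        tracker.start_build(succeeds=True)
        previous_last_build = tracker.last_build

        # Stimulus: request a build that will fail.
        tracker.start_build(succeeds=False)

        # Result: the previously identified last build is unchanged.
        self.assertEqual(tracker.last_build, previous_last_build)
```

Saved in a file, these run with python -m unittest. If I can't fill in one of the three comments in a few words, that is usually the part of the test I haven't thought through yet.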

If you’re having difficulty naming a test using this scheme, that may indicate a problem in your test. Perhaps the test is doing too much work, or your test suite is doing too little. For example, suppose you’re testing software to manage bank accounts, and one test is called Withdrawal Test. We can tell from this name that the test tests the withdrawal feature in some way. But we don’t know what specific aspects of withdrawals this test is testing.

Does Withdrawal Test test only that a withdrawal of less than the account balance reduces the balance by the proper amount? If so, calling this test “Withdrawal Test” may indicate that your suite of tests for the withdrawal feature is missing many important test cases. The name of the test gives readers an overly broad sense of what the test actually tests.

Does Withdrawal Test test a score of different stimuli under a dozen different conditions? If so, it’s probably doing too much work. The name of the test does not quickly tell readers what is being tested.

Whether Withdrawal Test is doing too much work or too little, we can improve the test by applying the context-stimulus-result scheme. If Withdrawal Test is doing too much, we can use the scheme to identify how to break the test into smaller, more focused tests with more descriptive names. If Withdrawal Test tests only one tiny aspect of withdrawals and leaves other aspects untested, we can use the scheme to create a better name for the test and to identify other tests to write.
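As a sketch of where that can lead, here is one way a vague Withdrawal Test might break into focused tests. The Account class, the InsufficientFunds exception, and the rule that an overdraft is refused are all assumptions invented for the example. A real banking system may behave differently.

```python
import unittest


class InsufficientFunds(Exception):
    """Raised by the hypothetical Account when a withdrawal is too large."""


class Account:
    """Hypothetical bank account, invented for this example."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Assumption: withdrawals larger than the balance are refused.
        if amount > self.balance:
            raise InsufficientFunds(
                f"cannot withdraw {amount} from a balance of {self.balance}"
            )
        self.balance -= amount


class WithdrawalTest(unittest.TestCase):
    def test_withdrawal_less_than_balance_reduces_balance_by_amount(self):
        account = Account(balance=100)       # context
        account.withdraw(30)                 # stimulus
        self.assertEqual(account.balance, 70)  # result

    def test_withdrawal_of_entire_balance_leaves_zero_balance(self):
        account = Account(balance=100)
        account.withdraw(100)
        self.assertEqual(account.balance, 0)

    def test_withdrawal_exceeding_balance_is_refused_and_leaves_balance_unchanged(self):
        account = Account(balance=100)
        with self.assertRaises(InsufficientFunds):
            account.withdraw(101)
        self.assertEqual(account.balance, 100)
```

Reading only the three names tells you which conditions are covered, and it also makes the gaps easier to spot, such as zero or negative amounts.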
