The Way to Testing Mastery, Part 1: Rise

#Go   #BestPractices   #TDD   #Testing   #BDD  
“When I master testing, will I get to dodge bugs?” Jessie: “No, when you master testing, you won’t have to”


setUp()

Much has been written on the subject of automated testing, whether it be unit tests, integration, end-to-end, or any esoteric permutation thereof. As with everything these days, there are a multitude of opinions on the subject, from those prescribing a long list of complicated rules for automated testing, to those who say unit-testing is overrated or a waste of time. Then there are those who like to sound smart and politically-correct by answering “It depends” to every question, which helps absolutely no one improve.

You will not find that in this article. This is a highly-opinionated piece on what has helped me improve my testing practices, and more importantly, as will be mentioned later, the quality of my code. Take what you will and apply it to your own development. If just one item from this article helps you in the slightest with your code quality, then I have accomplished my goal.

I will be presenting code examples mostly in Go, which is one of the most popular languages used today for cloud and microservice development and my favorite language currently. In some instances, I’ll also present some examples in Java to reflect the OOP model of other common languages.

The Why

It is a well-known fact that the cost of fixing a bug after a product release is much higher than if the bug were uncovered during the development or quality assurance phase. For example, quoting from Celerity:

The Systems Sciences Institute at IBM has reported that “the cost to fix an error found after product release was four to five times as much as one uncovered during design, and up to 100 times more than one identified in the maintenance phase.”

The same article mentions that “A 2003 study commissioned by the Department of Commerce’s National Institute of Standards and Technology found that software bugs cost the US economy $59.5 billion annually.” The message is clear: the earlier in the development process a bug is detected and resolved, the faster and cheaper it is to fix it.

When I started my career in software development in the year 2000, automated testing, of any kind, was not yet a widespread practice. In several of the companies I worked for initially, there was an entirely separate QA team that would test a release before it was pushed to production. I remember writing what seemed to be beautiful code for new applications which worked perfectly locally, only to have my ego shattered when it was deployed to the QA environment and the QA team tried thousands of ways to break my application, AND succeeded.

In the end, though, I was happy the QA team found these issues, as it meant they would not make it to production and affect actual customers. On the other hand, when an application was in its maintenance phase, adding just a few enhancements every few weeks, the QA team spent more and more time testing the application, not just for the new features, but also all the existing functionality to make sure nothing broke as a result of a new feature. This was a very tedious process for the team and, as a result of being human, some of these manual tests were either overlooked or skipped in the interest of time, only to result in a bug in an existing feature making it into production. This would end in lots of finger-pointing between the development and QA teams, with the QA testers asking “why did you introduce this bug here?” and developers saying “why didn’t you catch it during your testing?”.

It was around the year 2008 when I first learned about unit-testing, which marked a turning point for me in my career. As I started writing tests, and eventually doing true Test-Driven Development and BDD, the quality of my code improved, the number of defects that made it into production dropped to a small fraction of what it was before, and being on-call was no longer something I dreaded.

I have learned a lot about testing since then, and I’m still learning about it each day. With this article, I hope to impart my lessons learned so that we can continue to move the needle in this exciting field.

Besides the reasons above, remember this: well-designed code is not necessarily easily testable, but easily testable code is usually well-designed. If you want to improve the quality of your code (which is the REAL goal out of all of this), then make your code easily testable. The best way to guarantee this is by writing your tests first.

Figure: Testing Pyramid

The What

When you first learn about unit testing, you will inevitably learn about the Testing Pyramid as first coined by Mike Cohn. In-depth information on this pyramid can be found on the blog of the “man, myth and legend” himself, Martin Fowler, at https://martinfowler.com/bliki/TestPyramid.html and https://martinfowler.com/articles/practical-test-pyramid.html .

For simplicity’s sake, and because we live in the day-and-age of API-first development and microservices, here are the layers I think of today and their explanation (from bottom-up):

  • Unit Tests: these are tests done against a single “unit” in the application. What’s a unit, you might ask? For me, this is simply whatever fits the “S” (Single-Responsibility Principle) in S.O.L.I.D. In Object-Oriented Programming, this will be a class. In functional programming, this can simply be a function. In any case, the tests will be done directly for this unit and any dependencies for this unit, such as database, file system and so forth, will either be “mocked” (🤮) or “stubbed” (😍). I will expand on why stubbing is superior to mocking in most cases later on.
  • Integration Tests: these are tests that test the interaction between different components. Examples are: testing a Repository class against an actual database instance, a service that touches the filesystem or one that queries an external API. For these tests, you can either use an actual dependency instance (like a dedicated database for integration tests or hitting an actual third-party API) or use a dependency that mimics the behavior of the real thing. I will show examples of these later on.
  • Component Tests: to me, this is simply a testing layer higher than an integration test (where usually you only test an interaction from one component to another), but lower than a full end-to-end test. In the case of a microservice, this involves testing your entire microservice at the API level, whether your microservice is a REST, GraphQL or gRPC API.
  • End-to-end Tests: these are tests performed from the point-of-view of the end user. Usually, these tests simulate UI events, such as button clicks or JavaScript events, which in turn call APIs to fetch/process data, and then verify that the UI is updated accordingly. These days, I recommend writing these tests using Cypress. In case your product is actually a public API that will be consumed by external clients, then the component tests you wrote are actually your end-to-end tests and you don’t need to create separate end-to-end tests.

As seen in the pyramid graph, the tests in the lower layers are easier to write and faster to execute. The higher you go up in the pyramid, the more work there is in setting up an environment so that tests can be executed in a consistent way, and, as a consequence of actually hitting the database and other dependent services, the longer these tests take to run. In this article, I’ll be focusing mostly on unit tests and lightly on integration tests. If there’s enough interest, I’ll create a separate article for component and end-to-end testing.

The How

Before we delve into the technical aspect, I need to stress one thing: code quality metrics by themselves are useless at best, and misleading at worst. For example, when starting out in the testing journey, developers and/or management might be tempted to set out a goal of “let’s meet a code coverage goal of 80%” or something similar. This is a recipe for failure. It is possible (and easy) to achieve a high code coverage percentage without improving maintainability/code quality. Spaghetti code with high code coverage is still spaghetti code. As a simple example:

-- main.go

package main

import (
	"fmt"
	"strings"
)

func main() {
	numbers := []string{"1", "2", "3", "5", "7", "11", "13", "15", "17", "19"}
	fmt.Printf("Prime numbers: %s\n", strings.Join(numbers, ", "))
}

-- main_test.go

package main

import "testing"

func TestFullCodeCoverage(t *testing.T) {
	main()
}

This program simply prints the list of prime numbers from 1–20, and surprise, it has 100% code coverage! Yet:

  • The test file is in the same package as the code that is being tested. This is a bad practice in Go. The test file should be in a separate package in order to force testing only the public interface of the code under test (meaning only the exported functions, types and variables). The convention in Go is to put the test code in a package with the same name as the package under test plus a “_test” suffix.
  • The test function is not testing anything! Code coverage simply tracks the lines that were executed during a test suite. From a code coverage report alone, you cannot verify that actual assertions were made or determine the quality of the tests.
  • Related to the above, this code has a bug! 15 is not a prime number (and, strictly speaking, neither is 1). Yet we have 100% code coverage…
  • There are several code quality problems with this code. As an example:
    a) the list of prime numbers is hard-coded
    b) the logic is inside the main function, so I cannot reuse this code outside of this package
    c) I cannot specify a range of prime numbers to return

…and so on and so forth. It is obvious that low code coverage means that more test cases must be written, but conversely, a high test code coverage does not mean that you have covered all cases in your program. As a simple example, in languages that have the ternary operator like Java, you can have logic such as the following:

String secretValue = cloud.getName().equals("AWS") ? AWS.getSecretValue(secretName) : GCP.getSecretValue(secretName);

The above contrived example retrieves the value of a secret which can exist either in AWS or GCP. Depending on the name of the cloud, it calls the getSecretValue method of the corresponding cloud class. Guess what? If you write a test for just one cloud, such as AWS, this line will show up as covered! If your GCP implementation has a bug in it, you will not know it until you hit production. Worse yet, if the cloud name value is empty or contains another value, GCP will always be called in these cases, which is not the desired behavior. The code coverage report also can’t tell you that this line is simply badly-designed. What if we want to add another cloud in the future? Clearly, there needs to be refactoring in this case to have the class accept an interface, and then delegate the secret value-handling to a concrete implementation of the interface.

So clearly, one line of code does not equal one test case. Instead, we should concentrate on testing situations, features and code branches, not simply writing tests to cover lines.

Managers: do not simply set goals of increasing your test code coverage to some percentage. Set out a goal of improving your team’s code quality, of which higher test code coverage is one component. Code quality cannot be measured by a tool, no matter how fancy or expensive it is. It requires review by an architect or one or more senior developers in the company. Drill these two things into your head:

Clearly, then: the most important aspect of testing is your mindset going in AND NOT any particular technique you use. We will delve into the proper mindset in the second part of this series.

Thanks for reading!
