Many factors influence the quality of the application that you and your team deliver. One of them is a good testing process: manual and automated tests, plus code quality checks at the appropriate levels. To run a successful testing session, or to keep a fast and reliable test automation infrastructure, your project needs to handle test data flawlessly. Other concerns such as test environment setup, CI/CD pipeline integration, logging and reporting must also be set up and running correctly. Test data is a major influencer of test reliability and should be a primary concern of the people writing the tests. If the test suites are not reliable, time is wasted re-running them and investigating false-positive or false-negative results, which can erode confidence in the quality of the application among the team, stakeholders and customers.
What is Test Data?
Test data, in the computer software industry context, represents a set of inputs that are fed to the program (software under test) in order to trigger certain behavior or output data. This is an adapted version of the definition from Wikipedia.
Knowing this we can think of various ways to classify test data. Here are some of the most common classes of test data:
- Confirmatory test data is used to produce output data that is compared against an expected or benchmark data set.
- Behavioral test data is used to trigger certain behavior in the application, in order to verify that it behaves as expected when a user performs an action, and that it behaves correctly under unexpected or extreme conditions.
- Focused test data is built in a systematic or calculated way, so that the expected data set can be generated along with it.
- Randomized test data is generated, as the name suggests, in a random way and is usually used for performance testing or testing of applications based on neural networks.
- Dynamic test data represents the data set that is specifically used to produce the output. An example of this would be the products that you add to cart in the context of an ecommerce application.
- Static test data represents the data set that is used only to support the test and is not altered nor does it influence the test in any way. In the context of the same ecommerce application this could be the store data. It could be random data added as the store name, address and so on, but it just needs to be there in order to have the test run and imitate a user navigating the application.
- Recorded test data is a data set stored in a system in any format. Whether it's recorded in a CSV file, an Excel file or directly in the database, the point is that it is reused across test executions.
- Disposable test data represents a data set that is used only once and then discarded. At the next test execution this data set is generated once more.
These are just a few classes of test data. Depending on the application itself and the industry in which it's used, there may be other classes as well.
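As a small illustration of the randomized and disposable classes, here is a minimal Python sketch, using only the standard library, that generates a throwaway customer record for each test run (the record fields and helper names are assumptions for the example):

```python
import random
import string
import uuid

def random_string(length=8):
    """Return a random lowercase string, e.g. for names."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def make_disposable_customer():
    """Generate a throwaway customer record; a fresh one is created
    on every test execution and never reused."""
    return {
        "id": str(uuid.uuid4()),  # unique per run
        "name": f"test-{random_string()}",
        "email": f"{random_string()}@example.com",
    }

if __name__ == "__main__":
    print(make_disposable_customer())
```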
Handling test data in the test environments
So, how do we handle test data in test environments? Depending on how the application projects are organised and implemented, and on which testing tools are or will be used, there are a few options available. The following are the ones I've seen used most frequently.
The “gold copy” approach is a common practice. A baseline of the application databases needs to be chosen. Let's say the production databases are copied into a separate project/container/etc. (depending on how it is implemented). Optionally, the data can be anonymized if legislation requires it. These DB copies are then used for running tests, and as the application grows the data in the gold copy can be expanded. After one or more test executions, the application DBs can be refreshed from the gold copy to get a fresh set of test data. This set can, of course, contain data for manual tests as well. Before a release, the databases in the test environment are refreshed, the automated tests are triggered (functional and non-functional), and the remaining scenarios can be executed manually.
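As a rough sketch of such a refresh step, assuming PostgreSQL and a gold-copy dump produced with pg_dump (the paths and database name below are hypothetical):

```python
import subprocess

GOLD_COPY_DUMP = "/backups/gold_copy.dump"  # hypothetical dump location
TEST_DB = "shop_test"                       # hypothetical test database

def refresh_test_db():
    """Drop and recreate the test database, then restore the gold
    copy so every run starts from the same baseline."""
    subprocess.run(["dropdb", "--if-exists", TEST_DB], check=True)
    subprocess.run(["createdb", TEST_DB], check=True)
    subprocess.run(["pg_restore", "--dbname", TEST_DB, GOLD_COPY_DUMP], check=True)

if __name__ == "__main__":
    refresh_test_db()
```

A step like this can run in the deploy pipeline before the automated suites are triggered, or on demand before a manual session.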
Scripting the test data set is also a common approach. Data is added to DB scripts that can be executed by a step in the deploy job or in a “refresh” job. This is especially useful if the application databases are decomposed into scripts and bundled in a DB project. With this approach it's also very easy to add the scripts to a code review or a pull request and have the test data reviewed just like application data or code. An added advantage is that each individual script can be executed as part of test execution, or after a test suite has run, thus refreshing data immediately for another session. A disadvantage of this method is that updating a large number of scripts for a database with many tables can become tedious.
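For instance, a minimal Python sketch of a “refresh” job that applies every reviewed seed script in name order (the script folder is an assumption, and sqlite3 stands in for whatever DB driver the project actually uses):

```python
import sqlite3  # stand-in engine; a real project would use its own DB driver
from pathlib import Path

SCRIPTS_DIR = Path("db/test_data")  # hypothetical folder of reviewed seed scripts

def run_seed_scripts(connection):
    """Execute each .sql seed script in name order so the test data
    set can be rebuilt before or after a test session."""
    for script in sorted(SCRIPTS_DIR.glob("*.sql")):
        connection.executescript(script.read_text())
        print(f"applied {script.name}")
    connection.commit()

if __name__ == "__main__":
    run_seed_scripts(sqlite3.connect("shop_test.db"))
```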
Importing data is one of the most common ways to set up test data. The data set is implemented in JSON, CSV or Excel files and imported via special functionality of the application, directly through an API, or via a separate internal tool. A minimal sketch of the API variant follows the list below.
From the start, two aspects are obvious:
- a large data set is hard to maintain
- if there is a bug in the import functionality, the data can be corrupted or not imported at all

On the other hand, this is probably the simplest way to handle test data, so for many applications it is a viable option.
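As a sketch of the API variant, assuming the application exposes an import endpoint (the URL and fixture file below are hypothetical), importing a JSON data set could be as simple as:

```python
import urllib.request

# Hypothetical import endpoint of the application's test environment.
IMPORT_URL = "https://test-env.example.com/api/import"

def import_fixtures(path):
    """POST a JSON fixture file to the application's import API."""
    with open(path, "rb") as fixture_file:
        payload = fixture_file.read()
    request = urllib.request.Request(
        IMPORT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print(f"import finished with HTTP status {response.status}")

if __name__ == "__main__":
    import_fixtures("fixtures/products.json")  # hypothetical fixture file
```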
A combination of the above-mentioned approaches can also be successful for a project.
How do we handle test data at test runtime?
There are three stages at which we can manipulate test data when running a test or a suite of tests: before the test run, during the test run, and after the test execution has finished.
- Before Test Run
Before running the test(s), the required data is set up via scripts, imports or other methods. Usually this is time-sensitive data that needs to be freshly created. Static test data can also be regenerated at this point, using an automated job or even while deploying the DB project if one exists.
- During Test Run
During the test execution stage, data is refreshed or created using input from test steps that depend on each other. Furthermore, part of the created test data can be deleted during execution. Unique IDs might be stored in variables that are lost after the execution finishes, which can negatively impact subsequent tests. An example would be a temporary shopping list: if the user created one as part of the test, it can be deleted immediately by ID or name, which might be lost once the execution ends. It's not recommended to add a lot of global variables just to store IDs for test data cleanup.
- After Test Run
After the test execution has finished, data can be refreshed or deleted. An advantage of cleaning up after execution is that it can be done in one large batch, meaning only one DB connection is opened, or only one API call or import job is triggered, depending on the data manipulation method chosen. One important aspect to note is that cleaning up data after every test might increase the suite's execution time, so this must be thought out carefully. A sketch of these three stages follows below.
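To make the three stages concrete, here is a minimal pytest-style sketch. The in-memory FAKE_DB is an assumption standing in for the application under test; a real suite would call the application's API or database instead:

```python
import pytest

# In-memory stand-in for the application under test (an assumption;
# a real suite would call the application's API or DB instead).
FAKE_DB = {}
created_ids = []  # IDs collected during the run for one batched cleanup

def create_shopping_list():
    """Before: create fresh, time-sensitive data for a test."""
    list_id = len(FAKE_DB) + 1
    FAKE_DB[list_id] = {"items": []}
    return list_id

@pytest.fixture
def shopping_list():
    list_id = create_shopping_list()
    created_ids.append(list_id)  # remember the ID instead of many globals
    return list_id

@pytest.fixture(scope="session", autouse=True)
def batch_cleanup():
    """After: delete everything in one batch, so only one connection
    or API call is needed instead of one per test."""
    yield
    for list_id in created_ids:
        FAKE_DB.pop(list_id, None)

def test_add_item(shopping_list):
    """During: the test manipulates the data it was given."""
    FAKE_DB[shopping_list]["items"].append("book")
    assert FAKE_DB[shopping_list]["items"] == ["book"]
```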
Incorporating this information into our development process
Here is one possibility of how the process could look; the details will vary from project to project.
The best part about the process described above is that it can apply to different types of testing, such as API testing, UI functional testing, performance testing, accessibility testing, and others.
Conclusion
When choosing an option, or a combination of options, several aspects need to be considered, including:
- reliability of the data manipulation method
- availability (what's already existing and can be reused)
- technical skills of the team
- size of the project
- scalability
- security aspects
Now that we've explained some test data types and manipulation methods, try them out and see how they help your application. There are other, less common but viable, options out there as well. Have you had success with other methods? We would love to hear! Share with us in the comments.
Sergiu Bacanu