Test Data Management Best Practices

Automated tests should be able to run independently to avoid runtime issues, which can arise if those tests make assumptions about the data they will find. Poor test data management is a leading cause of test automation failure. So, in this article, we’ll look at some of the best practices and common mistakes in test data management.

Test Environments

Tests need to run against different environments. Therefore, it’s a good idea to give each environment its own instance configuration, so that the tests themselves remain the same regardless of the environment they run in.

But what if a test expects a record, say a patient, to be in a particular state, and another test changes that state before the test runs? The test will fail because the data it expected to be available was changed by another test.

// Alternative Selenium endpoints; set only one per environment configuration.
seleniumAddress: defaultConfigSetup.params.selenium.hub,
seleniumAddress: defaultConfigSetup.seleniumAddress.browserStack,

Test Data Management

Tests should be responsible for managing their own data: each test sets up the data it needs in order to run. While writing the tests, the goal should be to make them independent, i.e., able to run without relying on other tests. The continuous delivery system can also help with these objectives.

Let’s look at how the end-to-end test automation solution can manage test data.


Creating Test Data During Test Execution

When creating test data during test execution, always ensure that the test data files are not outdated. Setting up test data also takes additional time (especially through the user interface) and requires additional code, which increases the maintenance burden of automated tests. If an error occurs during the test data setup phase and the test is not aborted before its actual steps run, the test result will be unpredictable and cannot be trusted.
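As a minimal sketch of aborting a test when its data setup fails, the following uses a made-up setup step. The names `createTestUser` and `runWithSetup` are hypothetical and stand in for whatever your framework's setup hooks do.

```typescript
// Outcome of a data setup step; "ready" carries the id the test will use.
type SetupResult =
  | { status: "ready"; userId: string }
  | { status: "failed"; reason: string };

// Hypothetical setup step. A real suite would call an API or run SQL here.
function createTestUser(name: string): SetupResult {
  if (name.trim() === "") {
    return { status: "failed", reason: "user name must not be empty" };
  }
  return { status: "ready", userId: "user-" + name };
}

// Run the test body only if setup succeeded. Otherwise abort with an error that
// names the setup step, so the failure is not mistaken for a product defect.
function runWithSetup(name: string, testBody: (userId: string) => void): void {
  const setup = createTestUser(name);
  if (setup.status === "failed") {
    throw new Error("Test data setup failed, test aborted: " + setup.reason);
  }
  testBody(setup.userId);
}
```

Reporting the setup failure separately keeps the test report honest: a red test then means either "data could not be prepared" or "the feature is broken", never an ambiguous mix of the two.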

Query Before Execution

Querying for test data before test execution is also important, since there is no guarantee that the database contains the data a test case requires (especially for edge cases). At times, writing a query that reliably returns the right test data requires specific system knowledge, which makes this a less than ideal approach for some teams. And even if the query is exactly right, there is no guarantee that the results fully match what the test case needs.

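beforeEach(async () => {
    // Start every test from a fresh browser session.
    await browser.restart();
    await CommonPageHelper.navigateToUrl(browser.baseUrl);
    // Prepare the data this test depends on before any test step runs.
    await CommonPageHelper.prepareTestData(data.users.sql);
    await CommonPageHelper.createUserForCase(userData.register.user);
});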
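Because a query does not guarantee that suitable data exists, it can help to check the query result against the preconditions of the test and fail early with a clear message. The sketch below works on an in-memory array that stands in for query results. The `AccountRow` shape and the `requireRow` helper are illustrative assumptions, not part of any framework.

```typescript
// Shape of a row the test needs; the fields are illustrative.
interface AccountRow {
  id: number;
  status: "active" | "locked";
  openOrders: number;
}

// Pick the first row that satisfies the preconditions of the test, or fail
// early with a message that says which data was missing.
function requireRow(
  rows: AccountRow[],
  matches: (row: AccountRow) => boolean,
  description: string
): AccountRow {
  const first = rows.filter(matches)[0];
  if (first === undefined) {
    throw new Error("No test data available for: " + description);
  }
  return first;
}

// Stand-in for rows returned by a query against the test database.
const queriedRows: AccountRow[] = [
  { id: 1, status: "locked", openOrders: 0 },
  { id: 2, status: "active", openOrders: 3 },
];

const account = requireRow(
  queriedRows,
  (row) => row.status === "active" && row.openOrders > 0,
  "active account with open orders"
);
```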

Running Tests in Parallel

If the tests are independent, none of them relies on data or state left behind by another. You can therefore run them in parallel without difficulty. It is advisable to reset the test data state before or after a test run.

Strategies for Test Data Management

Let’s discuss a few strategies for managing test data effectively:

1. Sanity Checks

Before rigorous testing, we first perform exploratory testing to verify that the system behaves sanely. Because exploratory testing is performed by testers, it is unscripted and undocumented. Sanity checks can expedite the tests or immediately alert the teams if the environment needs changes or bug fixes. To provide the right resources to the respective teams, it is important to understand the whole environment and collate the types of tests the teams perform.

2. Restore Data Source Approach

One way to manage test data is to reset the data source that the application uses before test execution. Yet, in some systems, refreshing the data source can take hours or even days, and it may also be costly. Refreshing the data source works for some test suites, applications, and environments, but not for all. The key to implementing it is understanding the team’s constraints and aligning them with the goals for the tests. Copying complex data is complicated, and manual data refreshes can be slow and error-prone. So, what if a team is tasked with updating a table or a database, or even synchronizing a test environment database with a production one? Done by hand, this can leave data sets out of alignment.

You can save a lot of effort in creating test data by maintaining a central repository of the data required for various kinds of testing. Moreover, keeping different versions of the repository can help with regression testing by identifying which data changes cause the code to break.

3. Analysis of Data

Test data is often spread across several systems. For instance, in a workbook management product, the management controller application, middleware applications, and database applications all work in conjunction with one another. The test data they require can therefore be scattered, and it takes thorough analysis to manage it effectively.

4. Externalize Data

Externalizing test data is an important part of managing automation test data. Since the data is likely to change during future maintenance, you should be able to change it, and test with it, easily. Likewise, it’s essential to have a strategy for object recognition that enables the test designer to externalize object identifiers.
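As a rough sketch of externalized test data, the example below parses a JSON document and looks up one named case. The JSON is embedded as a string so that the sketch runs without touching the file system. In practice it would live in a separate file, and the file name, field names, and the `loadLoginCase` helper here are all assumptions made for illustration.

```typescript
// Fields one login scenario needs; illustrative, not from a real project.
interface LoginCase {
  username: string;
  password: string;
  expectedMessage: string;
}

// Stand-in for the contents of an external file (for example, a hypothetical
// login-cases.json).
const externalJson = `{
  "validUser":  { "username": "qa.user", "password": "correct-horse", "expectedMessage": "Welcome" },
  "lockedUser": { "username": "qa.locked", "password": "correct-horse", "expectedMessage": "Account locked" }
}`;

// Parse the external data and return one named case, failing clearly if the
// case is missing or incomplete.
function loadLoginCase(json: string, caseName: string): LoginCase {
  const parsed = JSON.parse(json);
  const entry = parsed !== null && typeof parsed === "object" ? parsed[caseName] : undefined;
  if (
    entry === undefined ||
    entry === null ||
    typeof entry.username !== "string" ||
    typeof entry.password !== "string" ||
    typeof entry.expectedMessage !== "string"
  ) {
    throw new Error("Test data case is missing or incomplete: " + caseName);
  }
  return {
    username: entry.username,
    password: entry.password,
    expectedMessage: entry.expectedMessage,
  };
}
```

With this split, changing a password or an expected message means editing the data file only, and the test code stays untouched.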

5. Localization

Localization testing verifies the linguistic and cultural aspects of a given locale. Content and UI are significantly affected by localization. As this mainly involves changes in the user interface, it’s better to prepare the test data at the start, when defining the test automation strategy. The testers repeat the same functions with different variants of test data and verify various aspects, including the cultural appropriateness of the UI, linguistic errors, and typographical errors.

6. Acknowledge All Test Environments

To define the test data management strategy, it is crucial to consider the test environments of the different projects: analyze them, and then manage them according to the scenarios they serve. To manage the automation data for multiple environments, you can split the data into categories, such as common data (the same for all environments) and specific data (which changes with the environment). After this, creating and managing the data files becomes easier and much more targeted.
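The split into common and environment-specific data could be sketched as follows. The environment names, URLs, and field names are placeholders chosen for illustration, and `testDataFor` is a hypothetical helper.

```typescript
// Data shared by every environment.
interface CommonData {
  adminRole: string;
  defaultPageSize: number;
}

// Data that changes with the environment.
interface EnvironmentData {
  baseUrl: string;
  adminUser: string;
}

const commonData: CommonData = { adminRole: "administrator", defaultPageSize: 25 };

// Hypothetical environments; the URLs and user names are placeholders.
const environmentData: { [name: string]: EnvironmentData } = {
  qa: { baseUrl: "https://qa.example.test", adminUser: "qa-admin" },
  staging: { baseUrl: "https://staging.example.test", adminUser: "staging-admin" },
};

// Combine the shared data with the data for one environment.
function testDataFor(environment: string): CommonData & EnvironmentData {
  if (!Object.prototype.hasOwnProperty.call(environmentData, environment)) {
    throw new Error("No test data defined for environment: " + environment);
  }
  return { ...commonData, ...environmentData[environment] };
}
```

A test would then call `testDataFor("qa")` or `testDataFor("staging")` and read the same field names in both cases, so the test code does not change between environments.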

7. Test Data Clean-up and Reset Purpose

You may need to create or alter test data based on the testing requirements of the current release cycle (where a release cycle can span a long period), and you may need this data again at a later point. Hence, it’s useful to formulate a clear process for deciding when to clean up test data.

Before or after a test run, reset the test data state, especially when dealing with end-to-end test data automation. Restoring and cleaning up a database ensures that no state left over from earlier test data affects the next run. This ultimately improves the repeatability and predictability of tests. Hence, resetting the test data state for the specific data required is an effective strategy for managing test data automation.
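One possible shape for resetting state around a test is a snapshot taken before the test and restored afterwards. The sketch below uses a small in-memory store as a stand-in for a real database. The class `InMemoryUserStore` and the helper `withCleanState` are hypothetical.

```typescript
// Minimal stand-in for a data source; a real suite would reset a database instead.
class InMemoryUserStore {
  private users: string[] = [];

  add(name: string): void {
    this.users.push(name);
  }

  list(): string[] {
    return this.users.slice();
  }

  // Capture the current state so it can be restored later.
  snapshot(): string[] {
    return this.users.slice();
  }

  // Replace the current state with a previously captured one.
  restore(saved: string[]): void {
    this.users = saved.slice();
  }
}

// Run a test body and always restore the state it started with, even if it throws.
function withCleanState(store: InMemoryUserStore, testBody: () => void): void {
  const saved = store.snapshot();
  try {
    testBody();
  } finally {
    store.restore(saved);
  }
}
```

Putting the restore in a `finally` block matters: a failing test is exactly the case where leftover data is most likely to break the tests that follow.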

8. Identify and Protect Data

Properly testing an application may involve a large amount of sensitive data. For example, cloud-based test environments are a popular choice, and there, guaranteeing user privacy is a primary concern. So, the mechanism for shielding sensitive data must be identified, and it should be governed by the volume of test data used.
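Masking is one common way to shield sensitive data before it reaches a test environment. The sketch below replaces identifying fields with synthetic values. Which fields count as sensitive depends on your application, so the `CustomerRecord` shape and the choice of `name` and `email` are assumptions.

```typescript
// Illustrative record shape; real records will differ.
interface CustomerRecord {
  id: number;
  name: string;
  email: string;
  plan: string;
}

// Replace identifying fields with synthetic values derived from the id, and keep
// the non-sensitive fields so the record stays useful for testing.
function maskCustomer(record: CustomerRecord): CustomerRecord {
  return {
    ...record,
    name: "Customer " + record.id,
    email: "customer" + record.id + "@example.test",
  };
}

// Mask a whole data set before it is copied into a test environment.
function maskAll(records: CustomerRecord[]): CustomerRecord[] {
  return records.map(maskCustomer);
}
```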

9. Virtualize Data

Optimizing resources is a pressing need for any environment. Therefore, it’s good practice to use virtualized environments, such as those in the cloud, for testing purposes. This helps the testers, as each environment is independent and contains all the resources necessary for testing. Another important benefit of virtualization is that you can destroy the instances once the test concludes, leading to a significant cost reduction.

Functional Testing

Let’s take an example where you need to perform functional testing or black-box testing. Here, the objective is to verify that the code functionally meets all the specified requirements. The test cases should typically cover positive path data, negative path data, null data, erroneous data, and boundary condition data.
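To make these data categories concrete, here is a small sketch built around a made-up rule: an age is accepted when it is a whole number from 18 to 65. The rule, the `isValidAge` function, and the chosen values are assumptions for illustration only.

```typescript
// Hypothetical rule under test: accept whole numbers from 18 to 65 inclusive.
function isValidAge(value: unknown): boolean {
  return (
    typeof value === "number" &&
    Math.floor(value) === value &&
    value >= 18 &&
    value <= 65
  );
}

interface AgeCase {
  category: string;
  input: unknown;
  expected: boolean;
}

// One data set per category named in the text above.
const ageCases: AgeCase[] = [
  { category: "positive path", input: 30, expected: true },
  { category: "negative path", input: 70, expected: false },
  { category: "null data", input: null, expected: false },
  { category: "erroneous data", input: "thirty", expected: false },
  { category: "boundary (lower)", input: 18, expected: true },
  { category: "boundary (just below lower)", input: 17, expected: false },
  { category: "boundary (upper)", input: 65, expected: true },
  { category: "boundary (just above upper)", input: 66, expected: false },
];

// Collect the categories whose actual result differs from the expected one.
const failedCategories = ageCases
  .filter((testCase) => isValidAge(testCase.input) !== testCase.expected)
  .map((testCase) => testCase.category);
```

Keeping the category next to each value makes gaps in coverage visible at a glance, for example a data set with no boundary rows.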

While testing data sources, you will encounter the following challenges:

Test data coverage is often incomplete
Testing teams do not have access to the data sources
Delay in giving production data access to the testers
Large volumes of data may be needed in a short time
Data dependencies/combinations to test some of the business scenarios
Most of the data is created or prepared during the test execution
Multiple applications and data versions
Continuous release cycles across several applications

Common Mistakes in Test Data Management

When it comes to test data management, success is defined by using the correct data and avoiding mistakes that allow defects to slip by unnoticed. Even minor issues can have serious consequences. Keeping this in mind, let’s discuss a few common mistakes:

1. Having No Plan

We call this approach “having no plan” because it has no data generation strategy. The data used by the tests is not created or managed by the test automation code. Moreover, the approach does nothing to clean up data after each test case runs. It does not work in most environments or for most applications under test. However, it does serve as a foundation for many patterns.

For example, if the data in the system changes because another user (or test case) changed it, then the test would fail. If we want our test case to change data in the system and verify that it changed, then re-running the test will fail. Similarly, if we want to run the same test case in parallel, we would experience a race condition.
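The re-run problem can be shown with a tiny sketch. The shared record and the test function below are hypothetical, and the "test" simply returns whether it passed.

```typescript
// Shared record that the test neither creates nor cleans up.
const sharedAccount = { status: "new" };

// Test: activate the account and verify the change.
// It silently assumes that the record starts in the "new" state.
function activateAccountTest(): boolean {
  if (sharedAccount.status !== "new") {
    return false; // precondition no longer holds, so the test fails
  }
  sharedAccount.status = "active";
  return sharedAccount.status === "active";
}

// The first run passes; the second run fails because the first one changed the data.
const firstRun = activateAccountTest();
const secondRun = activateAccountTest();
```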

2. Driving Testing Through User Interface

Driving the tests through the user interface can work well initially, but it takes more and more time as the suite grows. If you search for ‘Test Automation’ online, most of the examples drive the entire system through the user interface. Such suites ultimately end up testing certain features or flows multiple times over.

3. Self-Centered Approach

What if we don’t refresh the database often and instead create unique data for each test case execution?

This is a selfish approach because it only cares about the concerns of the individual test and nothing else. It does not consider what happens once it has created 500 million users, or what data growth does to query times across the application. The main benefit of this approach is that all of your tests can run without race conditions causing false positives in test reports. It also finds issues within the system under test (SUT) that arise from varying the data used for inputs. You can achieve this through different test data files (JSON, properties, feature files).
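Creating unique data for each test execution might look like the following sketch. The `uniqueUser` helper, the naming scheme, and the email domain are assumptions, and the sketch deliberately ignores the data growth concern described above.

```typescript
interface GeneratedUser {
  username: string;
  email: string;
}

// Counter that keeps names distinct even when two calls share a timestamp.
let userSequence = 0;

// Build a unique user for each call so that tests never compete for the same record.
function uniqueUser(testName: string): GeneratedUser {
  userSequence += 1;
  const suffix = Date.now().toString(36) + "-" + userSequence;
  const username = testName + "-" + suffix;
  return { username: username, email: username + "@example.test" };
}
```

Including the test name in the generated value also makes leftover records easy to trace back to the test that created them, which helps when the data eventually needs cleaning up.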

Conclusion

Handling test data correctly is crucial for having a reliable test solution. In the case of a multi-phase product development cycle, it’s useful to re-evaluate the selected test data. Moreover, the test environment should be of prime importance in each case. Dedicated test environment maintenance teams can help establish frameworks for effective maintenance of test environments to ensure smoother release cycles. Effective testing is a cost-effective solution for organizations to make their products more reliable.

This post was published under the Quality Assurance Community of Experts. Communities of Experts are specialized groups at Modus that consolidate knowledge, document standards, reduce delivery times for clients, and open up growth opportunities for team members.

Posted in Quality Assurance

Tauqir Sarwar

Tauqir Sarwar is a QA Automation Engineer at Modus Create with over 13 years of experience in end-to-end software automation solutions and services. He loves learning new technologies and is proficient in tools and frameworks such as Selenium, Cucumber, Pytest, Protractor, REST Assured, WebdriverIO, and Cypress. When he's not working, Tauqir loves to travel and spend time with his family.