How to Avoid Flaky Tests?

Google’s QA team defines a flaky test as a test that exhibits both a passing and a failing result with the same code. Flaky tests are not only frustrating but also expensive, since they increase turnaround time. So, let’s look at some common situations where a test could be flaky and at possible fixes. Along the way, we’ll use Cypress, a popular web testing library, to illustrate some of the fixes.

The main idea behind this section is “No broken windows.” Before marking a ticket as ‘Done,’ get help from the team and take some time to refactor the work until it meets your standards. If you cannot trust your tests, you are no better off than a team with no test coverage.

Flaky tests can significantly impact your productivity. In this article, we will discuss the circumstances that cause flakiness and ways to eliminate them.

When you encounter a flaky test, check the following things:

Change in Application/Framework

If a recent change in the application affected the test flow or stability, update the test script accordingly.

If a regression bug is found, log the bug in the tracking system (if not already reported) and reference the bug ID in the test description or in a comment. For example, in Selenium JS:

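describe("SuiteNames.regressionSuite", function () {
  // The bug ID in the title ties this test to the tracked defect.
  it("Verify TC but defect is identified - [1] [BUG:JIRA-ID-0404]", function () {
    // Placeholder assertion; replace it with the real check once the bug is fixed.
    expect(true).toBe(true);
  });
});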

If something changed in the testing framework or one of its dependencies, check the release notes or see whether somebody has already reported the issue. You may also look for a workaround or temporarily exclude the test from execution. However, the industry best practice is to pin frameworks and libraries to fixed versions instead of version ranges, and to schedule time to investigate and apply updates.

Test Failure Pattern

There can be many reasons for failing tests, such as parallel execution and patch deployments. Look for a pattern in when a test fails. For example, it could fail between 3 AM and 5 AM, or on certain days of the week or month. Automated tests can be divided into different test suites (HealthCheck, Smoke, Regression, Defects, End to End) so that failing test cases (TCs) are easier to investigate (Selenium JS):

export class SuiteNames {
    static healthCheckSuite = 'HealthCheck Tests';
    static smokeSuite = 'Smoke Tests';
    static regressionSuite = 'Regression Tests';
    static defectSuite = 'Test Suite for tickets';
    static endToEndSuite = 'End to End Test Suite';
}

To locate defects, start with the suite that has the most failures. For example, if healthCheckSuite has test failures, fix those first; there is little value in running or investigating the other suites until the health check passes.
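One way to look for such a pattern is to group failures by suite and hour of day. The sketch below assumes a hypothetical `ciFailures` array with `suite` and `timestamp` fields; the records are made up for illustration, and the field names would need to match whatever your CI actually exports.

```javascript
// Hypothetical failure records; in practice these would come from CI reports.
const ciFailures = [
  { suite: "Regression Tests", timestamp: "2023-03-01T03:15:00Z" },
  { suite: "Regression Tests", timestamp: "2023-03-02T04:40:00Z" },
  { suite: "Smoke Tests", timestamp: "2023-03-02T14:05:00Z" },
  { suite: "Regression Tests", timestamp: "2023-03-03T03:55:00Z" },
];

// Count failures per "suite @ hour" bucket (hours are taken in UTC).
function countFailuresBySuiteAndHour(records) {
  const counts = {};
  for (const { suite, timestamp } of records) {
    const hour = new Date(timestamp).getUTCHours();
    const key = `${suite} @ ${String(hour).padStart(2, "0")}:00 UTC`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const failureCounts = countFailuresBySuiteAndHour(ciFailures);
console.log(failureCounts);
```

A bucket that stands out, such as one suite failing repeatedly in the same hour, is a hint to look at what else runs at that time (deployments, backups, other pipelines).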

Post Failure Action

Create a hook that executes after a test fails and gathers data about the failure: the state of the application, screenshots, an HTML dump, or the test log. Report the execution results with a status for each test (passed, failed, skipped), and include the primary error message wherever the actual result does not match the expected one.
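A post-failure hook can be sketched without committing to a particular framework. In the example below, `runWithFailureHook` and the `diagnosticCollectors` map are hypothetical names, and the collectors return placeholder strings; in a real suite they would call the driver's screenshot and page-source APIs, and they would usually be asynchronous.

```javascript
// Runs a test function; if it throws, runs every collector and returns
// the gathered diagnostics alongside the error message.
function runWithFailureHook(name, testFn, collectors) {
  try {
    testFn();
    return { name, status: "passed" };
  } catch (error) {
    const diagnostics = {};
    for (const [label, collect] of Object.entries(collectors)) {
      try {
        diagnostics[label] = collect();
      } catch (collectError) {
        // A broken collector must not hide the original failure.
        diagnostics[label] = `collector failed: ${collectError.message}`;
      }
    }
    return { name, status: "failed", error: error.message, diagnostics };
  }
}

// Placeholder collectors (assumptions for illustration only).
const diagnosticCollectors = {
  url: () => "https://example.test/login",
  htmlDump: () => "<html>...</html>",
};

const failureReport = runWithFailureHook(
  "login shows dashboard",
  () => { throw new Error("expected dashboard, got login page"); },
  diagnosticCollectors
);
console.log(failureReport.status, failureReport.error);
```

Keeping each collector in its own try/catch is a deliberate choice here: if the screenshot call fails, the report still carries the original error and the remaining diagnostics.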

Test Rerun

If a test passes after a rerun in the same environment, it is essential to understand what differed between the two runs. Therefore, run your test suite multiple times, both on CI and locally, to see how often each test fails.
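The idea of repeated runs can be sketched as a small helper that executes one test several times and classifies the outcome. `rerunTest` and the verdict labels are hypothetical names, and the simulated test fails on a fixed schedule only so that the example stays deterministic.

```javascript
// Runs testFn `times` times and classifies the outcome.
function rerunTest(testFn, times) {
  let passed = 0;
  let failed = 0;
  for (let attempt = 0; attempt < times; attempt++) {
    try {
      testFn(attempt);
      passed++;
    } catch (error) {
      failed++;
    }
  }
  const verdict =
    failed === 0 ? "stable-pass" : passed === 0 ? "stable-fail" : "flaky";
  return { passed, failed, verdict };
}

// Simulated test that fails on every third attempt (a deterministic
// stand-in for a real intermittent failure).
const simulatedTest = (attempt) => {
  if (attempt % 3 === 2) throw new Error("intermittent failure");
};

const rerunResult = rerunTest(simulatedTest, 6);
console.log(rerunResult);
```

A test that lands in the "flaky" bucket is a candidate for the investigation steps in this article, rather than for a silent retry.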

Reasons for Flaky Tests

The reasons for flaky tests are mainly:

Distributed tests that exercise the same part of the application at the same time.
No cleanup of the environment after each test.
No proper test data preparation.
Premature checks (for example, the website has not loaded yet, or a JS action has not fully executed).
Asynchronous waits.

Solving Asynchronous Waits

Most flaky tests are caused by asynchronous waits. These, along with the other causes listed above, can be addressed by:

Callback Solution
Polling Solution
Preparing Test Data
Removing Test Order Dependency
Cleaning Test Environment after Tests

In the Callback Solution, the application signals the test when it can resume executing:

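// Assuming Protractor: the condition must be passed to browser.wait to take effect.
// The 5000 ms timeout is an illustrative value.
await browser.wait(ExpectedConditions.elementToBeClickable(CommonPage.logOut), 5000);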

Alternatively, wait for an explicit confirmation so that the test does not wait longer than necessary (Capybara-style pseudocode):

click_button "Submit"
wait_for_form_confirmation
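The callback idea can be sketched in plain JavaScript as a promise that the application side resolves, guarded by a timeout. `createReadySignal`, `submitForm`, and the timeout values are hypothetical and stand in for whatever readiness hook your application exposes.

```javascript
// Returns a promise that the application side resolves when it is ready,
// plus a timeout guard so that the test never waits forever.
function createReadySignal(timeoutMs) {
  let notifyReady;
  const ready = new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`not ready after ${timeoutMs} ms`)),
      timeoutMs
    );
    notifyReady = (value) => {
      clearTimeout(timer);
      resolve(value);
    };
  });
  return { ready, notifyReady };
}

// Stand-in for the application: confirms the form shortly after submit.
function submitForm(onConfirmed) {
  setTimeout(() => onConfirmed("form confirmed"), 10);
}

async function callbackDemo() {
  const { ready, notifyReady } = createReadySignal(1000);
  submitForm(notifyReady); // the app calls back when it is done
  return ready;            // the test resumes as soon as that happens
}

const confirmation = callbackDemo();
confirmation.then((message) => console.log(message));
```

Because the test resumes on the signal rather than after a fixed sleep, it waits only as long as the application actually needs, and the timeout turns a hang into a clear failure.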

The other solution is the Polling Solution (retry-ability), in which a series of repeated checks evaluates whether an expectation has been satisfied. For example, consider the following Cypress code, in which the .should() assertion is retried until the button's class changes or the command times out:

cy.get('button').then(($btn) => {
  const cls = $btn.attr('class')
  cy.wrap($btn).click().should('not.have.class', cls)
})
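Outside Cypress, the same retry-ability can be approximated with a small polling helper. This is a minimal sketch, not how Cypress is implemented: `pollUntil`, its default timeout and interval, and the `fakeButton` object are all assumptions made for illustration.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Re-evaluates `check` until it returns a truthy value or the timeout expires.
async function pollUntil(check, { timeoutMs = 1000, intervalMs = 20 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await pause(intervalMs);
  }
}

// Stand-in for a page whose button loses its "loading" class asynchronously.
const fakeButton = { classes: ["btn", "loading"] };
setTimeout(() => { fakeButton.classes = ["btn"]; }, 50);

const pollDemo = pollUntil(() => !fakeButton.classes.includes("loading"));
pollDemo.then(() => console.log("button finished loading"));
```

The helper checks first and sleeps afterwards, so a condition that is already true returns immediately instead of paying for one interval.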

Then comes Concurrency, in which code running in parallel can legitimately produce several different valid results while the test asserts on only one of them. This is the case when multiple users perform the same actions at once. These tests are flaky because the developer made an incorrect assumption about the order in which different threads perform their operations. You can fix such flaky tests by adding a synchronization block or by changing the test to accept a wider range of behaviors.
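Accepting a wider range of behaviors often means asserting on what happened rather than on the order in which it happened. The sketch below uses a hypothetical `haveSameMembers` helper and made-up event strings to show an order-insensitive comparison.

```javascript
// Compares two arrays of strings as multisets, ignoring element order.
function haveSameMembers(actual, expected) {
  if (actual.length !== expected.length) return false;
  const sortedActual = [...actual].sort();
  const sortedExpected = [...expected].sort();
  return sortedActual.every((item, index) => item === sortedExpected[index]);
}

// Two valid outcomes of the same concurrent scenario: which user's write
// lands first depends on scheduling, not on correctness.
const firstRun = ["user-1 saved", "user-2 saved"];
const secondRun = ["user-2 saved", "user-1 saved"];
const expectedEvents = ["user-1 saved", "user-2 saved"];

console.log(haveSameMembers(firstRun, expectedEvents));
console.log(haveSameMembers(secondRun, expectedEvents));
```

An exact-order assertion would pass for only one of the two runs; the order-insensitive check passes for both, while still failing if an event is missing or duplicated.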

Then we have Test Order Dependency. To avoid it, every test should prepare the environment for its own execution and clean up the environment after it’s done (like a “database transaction”). Additionally, each TC should be runnable both as a standalone unit and as part of a test suite. Suites themselves can still be ordered: for example, healthcheck and smoke can be prerequisites for regression or functional suite execution.
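The prepare-then-clean-up discipline can be sketched as a wrapper that snapshots shared state before a test and restores it afterwards. `withCleanState` and the in-memory `sharedDb` object are hypothetical; with a real database the equivalent would be a transaction that is rolled back.

```javascript
// Shared state that several tests touch (stand-in for a database).
const sharedDb = { users: ["admin"] };

function withCleanState(store, testFn) {
  // Setup: remember the state. JSON cloning is enough for plain data only.
  const snapshot = JSON.parse(JSON.stringify(store));
  try {
    return testFn(store);
  } finally {
    // Teardown: restore the snapshot even if the test threw.
    for (const key of Object.keys(store)) delete store[key];
    Object.assign(store, snapshot);
  }
}

const countAfterInsert = withCleanState(sharedDb, (store) => {
  store.users.push("temp-user");
  return store.users.length;
});
const countInNextTest = withCleanState(sharedDb, (store) => store.users.length);
console.log(countAfterInsert, countInNextTest);
```

Because the second test sees the original data regardless of what the first one wrote, the two can run in either order.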

Last but not least come Race Conditions, in which several processes may update the same resource at the same time.
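A race on a shared resource, and one way to remove it, can be sketched with a non-atomic counter update and a simple promise-based lock. `createLock`, `unsafeIncrement`, and the `sharedCounter` object are hypothetical; the 1 ms delay only exists to give the two updates a chance to interleave.

```javascript
// Minimal promise-based lock: each caller waits for the previous holder.
function createLock() {
  let tail = Promise.resolve();
  return function runExclusive(task) {
    const result = tail.then(() => task());
    // Keep the chain alive even if a task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Shared resource with a non-atomic read-modify-write update.
const sharedCounter = { value: 0 };
async function unsafeIncrement() {
  const current = sharedCounter.value; // read
  await delay(1);                      // another task can interleave here
  sharedCounter.value = current + 1;   // write (may overwrite a concurrent update)
}

async function raceDemo() {
  sharedCounter.value = 0;
  await Promise.all([unsafeIncrement(), unsafeIncrement()]);
  const withoutLock = sharedCounter.value; // one update is lost

  sharedCounter.value = 0;
  const runExclusive = createLock();
  await Promise.all([
    runExclusive(unsafeIncrement),
    runExclusive(unsafeIncrement),
  ]);
  const withLock = sharedCounter.value; // both updates are applied
  return { withoutLock, withLock };
}

const raceResult = raceDemo();
raceResult.then((result) => console.log(result));
```

In the unlocked run both tasks read the counter before either writes, so the second write overwrites the first; the lock serializes the two updates so neither is lost.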

Documenting Flaky Tests

You can get the team on the same page by sharing the SQL scripts and API scripts used to prepare the test data.

You can also document flaky tests by:

Adding comments in the code
Creating a separate section in Confluence
Creating JIRA tickets
Increasing the number of retries for certain tests

No Broken Windows

Whenever you investigate flakiness, do not assume that it is a test issue. Check the production code first, and only then the test.

Once all flaky tests are fixed in the application or test suite, the team should execute the “No Broken Windows” strategy again to maintain the high quality of the test suite. This would reduce the chances of flaky tests in the future.

This post was published under the Quality Assurance Community of Experts.

Tauqir Sarwar

Tauqir Sarwar is a QA Automation Engineer at Modus Create with over 13 years of experience in end-to-end software automation solutions and services. He loves learning new technologies and is proficient in tools and frameworks such as Selenium, Cucumber, Pytest, Protractor, REST Assured, WebdriverIO, and Cypress. When he's not working, Tauqir loves to travel and spend time with his family.