When everything is on fire, where should you throw the first bucket of water? To fix bad performance, you have to profile. Figuring out where to start can be overwhelming, especially when your system performs well under normal daily loads but collapses when a traffic spike hits. This is part 1 of a 3-part series; see also part 2 and part 3. In this part, we focus on how to measure performance and on where systems commonly run into performance problems.

Introducing New Relic

Profiling a complex set of online systems has historically been very challenging. The tools available to most developers, such as the Visual Studio Performance Profiler and Java VisualVM, include profilers suited to measuring the performance of an application running in a single process. They typically do not scale well to the multi-tier, multi-server deployments typical of modern production web applications.

New Relic offers a suite of sampling profilers, server monitoring tools, and performance instrumentation suitable for both production and staging systems. When Healthcare.gov launched and had to be fixed in a hurry, the team in the war room turned to New Relic to get things back on track. You can use the same toolkit, with a dash of free software, to improve your application performance.

The great advantage of systems like this is that they have very low overhead, so you can enable them in production, either on your entire set of production systems or on a subset to get a valid sample. Problems with normal daily loads will stand out in the transaction, database, and error-related APM (Application Performance Monitoring) pages.

However, problems with periodic heavier loads cannot be diagnosed by looking at these reports alone. Think of what happens when you try to buy tickets for a popular concert or movie, or register your preschooler for gymnastics classes when space is limited. These sorts of loads might come only once a year, but they are the key times when you have to be available, or you will lose the trust of your users and possibly lose money!

Simulating Load – JMeter to the Rescue

JMeter is our choice of performance testing tool. You simulate your scenario by recording a script using its built-in proxy and then editing the script to handle dynamic content. We have some easy-to-follow tutorials that can help you get started. To handle larger load tests economically, you can use AWS in conjunction with jmeter-ec2 or another service.
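Once a test run finishes, you need a way to read the numbers. JMeter can save its sample results as a CSV file (often given a .jtl extension). The following is a minimal Python sketch, not part of JMeter itself, that groups those samples by label and reports the error rate and response time percentiles. It assumes the results file has a header row with `label`, `elapsed` (in milliseconds), and `success` columns; the sample data, the `summarize_jtl` function, and the `percentile` helper are made up for illustration.

```python
import csv
import io
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_jtl(lines):
    """Group samples by label; report count, error rate, and latency stats."""
    by_label = defaultdict(list)
    errors = defaultdict(int)
    for row in csv.DictReader(lines):
        label = row["label"]
        by_label[label].append(int(row["elapsed"]))  # assumed to be milliseconds
        if row["success"].strip().lower() != "true":
            errors[label] += 1

    summary = {}
    for label, elapsed in by_label.items():
        elapsed.sort()
        summary[label] = {
            "samples": len(elapsed),
            "error_rate": errors[label] / len(elapsed),
            "median_ms": percentile(elapsed, 50),
            "p95_ms": percentile(elapsed, 95),
        }
    return summary


# Hypothetical sample data standing in for a real results file.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1,120,Home,200,true
2,180,Home,200,true
3,950,Checkout,200,true
4,2400,Checkout,500,false
5,140,Home,200,true
6,1100,Checkout,200,true
"""

if __name__ == "__main__":
    for label, stats in summarize_jtl(io.StringIO(SAMPLE_JTL)).items():
        print(label, stats)
```

To use it on a real run, pass an open file instead of the in-memory sample, for example `summarize_jtl(open("results.jtl", newline=""))`, where `results.jtl` is a placeholder file name.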

Measure Twice, Cut Once

Part of the trick of performance measurement is that you have to devise a test that simulates a realistic load. An intense load test, with requests played back at machine speed, will often overwhelm a poorly performing system. Just doing that much can help you get started, but it is not enough.

You might want to create several different load profiles, for example:

  • Steady State (soak): simulates the steady state load your system can sustain
  • Thundering Herd (spike): simulates a load surge that starts quickly and trails off
  • Ramp to Fail (stress): simulates a load that builds up gradually to an unbearable level that forces your system to fail

Each of these test types will ferret out different types of problems, and they require different strategies and test interpretations.
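To make the difference between these profiles concrete, here is a small Python sketch that computes the target number of concurrent users at a given second of the test for each shape. The function names, parameters, and numbers are all hypothetical and chosen for illustration; in practice you would express the same shapes in your load testing tool's thread group settings.

```python
def steady_state(t, duration, users):
    """Soak: hold a constant number of users for the whole test."""
    return users if 0 <= t < duration else 0


def thundering_herd(t, duration, peak, rise_seconds):
    """Spike: climb to the peak quickly, then trail off linearly to zero."""
    if not 0 <= t < duration:
        return 0
    if t < rise_seconds:
        return round(peak * t / rise_seconds)
    return round(peak * (duration - t) / (duration - rise_seconds))


def ramp_to_fail(t, duration, start, step, step_every):
    """Stress: add `step` users every `step_every` seconds until the test ends."""
    if not 0 <= t < duration:
        return 0
    return start + step * (t // step_every)


if __name__ == "__main__":
    # Print the three shapes side by side for a hypothetical 10-minute test.
    print("second  soak  spike  stress")
    for t in range(0, 600, 60):
        print(
            f"{t:6d}"
            f"  {steady_state(t, 600, users=200):4d}"
            f"  {thundering_herd(t, 600, peak=1000, rise_seconds=60):5d}"
            f"  {ramp_to_fail(t, 600, start=50, step=50, step_every=60):6d}"
        )
```

Seeing the shapes as plain numbers makes it easier to sanity-check a test plan before you spend money on load generators: the soak line stays flat, the spike peaks early and decays, and the stress line keeps climbing until something breaks.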

We’ll discuss test design more in Part 2 of this series.

The Usual Performance Suspects

Once you have a test that can simulate the load you expect to get, you can dive into the details of fixing the problems. So many things can drag you down, but we have found some areas that are frequently to blame. These areas all have distinctive features that New Relic APM reports can identify. We recommend that you look at remediating these issues in order:

  1. Slow Database Queries: fix the slowest database queries first; that alone may be enough to rescue your application performance. The New Relic reports to use here are the Transaction report and the Database and Slow Queries report.
  2. Lack of Caching: cache frequently used information and session data to reduce CPU and database load dramatically. The Transaction report helps here. Doing this well can also help with item #3.
  3. Lack of Horizontal Scaling: deploy more servers to deal with the load. Most people think of doing this first, but it often won't save you if you haven't fixed your slow database queries and caching issues. Running a series of tests with more servers deployed can help you predict how large a load you will be able to handle. The New Relic server reports come into play here.
  4. Synchronous Processing: find places where you can defer work that sits in the critical path of your application (assembling an email confirmation, for example) to reduce the time people wait for requests to complete. This tends to be hard, but it can sometimes yield massive improvements. The APM transaction report and detailed traces can help identify problems in this area.
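As an illustration of the caching idea in item 2, here is a minimal in-process cache with a time-to-live, sketched in Python. It is an assumption-laden toy, not a recommendation of a specific library: the `TTLCache` class and `load_product` function are hypothetical, and a multi-server deployment would more typically put a shared cache in front of the database so that all servers benefit from the same entries.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value; call `loader(key)` only on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


if __name__ == "__main__":
    database_hits = []

    def load_product(product_id):
        """Hypothetical stand-in for a slow database query."""
        database_hits.append(product_id)
        return {"id": product_id, "name": f"Product {product_id}"}

    cache = TTLCache(ttl_seconds=30)
    for _ in range(1000):
        cache.get_or_load(42, load_product)
    print("requests served: 1000, database queries:", len(database_hits))
```

The trade-off is staleness: a longer time-to-live removes more load from the database, but users may see older data, so pick the value per type of data rather than globally.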

We’ll discuss these specific remediation strategies and how to measure that they are working in Part 3 of this series.
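To give a flavor of what deferring synchronous work (item 4 above) can look like, here is a small Python sketch using only the standard library. The request handler records the order and puts the email job on a queue, and a background worker sends the confirmation later. All names here (`place_order`, `email_worker`, the `sent` list standing in for a mail gateway) are hypothetical, and a production system would more likely use a durable queue shared between servers than an in-memory one.

```python
import queue
import threading

email_jobs = queue.Queue()
sent = []  # stands in for a real mail gateway


def email_worker():
    """Drain the queue in the background; None is the shutdown signal."""
    while True:
        job = email_jobs.get()
        if job is None:
            email_jobs.task_done()
            break
        # The slow work (rendering and sending the email) happens here,
        # outside the request's critical path.
        sent.append(f"confirmation for order {job['order_id']} to {job['email']}")
        email_jobs.task_done()


def place_order(order_id, email):
    """Critical path: enqueue the email job and return right away."""
    email_jobs.put({"order_id": order_id, "email": email})
    return {"order_id": order_id, "status": "accepted"}


def run_demo():
    """Start a worker, place three orders, then shut the worker down cleanly."""
    worker = threading.Thread(target=email_worker, daemon=True)
    worker.start()
    responses = [place_order(n, f"user{n}@example.com") for n in range(3)]
    email_jobs.join()     # wait until every queued job has been processed
    email_jobs.put(None)  # ask the worker to stop
    worker.join()
    return responses


if __name__ == "__main__":
    for response in run_demo():
        print(response)
    print(len(sent), "emails sent in the background")
```

The user's wait now covers only the enqueue step. The cost is that failures in the deferred work surface later, so you need monitoring and retries around the worker.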

Always Verify The Results

If you have a full set of functional regression tests, you can run them to verify that your performance fixes have not broken other parts of your system. You can build manual regression tests using spreadsheets, but these are tedious to maintain and take a lot of people's time to run. We often build an automated suite using our Cucumber-Watir test harness. Depending on the application and technology, we might use another functional QA automation stack, such as Webdriver.io, Protractor, Nightwatch-Cucumber, or Pytest-BDD. See our Quality Assurance blog posts for more details. Being able to run these functional tests at will gives you higher assurance that your fixes have not caused regressions in behavior.

Conclusion

Following this path to performance remediation has yielded good results for multiple customers of ours, and it can work for you too. If you have a good way to model and simulate the loads on your systems, and you can measure what happens inside the systems when they are under load, you can know where to make changes to remediate the problems. You can't just try to overload your systems to see where they fail. Test design is really important, and we will cover how to design valid load tests in part 2. In part 3, we will cover some of the most common ways to remediate problems and how to determine where to start fixing performance problems. If you follow the path we are outlining, you can avoid some naive approaches and reap the benefits of taking a quantitative approach.

Further Reading

http://www.slideshare.net/SpringCentral/spring-one2gx-juliendubois-30225094?next_slideshow=1

Editor’s Note

This article was revised in November 2019 to add forward links to part 2, to add links to more contemporary QA automation stack blog articles, and to clean up some minor copy issues.

Posted in DevOps

Richard Bullington-McGuire

Richard Bullington-McGuire is a Director, Engineering. He is a serial entrepreneur and versatile technologist with experience in software development, system architecture, devops, agile processes, mobile computing, for-profit and non-profit start-up companies, and design. Richard holds both Professional-level AWS certifications. He has more than 25 years of consulting experience, and is a member of both IEEE and ACM.

© 2025 Modus Create, LLC

Privacy PolicySitemap
Scroll To Top
  • Services
  • Work
  • Blog
  • Resources
    • Innovation Podcast
    • Guides & Playbooks
  • Who we are
    • Our story
    • Careers
  • Let’s talk
  • EN
  • FR