Scaling Performance Tests in the Cloud with JMeter

My performance test use cases are very heterogeneous, ranging from simple performance regression tests executed from a Jenkins node to occasional large-ish stress tests that run at over 100K requests per second with more than 100 load generators. At higher loads, many problems arise: feeding data to load generators, retrieving results, getting a real-time view, analyzing huge data sets, and so on.

JMeter is a great tool, but it has its limitations. In order to scale, I had to work around a few of them and created a test framework to help me execute tests at scale on Amazon’s EC2.

Having a central data feeder was a problem. Using JMeter’s master node was impossible, and a single shared data source might become a bottleneck, so having a way of distributing the data was important. I thought about using a feeder model similar to Twitter’s Iago or a clustered, load-balanced resource, but settled for something simpler. Since most tests only use a limited data set and loop over it, I decided to bzip the files and upload them to each load generator before the test starts. This way I avoid making an extra request to get data during execution, and avoid requesting the same data multiple times because of the loop. One problem with this approach is that I don’t have centralized control over the data set, since each load generator uses the same input. I mitigate that by managing the data locally on each load generator, with a hash function or by introducing random values. I also considered distributing different files to different load generators based on a hash function, but so far there has been no need.
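The hash-based split mentioned above could look roughly like the sketch below. It is only an illustration, not the framework's code: the class `DataPartitioner`, the method `rowsFor` and the idea of giving each load generator an index and a total count are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of JMeter or of the framework described in this post. */
class DataPartitioner {

    /**
     * Returns the rows owned by one load generator.
     * A row is kept when hash(row) mod generatorCount equals generatorIndex,
     * so every row is owned by exactly one generator.
     */
    static List<String> rowsFor(List<String> rows, int generatorIndex, int generatorCount) {
        List<String> owned = new ArrayList<>();
        for (String row : rows) {
            // floorMod keeps the bucket non-negative even when hashCode() is negative.
            int bucket = Math.floorMod(row.hashCode(), generatorCount);
            if (bucket == generatorIndex) {
                owned.add(row);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Made-up sample rows; a real test would read them from the uploaded data file.
        List<String> rows = Arrays.asList("user-1", "user-2", "user-3", "user-4", "user-5");
        int generators = 2;
        for (int i = 0; i < generators; i++) {
            System.out.println("generator " + i + " -> " + rowsFor(rows, i, generators));
        }
    }
}
```

Because every load generator applies the same function to the same file, no coordination is needed at run time. The split is only as even as the hash distribution, so small data sets can end up unbalanced.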

Retrieving results was tricky too. Again, using JMeter’s master node was impossible because of the amount of traffic. I tried having a poller fetch raw results (only timestamp, label, success and response time) in real time, but that affected the results. Downloading all results at the end of the test worked, by checking the status of the test (running or not) every minute and downloading after completion, but I settled on having a custom sampler in a tearDown thread group that compresses the results and uploads them to Amazon’s S3. This could definitely be a plugin too. It works reasonably well, but I lose the real-time view and have to manually add a file writer and the sampler to tests.
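As a rough illustration of the compression step only, here is a minimal sketch using the JDK's gzip streams. The class `ResultArchiver` and its methods are hypothetical, gzip is an assumption (the post does not say which format is used for results), and the upload to S3 is deliberately left out because it depends on the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; the actual tearDown sampler is not shown in this post. */
class ResultArchiver {

    /** Compresses raw result bytes with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so read the buffer only after the try block.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses gzip(); useful to verify an archive before deleting the local copy. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up sample data in the "timestamp, label, success, response time" shape.
        StringBuilder csv = new StringBuilder("timeStamp,label,success,elapsed\n");
        for (int i = 0; i < 1000; i++) {
            csv.append(1400000000000L + i).append(",home,true,").append(100 + i % 50).append('\n');
        }
        byte[] raw = csv.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

For result files that do not fit in memory, the same streams can wrap file streams instead of byte arrays.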

For the real-time view, I started with the same approach as jmeter-ec2, polling aggregated data (average response time, rps, etc.) from each load generator and printing it, but that proved useless with a large number of load generators. For now, on Java samplers, I’m using Netflix’s Servo to publish metrics in real time (averaged over a minute) to our monitoring system. I’m considering writing a listener plugin that could use the same approach to publish data from any sampler. From the monitoring system I can then analyze and plot real-time data with minor delays. Another option I’m considering is using the same approach, but with StatsD and Graphite.
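To make the StatsD and Graphite option more concrete, the sketch below formats per-minute averages as Graphite plaintext lines (`path value epochSeconds`) and a single sample as a StatsD timer (`name:value|ms`). The class `MetricLines`, its method names and the metric paths are assumptions for illustration. Sending the lines over the network is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical formatter; not an existing JMeter listener. */
class MetricLines {

    /** One line of Graphite's plaintext protocol: "<path> <value> <epoch seconds>\n". */
    static String graphiteLine(String path, double value, long epochSeconds) {
        return String.format(Locale.ROOT, "%s %.2f %d\n", path, value, epochSeconds);
    }

    /** One StatsD timer datagram: "<name>:<milliseconds>|ms". */
    static String statsdTiming(String name, long elapsedMs) {
        return name + ":" + elapsedMs + "|ms";
    }

    /** Averages response times per minute and renders one Graphite line per minute. */
    static List<String> minuteAverages(String path, long[] timestampsMs, long[] elapsedMs) {
        // minute index -> {sample count, sum of elapsed ms}
        Map<Long, long[]> perMinute = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            long minute = timestampsMs[i] / 60_000L;
            long[] acc = perMinute.computeIfAbsent(minute, k -> new long[2]);
            acc[0]++;
            acc[1] += elapsedMs[i];
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, long[]> entry : perMinute.entrySet()) {
            double average = (double) entry.getValue()[1] / entry.getValue()[0];
            // The data point is stamped with the start of its minute, in seconds.
            lines.add(graphiteLine(path, average, entry.getKey() * 60L));
        }
        return lines;
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 30_000L, 60_000L};
        long[] elapsed = {100L, 200L, 400L};
        for (String line : minuteAverages("jmeter.home.avg", timestamps, elapsed)) {
            System.out.print(line);
        }
        System.out.println(statsdTiming("jmeter.home", 120));
    }
}
```

Aggregating on the load generator before publishing keeps the traffic to the monitoring system small, which matters with more than 100 generators.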

Analyzing huge result sets was, I believe, the biggest challenge. For that, I’ve developed a web-based analysis tool. It doesn’t store raw results, but mostly time-based aggregated statistical data from both JMeter and monitoring systems, allowing some data manipulation for analysis and automatic comparison of result sets. Aggregating and analyzing tests with over 1B samples is a problem, even after constant tuning. Loading all data points into memory to sort them and calculate percentiles is practically impossible, simply because the amount of memory needed is impractical, even with small objects. For now, on large tests, I settled on aggregating data while loading results (second/minute data points) and accepting the statistical problems, like the average of averages. Another option would be to analyze results from each load generator independently and aggregate at the end. In the future, I’m considering storing results on a Hadoop cluster and using Map/Reduce to get the aggregated statistical data back.
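The average-of-averages problem is easy to see with made-up numbers: if one load generator reports 100 samples averaging 10 ms and another reports 900 samples averaging 100 ms, averaging the two averages gives 55 ms, while the true mean is (100 × 10 + 900 × 100) / 1000 = 91 ms. One common way around it, sketched below, is to keep mergeable aggregates (count, sum and a fixed-width histogram) instead of averages. This is a swapped-in technique for illustration, not what the analysis tool does; `LatencyHistogram` and its parameters are hypothetical.

```java
/** Hypothetical fixed-width histogram; memory use depends on bucket count, not sample count. */
class LatencyHistogram {
    private final long[] buckets;      // buckets[i] counts samples in [i * width, (i + 1) * width)
    private final int bucketWidthMs;
    private long count;
    private long sumMs;

    LatencyHistogram(int bucketCount, int bucketWidthMs) {
        this.buckets = new long[bucketCount];
        this.bucketWidthMs = bucketWidthMs;
    }

    void record(long elapsedMs) {
        // Clamp so negative values land in the first bucket and very slow samples in the last.
        long index = Math.max(0L, Math.min(elapsedMs / bucketWidthMs, buckets.length - 1L));
        buckets[(int) index]++;
        count++;
        sumMs += elapsedMs;
    }

    /** Adds another generator's (or another minute's) aggregate to this one. */
    void merge(LatencyHistogram other) {
        if (other.buckets.length != buckets.length || other.bucketWidthMs != bucketWidthMs) {
            throw new IllegalArgumentException("histograms must share the same bucket layout");
        }
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] += other.buckets[i];
        }
        count += other.count;
        sumMs += other.sumMs;
    }

    /** Exact mean, because count and sum merge without loss. */
    double mean() {
        return count == 0 ? 0.0 : (double) sumMs / count;
    }

    /** Approximate percentile: upper bound of the bucket that contains the requested rank. */
    long percentileUpperBoundMs(double percentile) {
        if (count == 0) {
            return 0L;
        }
        long rank = Math.max(1L, (long) Math.ceil(percentile / 100.0 * count));
        long seen = 0;
        for (int i = 0; i < buckets.length; i++) {
            seen += buckets[i];
            if (seen >= rank) {
                return (long) (i + 1) * bucketWidthMs;
            }
        }
        return (long) buckets.length * bucketWidthMs;
    }

    public static void main(String[] args) {
        LatencyHistogram a = new LatencyHistogram(100, 10);
        LatencyHistogram b = new LatencyHistogram(100, 10);
        for (int i = 0; i < 100; i++) a.record(10);
        for (int i = 0; i < 900; i++) b.record(100);
        System.out.println("average of averages: " + (a.mean() + b.mean()) / 2);
        a.merge(b);
        System.out.println("merged mean: " + a.mean());
        System.out.println("p50 upper bound (ms): " + a.percentileUpperBoundMs(50));
    }
}
```

The percentile is only as precise as the bucket width, and the last bucket acts as an overflow bucket, so its upper bound understates very slow samples. In exchange, aggregates from any number of load generators or time slices can be merged in any order.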

The framework also helps me automate most of the test process, like creating a new load generator cluster on EC2, copying test artifacts to load generators, executing and monitoring the test while it’s running, collecting results and logs, triggering analysis, tearing down the cluster and cleaning up after the test completes.

Most of this was written in Java or Groovy and I hope to open-source the analysis tool in the future.


15 thoughts on “Scaling Performance Tests in the Cloud with JMeter”

  1. At Jellly, we are using a combination of ElasticSearch, Apache Mesos and REST APIs to push metrics from JMeter to our EC2 Beanstalk Tomcat farm. We have written an article explaining how we’ve designed our system:

    Your article was an inspiration for us when we created Jellly, and we would like to thank you for sharing your thoughts about the technical challenges you faced. :)

    • Thank you, and great article by the way! Really interesting solution! ES + JMeter has been on my to-do list for quite a while and I just haven’t had the time to get to it yet! Glad to know it’s working well for you guys! Cheers

  2. Hi Martin,
    Just to let you know, the Backend Listener was committed to JMeter trunk: => To let you plug in whatever backend you want => Graphite Backend

    Output example here (live results or afterwards) using Grafana and InfluxDB:

    Philippe M.

  3. Hello,
    Also, regarding the distributed “chatty” protocol, which version of JMeter did you use?
    Since JMeter 2.9, the default SampleSender has been changed to strip responses, making it much less verbose.
    Did you try the following modes, for example:
    – StrippedBatch
    – StrippedAsynch


    • It was a while ago, probably 2.7 if I’m not mistaken. I’m running on 2.10 now, but never bothered to look into it again after the framework was in place. I’ll definitely check those out next week. How was your experience running tests under these modes?

      • A very good experience. I never used any other mode (except ours before it existed).
        Since JMeter 2.9, the default is StrippedBatch (you can configure the frequency and size of messages).
        StrippedAsynch is similar but asynchronous, so it should be even better in terms of low to no impact on the test.

      • Great, now I’m even more interested! I’ll try to run a few tests with both options and publish the results once I have something. :-)

    • Thanks Philippe! I’ve been using jmeter-plugins for a while, but never noticed the Redis Data Set. I’ll check it out once I have the time! I’ve been thinking about a Graphite listener, but it looks like you already had a head start and have something working. I probably have a few extra requirements, like a configurable flush interval, so I’ll probably have to create something customized. I will keep an eye on the bug and discussions, and report back once I have something. Do you have a GitHub repo for this?

      • Hello Martin,
        When I have time, I plan to contribute a full rewrite of the Graphite Listener following what has been decided on the dev list (mainly a refactoring of the existing Graphite Listener):
        – Implement a Backend Listener similar to Java Request (it will look for backend implementations)
        – Implement a Graphite Backend
        – Possibly implement a JDBC Backend

        If you have time, maybe we can work on this together and contribute to core JMeter, which has a read-only GitHub mirror:


      • I really like this model. More flexible and easier to extend! My initial idea was to develop a plugin, mostly because I’m not sure how it would fit core, but I’m happy to contribute! I can probably free some time to have a look into this soon, but I want to run a few tests first.
        p.s.: This model will probably make my life easier when I decide to implement a Servo listener/backend too.

  4. slandelle from Gatling suggested using Redis as a feeder. I like the idea and hope to have some time to try it out soon!
