Scaling Performance Tests in the Cloud with JMeter

I have very heterogeneous performance test use cases. From simple performance regression tests that are executed from a Jenkins node to eventual large-ish stress tests that run with over 100K requests per second and > 100 load generators. With higher loads, many problems arise, like feeding data to load generators, retrieving results, real-time view, analyzing huge data sets and so on.

JMeter is a great tool, but it has its own limitations. In order to scale, I had to work around a few of it’s limitations and created a test framework to help me execute tests at scale, on Amazon’s EC2.

Having a central data feeder was a problem. Using JMeter’s master node is impossible. A single shared data source might become a bottleneck, so having a way of distributing it was important. I thought about using a feeder model similar to Twitter’s Iago or a clustered, load balanced resource, but settled for something simpler. Since most tests only use a limited data set and loop around it, I just decided to bzip files and upload them to each load generator before the test starts. This way I avoided the problem of making an extra request to get data during execution and requesting the same data multiple times because of the loop. One problem with this approach is that I don’t have centralized control over the data set, since each load generator is using the same input. I mitigate that by managing the data locally on each load generator, with a hash function or introducing random values. I also considered distributing different files to different load generators based on a hash function, but so far, there was no need.

Retrieving results was tricky too. Again, using JMeter’s master node was impossible because of the amount of traffic. I tried having a pooler fetching raw ( only timestamp, label, success and response time ) results in real-time, but that affected the results. Downloading all results at the end of the test worked by checking the status of the test ( running or not ) every minute and downloading after completion, but I settled with having a custom sampler in a tearDown thread group, compressing and uploading results to Amazon’s S3. This could definitely be a plugin too. It works reasonably well, but I loose the real-time view and have to manually add a file writer and sampler to tests.

With real-time view, I started with the same approach as jmeter-ec2, pooling aggregated data (avg response time, rps, etc) from each load generator and printing that, but it proved useless with a large number of load generators. For now, on Java samplers, I’m using Netflix’s servo to publish metrics in real time (averaged over a minute) to our monitoring system. I’m considering writing a listener plugin that could use the same approach to publish data from any sampler. Form the monitoring system I can then analyze and plot real-time data with minor delays. Another option I’m considering is using the same approach, but using StatsD and Graphite.

Analyzing huge result sets was the biggest challenge I believe. For that, I’ve developed a web-based analysis tool. It doesn’t store raw results, but mostly time-based aggregated statistical data from both JMeter and monitoring systems, allowing some data manipulation for analysis and automatic comparison of result sets. Aggregating and analyzing tests with over 1B samples is a problem, even after constant tuning. Loading all data points to memory to calculate percentiles and sorting is practically impossible, just for the fact that the amount of memory I’ll need is impractical, even with small objects. For now, on large tests, I settled on aggregating data while loading results ( second/minute data points ) and accepting the statistical problems, like average of averages. Another option would be to analyze results from each load generator independently and aggregate at the end. In the future, I’m considering having results on a Hadoop cluster and using Map/Reduce the get the aggregated statistical data back.

The framework also helps me automate most of the test process, like creating a new load generator cluster on EC2, copying test artifacts to load generators, executing and monitoring the test while it’s running, collecting results and logs, triggering analysis, tearing down the cluster and cleaning up after the test completes.

Most of this was written in Java or Groovy and I hope to open-source the analysis tool in the future.