A friend and colleague of mine published a very interesting article about performance testing. I found it really useful, especially for professionals who don't yet know or understand what performance testing is.

---

In software engineering, performance testing is testing performed to determine how fast some aspect of a system performs under a particular workload.

It can also serve to validate and verify other quality attributes of the system, such as scalability, reliability and resource usage.

Performance testing is a subset of performance engineering, an emerging computer science practice that strives to build performance into the design and architecture of a system before actual coding begins.

Performance testing can serve different purposes:

  • It can demonstrate that the system meets performance criteria;
  • It can compare two systems to find which performs better;
  • It can measure which parts of the system or workload cause the system to perform poorly.

We can define performance testing as: "The testing conducted to evaluate the compliance of a system or software component with specified performance requirements, such as response times, transaction rates and resource utilization."

The later a performance defect is detected, the higher the cost of remediation. This is true in the case of functional testing, but even more so with performance testing, due to the end-to-end nature of its scope.

Purpose

Performance engineering is the process by which software is tested and tuned with the intent of realizing the required performance. This process aims to optimize the most important application performance trait, the customer experience.

The performance engineering process includes several related efforts:

  • Performance testing: The most important task, determining system behavior under specific workloads; comparable to black-box testing.
  • Profiling: A profiler is a performance analysis tool that measures the behavior of a program as it executes, particularly the frequency and duration of function calls. The usual goal of profiling is to determine which sections of a program to optimize; comparable to grey-box testing (see the sketch after this list).
  • Performance unit testing: Very similar to unit tests, but measuring the time spent by code routines, procedures and methods; comparable to white-box testing.
  • Performance tuning: Adjusting or modifying system parameters to handle a higher load with better response times.
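
To make profiling concrete, here is a minimal sketch using Python's built-in cProfile and pstats modules; slow_report and fetch_row are hypothetical stand-ins for real application code.

```python
import cProfile
import pstats
import time

def fetch_row():
    time.sleep(0.001)          # hypothetical stand-in for a database read

def slow_report():
    return [fetch_row() for _ in range(200)]

# Profile the routine and print the functions that consumed the most time.
profiler = cProfile.Profile()
profiler.runcall(slow_report)
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```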

Benefits

Several additional benefits can be claimed for performance engineering. The financial cost of poor performance and service interruptions varies widely with the type of application affected. To assess it, however, the following points should be considered:

  • Potential loss of customers, clients or partners;
  • Reduced employee productivity;
  • Damage to your brand or corporate image;
  • Capital expense of purchasing unnecessary hardware;
  • Unsold products and services when systems are down;
  • Additional cost to fix performance problems.

Additional benefits of Performance testing:

  • Eliminate system failures that would require scrapping and writing off the development effort because performance objectives were missed;
  • Eliminate late system deployment due to performance issues;
  • Eliminate avoidable system rework due to performance issues;
  • Eliminate avoidable system tuning efforts;
  • Reduce additional operational overhead for handling system issues due to performance problems.

To drive systems toward a better customer experience and away from poor performance, the performance engineering process must consider three main performance goals:

  • Speed – Does the application respond quickly enough for the intended users?
  • Scalability – Will the application handle the expected user load and beyond?
  • Stability – Is the application stable under expected and unexpected user loads?

Performance Test Types

Specific tools are used to conduct several test types in order to verify that the performance goals are met. Test descriptions vary between sources, but these are the basic types of performance test:

Load Tests

Load tests are end-to-end performance tests under anticipated production load. The primary objective of this test is to determine the response times for various time-critical transactions and business processes, and to confirm that they are within documented expectations (or Service Level Agreements, SLAs). The test also measures the capability of the application to function correctly under load, by measuring transaction pass/fail/error rates. This is one of the most fundamental load and performance tests and needs to be well understood. It is a major test, requiring substantial input from the business so that anticipated activity can be accurately simulated in a test situation. Load testing must be executed on a database of today's production size, and optionally with a "projected" database as well.
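
As an illustration only (real load tests are normally driven by dedicated tools), here is a minimal load-test sketch in Python: a fixed pool of threads plays the role of virtual users, each repeatedly executing one time-critical transaction, and the collected response times are checked against an SLA. The URL and the 2-second SLA are assumed values.

```python
import time
import threading
import urllib.request

URL = "http://test-env.example.com/orders"   # hypothetical transaction endpoint
SLA_SECONDS = 2.0                            # hypothetical documented expectation
USERS, ITERATIONS = 10, 20                   # anticipated production-like load

results = []                                 # (elapsed_seconds, ok) per request
lock = threading.Lock()

def virtual_user():
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(URL, timeout=10).read()
            ok = True
        except OSError:
            ok = False
        with lock:
            results.append((time.perf_counter() - start, ok))

threads = [threading.Thread(target=virtual_user) for _ in range(USERS)]
for t in threads: t.start()
for t in threads: t.join()

times = [e for e, ok in results if ok]
errors = len(results) - len(times)
if times:
    print(f"avg={sum(times)/len(times):.3f}s  max={max(times):.3f}s  errors={errors}")
    print("SLA met" if max(times) <= SLA_SECONDS else "SLA breached")
else:
    print(f"all {errors} requests failed")
```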

Fail-over Tests

Fail-over tests verify redundancy mechanisms while the system is under load. For example, such testing determines what will happen if multiple web servers are being used under peak anticipated load and one of them dies. Does the load balancer react quickly enough? Can the other web servers handle the sudden dumping of extra load? This sort of testing allows technicians to address problems in advance, in the comfort of a testing situation, rather than in the heat of a production outage.

Reliability Tests (a.k.a. Endurance Test, Soak Test, Long Running Test)

A reliability test runs a system at high levels of load for a prolonged period of time. Such a test will normally execute several times more transactions in an entire day (or night) than would be expected in a busy day, in order to identify performance problems that appear only after a large number of transactions have been executed.

It is also possible that a system may 'stop' working after a certain number of transactions have been processed, due to memory leaks or other defects. A reliability test provides an opportunity to identify such defects, whereas load tests and stress tests may miss them because of their relatively short duration.

Some typical problems identified during soak tests are listed below:

  • Serious memory leaks that would eventually result in a memory crisis (a monitoring sketch for this case follows the list);
  • Failure to close connections between tiers of a multi-tiered system under some circumstances which could stall some or all modules of the system;
  • Failure to close database cursors under some conditions which would eventually result in the entire system stalling;
  • Gradual degradation of response time of some functions as internal data-structures become less efficient during a long test.
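
One way to watch for the first problem above, a slow memory leak, is to sample memory usage at intervals throughout the soak run. The sketch below uses Python's standard tracemalloc module; process_transaction is a hypothetical stand-in for one business transaction.

```python
import tracemalloc

def process_transaction(i):
    # Hypothetical stand-in for one business transaction.
    return str(i) * 10

tracemalloc.start()
samples = []

for i in range(100_000):
    process_transaction(i)
    if i % 10_000 == 0:
        current, peak = tracemalloc.get_traced_memory()
        samples.append(current)
        print(f"txn {i}: {current / 1024:.0f} KiB currently allocated")

# A steadily rising trend across samples suggests a leak worth investigating.
if samples[-1] > samples[0] * 2:
    print("WARNING: allocated memory grew substantially over the soak run")
```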

Stress Tests

Stress tests determine the load under which a system fails, and how it fails. This is in contrast to load testing, which attempts to simulate anticipated load. It is important to know in advance whether a 'stress' situation will result in a catastrophic system failure, or whether everything just "goes really slow". There are several varieties of stress test, including spike, stepped, and gradual ramp-up tests. Catastrophic failures require restarting various pieces of infrastructure and contribute to downtime, a stressful environment for support staff and managers, and possible financial losses. This test is one of the most fundamental load and performance tests and needs to be well understood.

Stress tests identify the predicted point of failure at which servers can no longer handle the load, the point at which response-time degradation becomes noticeable, and any unusable resource reserve.
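
A stepped ramp-up can be sketched as a loop that increases the number of concurrent virtual users at each step and records the response times, so the point where degradation becomes noticeable can be read from the output. The endpoint and the step sizes below are assumptions for illustration.

```python
import time
import threading
import urllib.request

URL = "http://test-env.example.com/orders"      # hypothetical endpoint

def one_request(out):
    start = time.perf_counter()
    try:
        urllib.request.urlopen(URL, timeout=30).read()
        out.append(time.perf_counter() - start)
    except OSError:
        out.append(None)                        # request failed under stress

for users in (10, 20, 40, 80, 160):             # stepped ramp-up
    times = []
    threads = [threading.Thread(target=one_request, args=(times,))
               for _ in range(users)]
    for t in threads: t.start()
    for t in threads: t.join()
    ok = [t for t in times if t is not None]
    failed = len(times) - len(ok)
    avg = sum(ok) / len(ok) if ok else float("nan")
    print(f"{users:4d} users: avg={avg:.3f}s  failures={failed}")
```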

Targeted Infrastructure Test

Targeted infrastructure tests are isolated tests of each layer and/or component in an end-to-end application configuration. They cover the communications infrastructure, load balancers, web servers, application servers, crypto cards, Citrix servers, databases and so on, allowing for identification of any performance issue that would fundamentally limit the overall ability of the system to deliver at a given performance level.

Baseline Tests

Baseline tests determine the end-to-end timing (benchmark) of various time-critical business processes and transactions while the system is under low load, but with a production-sized database. This sets the 'best possible' performance expectation under a given infrastructure configuration. It also highlights, very early in the testing process, whether changes need to be made before load testing is undertaken. For example, a customer search may take 15 seconds against a full-sized database if indexes have not been applied correctly, or if an SQL 'hint' was incorporated in a statement that had been optimized against a much smaller database. Baseline testing would highlight such a slow customer search transaction, which could be remediated prior to a full end-to-end load test.
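
A baseline can be as simple as timing one transaction repeatedly with a single user and recording the best and average cases, as in the sketch below; run_customer_search is a hypothetical wrapper around the transaction under test, executed against the production-sized test database.

```python
import time
import statistics

def run_customer_search():
    # Hypothetical stand-in: execute the customer-search transaction
    # against the production-sized test database.
    time.sleep(0.05)

samples = []
for _ in range(30):                       # single user, low load
    start = time.perf_counter()
    run_customer_search()
    samples.append(time.perf_counter() - start)

print(f"best={min(samples):.3f}s  "
      f"avg={statistics.mean(samples):.3f}s  "
      f"stdev={statistics.stdev(samples):.3f}s")
# These figures become the 'best possible' expectation for later load tests.
```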

Network Sensitivity Tests

Network sensitivity tests set up scenarios of varying types of network activity (traffic, error rates and so on), and then measure the impact of that traffic on various applications that are bandwidth-dependent. Very 'chatty' applications can appear to be more prone to response-time degradation under certain conditions than other applications that actually use more bandwidth. For example, some applications may degrade to unacceptable levels of response time when a certain pattern of network traffic uses 50% of available bandwidth, while other applications are virtually unchanged in response time even with 85% of available bandwidth consumed elsewhere.

This is a particularly important test for deployment of a time critical application over a WAN.
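
On Linux, varying network conditions can be emulated with the tc/netem traffic-control facility. The sketch below, which assumes root access, an eth0 interface, and a hypothetical endpoint, applies a few delay/loss profiles and times the same request under each; it illustrates the idea rather than replacing a proper network emulation setup.

```python
import subprocess
import time
import urllib.request

URL = "http://test-env.example.com/account/summary"   # hypothetical endpoint

# Network profiles emulated with Linux tc/netem on this client's interface.
PROFILES = {
    "clean":       [],
    "wan-latency": ["delay", "100ms"],
    "lossy-wan":   ["delay", "100ms", "loss", "1%"],
}

def set_profile(netem_args):
    # Clear any previous netem rule (ignore the error if none exists).
    subprocess.run(["tc", "qdisc", "del", "dev", "eth0", "root"],
                   stderr=subprocess.DEVNULL)
    if netem_args:
        subprocess.run(["tc", "qdisc", "add", "dev", "eth0", "root",
                        "netem", *netem_args], check=True)

for name, netem_args in PROFILES.items():
    set_profile(netem_args)
    start = time.perf_counter()
    urllib.request.urlopen(URL, timeout=60).read()
    print(f"{name}: {time.perf_counter() - start:.2f}s")

set_profile([])   # leave the interface clean
```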

Rendezvous Test

During a test we can instruct multiple virtual users to perform tasks simultaneously by using rendezvous points. A rendezvous point creates intense user load on the server, letting us measure server performance under that load.

Suppose you want to measure how a Web-based banking system performs when ten users simultaneously check account information. To emulate the required user load on the server, we instruct all the users to check account information at exactly the same time.
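
In Python this rendezvous behavior can be sketched with threading.Barrier: each virtual-user thread blocks at the barrier until all ten have arrived, and then they all fire the request at once. The endpoint below is a hypothetical example.

```python
import time
import threading
import urllib.request

URL = "http://test-env.example.com/account/summary"   # hypothetical endpoint
USERS = 10
rendezvous = threading.Barrier(USERS)                 # the rendezvous point

def virtual_user(name):
    rendezvous.wait()              # block until all ten users have arrived
    start = time.perf_counter()
    try:
        urllib.request.urlopen(URL, timeout=10).read()
        print(f"{name}: {time.perf_counter() - start:.3f}s")
    except OSError as exc:
        print(f"{name}: failed ({exc})")

threads = [threading.Thread(target=virtual_user, args=(f"user-{i}",))
           for i in range(USERS)]
for t in threads: t.start()
for t in threads: t.join()
```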

Volume Tests

Volume tests are most often appropriate to messaging, batch, and conversion processing situations. In a volume test there is often no such measure as response time; instead, the key measure is throughput.
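
Because the key measure is throughput rather than response time, a volume-test sketch simply counts units of work processed per unit of time, as below; process_message is a hypothetical stand-in for a real batch or messaging step.

```python
import time

def process_message(msg):
    return msg.upper()            # hypothetical stand-in for real processing

BATCH = [f"message-{i}" for i in range(1_000_000)]

start = time.perf_counter()
for msg in BATCH:
    process_message(msg)
elapsed = time.perf_counter() - start

print(f"processed {len(BATCH)} messages in {elapsed:.1f}s "
      f"({len(BATCH) / elapsed:,.0f} msg/s)")
```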

Test Results

Several statistics are collected during test execution; these numbers are analyzed in order to generate a report, to verify the main performance goals (speed, scalability, and stability), and to identify possible performance problems. Examples of statistics collected during a test are listed below, followed by a short sketch showing how they can be derived from raw samples:

  • Transaction response times (averages, standard deviations): The average time taken to perform transactions during the test. This statistic helps to determine whether the performance of the server is within the acceptable minimum and maximum transaction performance time ranges defined for your system.
  • Hits per second: The number of hits made on the web server by users. This statistic helps to evaluate the amount of load users generate, in terms of the number of hits.
  • Throughput: The amount of throughput (in bytes) on the web server during the test. Throughput represents the amount of data that the users received from the server at any given second. This statistic helps to evaluate the amount of load users generate, in terms of server throughput.
  • Transactions per second: The number of completed transactions (both successful and unsuccessful) performed during the test. This statistic helps to determine the actual transaction load on the system.
  • CPU: CPU utilization (%) during the test.
  • Memory: Memory utilization during the test.
  • Disk: Disk utilization during the test.
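
Given the raw samples collected during a run, most of these statistics are straightforward to derive. The sketch below computes average response time, standard deviation, hits per second, throughput, and transactions per second from a hypothetical list of (timestamp, response time, bytes) samples.

```python
import statistics

# Hypothetical raw samples: (wall-clock second, response time s, bytes received)
samples = [(0, 0.41, 2048), (0, 0.52, 1907), (1, 0.47, 2110),
           (1, 0.95, 2048), (2, 0.44, 1998), (2, 0.48, 2050)]

elapsed = [rt for _, rt, _ in samples]
duration = samples[-1][0] - samples[0][0] + 1          # test length in seconds

print(f"avg response time: {statistics.mean(elapsed):.3f}s "
      f"(stdev {statistics.stdev(elapsed):.3f}s)")
print(f"hits/sec:          {len(samples) / duration:.1f}")
print(f"throughput:        {sum(b for *_, b in samples) / duration:,.0f} bytes/s")
print(f"transactions/sec:  {len(samples) / duration:.1f}")
```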

by Leonardo Castilhos

This text was adapted from:

http://en.wikipedia.org/wiki/Performance_engineering

http://en.wikipedia.org/wiki/Software_performance_testing

http://en.wikipedia.org/wiki/Software_profiling