Thursday, January 5, 2012

Calculating Concurrency from Performance Test Results


So you are on a performance test engagement and your boss asks how many people concurrently executing certain transactions like buying a book or doing a search. He wants is a measure of active concurrency - how many people are doing certain transaction. This should not be confused with Passive concurrency like how many people are logged in. Before we go any further lets clarify that in this example a transaction is a request to the test system and a response back it does not include any think time. Now before you start getting out the virtual terminal server and incrementing counters at the start of the transaction and decrementing counters at the end. There is an easier way.

You can work this all out from your performance test results, without the need for code. Using a mathematical formula (it’s very simple so don’t panic) called Little’s Law. Little’s Law was first used to analyze the performance of telephone exchanges in 1969 by John Little.
Little’s law allows us to relate the mean number of items in the system in our case concurrent users with the mean time in the system (response time) as follows:

Number of Items in the system = Arrival Rate x Response Time

There is one rule to remember before you use little law you must make sure the system is balanced. That is the arrival rate into the system is the same at the exit rate.
I will begin with a none computer example the “Black Horse Pub” has a mean arrival rate of 5 customers per hour that stay for on average half an hour. Using little’s law we can calculate the mean number of customers in the pub as Arrival Rate x Response Time = 5 x 0.5 = 2.5 customers.
To apply little law to a performance test we must first make sure that we are taking measurements from when the system under test is balanced. Remember a balanced system the rate of work entering the system matches the rate of work leaving the system. This for a typical load testing tool is after the ramp up period and the number of virtual users remains constant and response times have stabilized and the transaction per second graph is level. To capture this period of time in LoadRunner for example you would need to select the time period in the Summary report filter or under the Tools -> Options.
So record the average response time for the transaction of interest and the number of times per second the transaction is executed.

So from the example above the response time is 43.613 seconds. The arrival rate is the number of transactions executed divided by the duration. The duration for this example was a 10 minute period as can be confirm by the LoadRunner summary below.


This gives you an arrival rate of 2.005 calculated by taking the count 1203 divided by the duration 600.

So the concurrent number of users waiting for a search to return is 87.44
There you go from your performance test results you can easily calculate the concurrency for a particular transaction.

Performance Test Monitoring


Performance monitoring is the process of collecting and analyzing the server data to compare the server statistics against the expected values. It helps the performance tester to have the health check of the server during the test. By monitoring the servers during the test, one can identify the server behavior for load condition and take steps to change the server behavior by adopting software or hardware performance tuning activities.
Each performance counter helps in identifying a specific value about the server performance. For example, % CPU Utilization is a performance counter which helps in identifying the utilization level of the CPU.
In a nutshell, server monitoring should provide information on four parameters of any system: Latency, Throughput, Utilization and Efficiency, which helps in answering the following questions.

      • Is your server available?
      • How busy is your CPU?
      • Is there enough Primary Memory (RAM)?
      • Is the disk fast enough?
      • Is there any other hardware issues?
      • Is the hardware issue result of software malfunctioning?
Always start with few counters and once you notice a specific problem, start adding few more counters related to the symptom. Start monitoring the performance of the major resources like CPU, Memory, Disk or Network. This section provides you the details about key counters from each of above mentioned 4 areas which are very important for a Performance Tester to know.


Processor Bottlenecks
The bottlenecks related to processor (CPU) are comparatively easy to identify. The important performance counters that helps in identifying the processor bottleneck includes

% Processor Utilization (Processor_Total: % Processor Time) – This counter helps in knowing how busy the system is. It indicates the processor activity. It is the average percentage of elapsed time that the processor spends to execute a productive (non-idle) thread. A consistent level of more than 80% utilization (in case of single CPU machines) indicates that there is not enough CPU capacity. It is worth further investigation using other processor counters. 

% User time (Processor_Total: % User Time) – This refers to the processor’s time spent in handling the application related processes. A high percentage indicates that the application is consuming high CPU. The process level counters needs to be monitored to understand which user process consumes more CPU. 

% Privileged time (Processor_Total: % Privilege Time) – This refers to the processor’s time spent in handling the kernel mode processes. A high value indicates that the processor is too busy in handling other operating system related activities. It needs immediate attention from the system administrator to check the system configuration or service.

% I/O Wait (%wio - in case of UNIX platforms) – This refers to the percentage wait for completion of I/O activity. It is a good indication to confirm whether the threads are waiting for the I/O completion.

Processor Queue Length (System: Processor Queue Length) – This counter helps in identifying how many threads are waiting in queue for execution. A consistent queue length of more than 2 indicates bottleneck and it is worth investigation. Generally if the queue length is more than the number of CPUs available in the system, then it might reduce the system performance. A high value of % usr time coupled with high processor queue length indicates the processor bottleneck.

Other counters of interest:
Other counters like Processor: Interrupts per second, System: Context Switches per second can be used in case of any specific issues. Interrupts per second refers to the number of interrupts that the hardware devices sends to the processor. A consistent value of above 1000 in Interrupts per second indicates hardware failure or driver configuration issues. Context Switches refers to the switching of the processor from a lower priority thread to a high priority thread. A consistent value of above 15000 per second per processor indicates the presence of too many threads of same priority and possibility of having blocked threads.

Windows operating system comes with a performance monitoring tool called Perfmon. This monitoring tool can be used to collect the server statistics during the performance test run. Normally, most of the performance testing tools have its own monitors to monitor the system resources of the system under test. In this case, it becomes easy to compare the system load related metrics with the system resource utilization metrics to arrive at a conclusion. Infact, any performance testing tool that monitors windows machine internally talks to perfmon to collect the system resource utilization details. 
There are lots of licensed tools available in the market which is used for post production monitoring. These tools are used to capture the web server traffic details and provide online traffic details. A very popular tool of this category is WebTrends. Many organizations uses this tool to get to the traffic trends of an application running in production environment. There are other tools like HP OpenView tools which run in production servers and monitor the server resource utilization levels. It provides alarming mechanism to indicate the heavy usage and provides easy bottleneck isolation capabilities. But due to the cost involved with these kinds of tools, most of small organizations don’t opt for them. But post production monitoring data would be of great use for designing realistic performance tests.
· Allows you to analyze and isolate the performance problem.
· Understand the resource utilization of the server and make best use of them.
· Plan for Capacity Planning activity based on the resource utilization level.
        · Provides the server performance details offline (by creating a alert of sending a mail/message when resource utilization reaches the threshold value).

Tips to define the Performance Test Strategy

Best practices in defining performance test strategy
 
Most of the application would need Load tests, Stress tests and Stability(Endurance) tests to be planned before moving it to production. For data voluminous applications, there might be more emphasis on Volume testing and some applications like Auction site might emphasis on conducting spike tests with sudden spikes. For all other applications, its common that at least 3 cycles of load tests followed by stress test and endurance test run (at least for 12 hours) needs to be planned. (This is on a general case, but based on the application context, definitely there will be changes done to this approach).

For any application, irrespective of time availability , its always recommended to start with Baseline test. Baseline test is nothing but one user test which is run to identify the correctness of the test scripts and also it helps is checking whether the application meets the SLAs (Service Level Agreements) for 1 user load. This values can be used a benchmark for newer versions to compare ths performance improvement.
Next comes the Benchmark test for at least 15-20% of the target load. It helps in identifying the correctness of the test scripts and tests the readiness of the system before running target load tests.
Its a very good practice to always start with Baseline & Benchmark tests on any application before conducting the scheduled load tests. I would say, in fact its Scott barber's way of course.
When it comes to the scheduled tests - Load Tests, always plan to run at least 3 rounds of load test. Irrespective of doing a slow ramp , its advised to have 3 individual load tests for 50%, 75% and 100% target load. (The load level should be defined based on the system performance rather than just 50%, 75% & 100% target load levels). 

Test Scenario - Have a slow ramp up followed by a stable period for at least an hour and ramp down. During this stable period, the target user load needs to perform various operations on the system with realistic think times. All the metrics measured should correspond only to the stable period and not during ramp up/ramp down period. Don't conclude any transaction response time just based on 1 or 2 iterations. The server should be monitored at least for a minimum of 5 iterations (at the same load level), before concluding the response time metrics, because there could be some reason for high /less response time at any single point of time. Thats the reason watch the server performance for at least 5 iterations at the same load level(during stable load period) and use the 90th percentile response time to report the response time metrics.

Test Metrics - Look for 90th percentile response time and standard deviation, of-course along with hits/sec and other server resource monitors. More the deviation(more than 1), more burst is the graph and hence its recommended to rerun the test. 

Load Tests should be always followed by Stress Tests - Based on the Load test results, slowly increase the server load step by step and find out the server break point. For this test, realistic think time settings and cache settings are not required as the objective of this test to know the server break point and how it fails.

Endurance (Stability) Tests needs to be run at least for 10-12 hours in order to identify the memory bottlenecks. This need not be run for peak load, but it can be run for average load levels.

Directions for fellow Performance Testers

Before certifying the application performance,

are you satisfied with your own performance? are you sure that you are conducting effective performance test? Have you ever thought how to ensure the your performance test strategy and are you sure that your performance test report is bug free?

The following tips would help you to get more maturity and to do your job better

Yes..You need to be comfortable in the terminologies used in the performance engineering field and need to learn one or two popular and effective performance test tools and develop bottleneck analysis skills. That's the basic skills required for being a performance tester. Then you are matured enough to look back at your performance and look for how effective you are in your activities.

That's where, you need to start looking at Analytical modeling and simulation modeling approaches to move to your next level of expertise.

Foremost thing is that, understand the operational laws and start applying those laws in your day to day test activities.

Secondly start learning about Workload Modeling concepts and Statistical distribution fitting and apply those learning's to define performance testing goals/SLAs.

Thirdly start learning about Demand planning and Performance prediction techniques. Learn Mean value analysis and Erlang's method.

Fourth step is start learning about Capacity Planning.

This is the second level of maturity you need to gain. The next advanced third level is moving towards simulation modeling and advanced capacity planning.

Hope this helps to set your career path and plan to work towards it.

How do you know if your Load Test has a bottleneck

The bottleneck in a system may not be obvious. (Life would be easier but less fun if there where always easy to find). This is because there are two types “hard” and “soft”. Hard bottlenecks are the ones where a resource such as a CPU is working flat out which limits the ability of the system to process more transaction. While a soft bottleneck is some internal limit such at number of threads or connections that once all used limit the ability to process more transaction. Therefore, how do you find know if you have a bottleneck. If you are looking at the results from a single load test you may not know you will need to run multiple load tests at different numbers of virtual users and then see if you number of transactions per second increase with each increase in virtual users. The results can be seen in the two graphs below. The first shows how the throughput (transaction per seconds) increases and levels off when saturated and the second shows the response time. You will probably have heard the express below the knee of the curve and this is an the point that is to the left of the bend in the response time graph.

Throughput Graph
Throughput Graph
Response Time Graph

The graphs above where actually generated using a spreadsheet model for the performance of a closed loop model. This is like LoadRunner and other testing tools where the are a fixed number of users that use the system then wait and return to the system. The reality is that the performance graphs may look different from the expected norm. An example is shown below from a LoadRunner test the first graph shows how the number of VUser where increased during the test and the second graph shows the increase in response times. In this case the jump in response time is dramatic. However, in some cases the increase in response time will be less dramatic as the system will start to error at high loads which will distort the response time figures.
Example LoadRunner VUser Graph

Example LoadRunner Graph Showing Increasing Response Times

Having discovered there is a bottleneck in the system then you have to start looking for it.

Importance of Little's Law

Little's law is quite simple and intuitively appealing.

The law states that the average number of customers in a system (over some time interval), N, is equal to their average arrival rate, X, multiplied by their average time in the system, R.

N = X . R (or) for easy remembrance use L = A . W

This law is very important to check whether the load testing tool is not a bottleneck.
For Example, in a shop , if there are always 2 customers available at the counter queue , wherein the customers are entering the shop at the rate of 4 per second , then time taken for the customers to leave from the shop can be calculated as

N = X. R

R = 2/4 = 0.5 seconds


A Simple example of how to use this law to know how many virtual users licenses are required:

Web system that has peak user load per hour = 2000 users
Expected Response time per transaction = 4 seconds
The peak page hits/sec = 80 hits/sec

For carrying out Performance tests for the above web system, we need to calculate how many number of Virtual user licenses we need to purchase.

N = X . R
N = 80 . 4 = 320

Therefore 320 virtual user licenses are enough to carry out the Load Test.