Thursday, February 16, 2012

Agentless performance monitoring using SNMP

We have many performance monitoring tools in the market, from freeware to licensed products. All these tools can be divided into agentless and agent-based tools.
                   
Agent-based tools are very intuitive in terms of performance monitoring but costly to implement. CA Wily Introscope is a good example of agent-based monitoring. Agents/probes play a major part in collecting the performance metrics from the systems under test. They have to be installed and configured manually, but they are very intuitive because they collect metrics at the lowest level possible, depending on how they are configured. This type of monitoring is very useful as you can track each and every request through any module/function/component where an agent is placed. The downside is that the configured agents add overhead to the system: the more agents and metrics configured, the greater the overhead.
              
Most agentless monitoring tools use SNMP (Simple Network Management Protocol) to track performance metrics for applications, and any SNMP-compliant system can be monitored this way. To monitor a system, activate the SNMP service on it, then set the performance metrics to be monitored and the frequency at which they should be collected. Once these basic settings are done, you are ready to go. HP SiteScope is an example of a successful agentless monitoring tool.
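The polling model described above can be sketched in a few lines of Python. This is a minimal illustration only: the OIDs shown are standard MIB-2 identifiers, but the host address is an example and fetch_oid is a hypothetical stand-in for a real SNMP GET (which in practice you would issue with a library such as pysnmp).

```python
# Sketch of an agentless polling cycle: each configured metric maps
# to an SNMP OID that is fetched from the target system.
OIDS = {
    "sysUpTime": "1.3.6.1.2.1.1.3.0",        # standard MIB-2 system uptime
    "ifInOctets": "1.3.6.1.2.1.2.2.1.10.1",  # inbound octets on interface 1
}

def fetch_oid(host, oid):
    # Hypothetical placeholder: a real implementation would send an
    # SNMP GET to host:161 with a community string and return the value.
    return 0

def poll(host, oids):
    """Collect one sample of every configured metric from the host."""
    return {name: fetch_oid(host, oid) for name, oid in oids.items()}

sample = poll("192.0.2.10", OIDS)  # 192.0.2.10 is an example address
```

A scheduler would then call poll() at the configured frequency and store each sample for trending.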

In brief:
Agent-based monitoring
Pros: more intuitive, monitoring at module/function/component level, more metrics can be configured.
Cons: skill set needed to configure the agents, overhead on the system.

Agentless monitoring
Pros: fast to implement, minimal skill set required, all high-level metrics can be monitored.
Cons: uses more network bandwidth, cannot be customized, metrics are limited.

Thursday, January 5, 2012

Calculating Concurrency from Performance Test Results


So you are on a performance test engagement and your boss asks how many people are concurrently executing a certain transaction, like buying a book or doing a search. What he wants is a measure of active concurrency: how many people are performing a particular transaction. This should not be confused with passive concurrency, such as how many people are logged in. Before we go any further, let's clarify that in this example a transaction is a request to the test system and the response back; it does not include any think time. Now, before you start getting out the virtual terminal server and incrementing counters at the start of the transaction and decrementing them at the end, there is an easier way.

You can work this all out from your performance test results, without the need for code, using a mathematical formula (it's very simple, so don't panic) called Little's Law. Little's Law was proved by John Little in 1961 and has been used to analyze everything from telephone exchanges to computer systems.
Little's Law relates the mean number of items in the system (in our case, concurrent users) to the mean time spent in the system (the response time) as follows:

Number of Items in the system = Arrival Rate x Response Time

There is one rule to remember before you use Little's Law: the system must be balanced, that is, the arrival rate into the system must equal the exit rate.
I will begin with a non-computer example: the "Black Horse Pub" has a mean arrival rate of 5 customers per hour, and each customer stays for half an hour on average. Using Little's Law we can calculate the mean number of customers in the pub as Arrival Rate x Response Time = 5 x 0.5 = 2.5 customers.
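The pub arithmetic can be checked with a couple of lines of Python:

```python
# Little's Law: items in the system = arrival rate x time in the system.
arrival_rate = 5.0       # customers per hour
time_in_system = 0.5     # hours (half-an-hour average stay)

customers_in_pub = arrival_rate * time_in_system
print(customers_in_pub)  # 2.5
```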
To apply Little's Law to a performance test we must first make sure that we take measurements while the system under test is balanced. Remember, in a balanced system the rate of work entering the system matches the rate of work leaving it. For a typical load testing tool, this is after the ramp-up period, when the number of virtual users remains constant, response times have stabilized and the transactions-per-second graph is level. To capture this period of time in LoadRunner, for example, you would select the time period in the Summary report filter or under Tools -> Options.
So record the average response time for the transaction of interest and the number of times per second the transaction is executed.

From the example above, the response time is 43.613 seconds. The arrival rate is the number of transactions executed divided by the duration. The duration in this example was a 10-minute period, as can be confirmed by the LoadRunner summary below.


This gives an arrival rate of 2.005 transactions per second, calculated by taking the count of 1203 and dividing it by the duration of 600 seconds.

So the concurrent number of users waiting for a search to return is 2.005 x 43.613 = 87.44.
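The whole calculation, using the figures from this example, looks like this in Python:

```python
# Concurrency from load test results via Little's Law.
transaction_count = 1203   # transactions in the measurement window
duration_seconds = 600     # the 10-minute stable period
response_time = 43.613     # mean response time in seconds

arrival_rate = transaction_count / duration_seconds  # 2.005 tx/sec
concurrent_users = arrival_rate * response_time

print(round(concurrent_users, 2))  # 87.44
```

Swap in your own count, duration and response time and the same two lines give you the concurrency for any transaction.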
There you go: from your performance test results you can easily calculate the concurrency for a particular transaction.

Performance Test Monitoring


Performance monitoring is the process of collecting and analyzing server data to compare the server statistics against expected values. It gives the performance tester a health check of the server during the test. By monitoring the servers during the test, one can identify how the server behaves under load and take steps to change that behavior through software or hardware performance tuning.
Each performance counter measures a specific aspect of server performance. For example, % CPU Utilization is a performance counter that identifies the utilization level of the CPU.
In a nutshell, server monitoring should provide information on four parameters of any system: Latency, Throughput, Utilization and Efficiency, which help in answering the following questions.

      • Is your server available?
      • How busy is your CPU?
      • Is there enough Primary Memory (RAM)?
      • Is the disk fast enough?
      • Are there any other hardware issues?
      • Is a hardware issue the result of software malfunctioning?
Always start with a few counters and, once you notice a specific problem, add a few more counters related to the symptom. Start by monitoring the performance of the major resources: CPU, Memory, Disk and Network. This section provides details about key counters from each of the above-mentioned four areas that are important for a performance tester to know.


Processor Bottlenecks
Bottlenecks related to the processor (CPU) are comparatively easy to identify. The important performance counters that help in identifying a processor bottleneck include:

% Processor Utilization (Processor_Total: % Processor Time) – This counter helps in knowing how busy the system is. It indicates the processor activity, measured as the average percentage of elapsed time that the processor spends executing a productive (non-idle) thread. A consistent level of more than 80% utilization (in the case of single-CPU machines) indicates that there is not enough CPU capacity and is worth further investigation using other processor counters.

% User Time (Processor_Total: % User Time) – This refers to the processor's time spent handling application-related processes. A high percentage indicates that the application is consuming a lot of CPU. The process-level counters need to be monitored to understand which user process consumes the most CPU.

% Privileged Time (Processor_Total: % Privileged Time) – This refers to the processor's time spent handling kernel-mode processes. A high value indicates that the processor is too busy handling operating system activities, and it needs immediate attention from the system administrator to check the system configuration or services.

% I/O Wait (%wio on UNIX platforms) – This refers to the percentage of time spent waiting for I/O activity to complete. It is a good indicator of whether threads are waiting on I/O completion.

Processor Queue Length (System: Processor Queue Length) – This counter helps in identifying how many threads are waiting in the queue for execution. A consistent queue length of more than 2 indicates a bottleneck and is worth investigating. Generally, if the queue length is greater than the number of CPUs in the system, system performance may suffer. A high % User Time coupled with a high processor queue length indicates a processor bottleneck.

Other counters of interest:
Other counters, like Processor: Interrupts per second and System: Context Switches per second, can be used for specific issues. Interrupts per second refers to the number of interrupts that hardware devices send to the processor. A consistent value above 1000 interrupts per second indicates hardware failure or driver configuration issues. Context switches refer to the processor switching from a lower-priority thread to a higher-priority thread. A consistent value above 15000 per second per processor indicates the presence of too many threads of the same priority and possibly blocked threads.
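The rules of thumb above can be collected into a small, illustrative threshold checker. The thresholds are the ones quoted in this section; the counter names are simplified identifiers, not official Perfmon counter paths.

```python
# Flag likely processor bottlenecks using the rules of thumb above.
# This is a sketch, not a substitute for proper counter analysis.
THRESHOLDS = {
    "processor_time_pct": 80,            # sustained CPU utilization
    "processor_queue_length": 2,         # threads waiting for a CPU
    "interrupts_per_sec": 1000,          # possible hardware/driver issue
    "context_switches_per_sec": 15000,   # per processor
}

def check_counters(sample):
    """Return the names of any counters that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if sample.get(name, 0) > limit]

# Example sample taken during a test run (illustrative values).
flags = check_counters({
    "processor_time_pct": 92,
    "processor_queue_length": 5,
    "interrupts_per_sec": 400,
    "context_switches_per_sec": 9000,
})
print(flags)  # ['processor_time_pct', 'processor_queue_length']
```

Remember that the text calls for consistent values over time, so in practice you would apply such checks to a sustained window of samples, not a single reading.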

The Windows operating system comes with a performance monitoring tool called Perfmon, which can be used to collect server statistics during the performance test run. Most performance testing tools also have their own monitors for the system resources of the system under test, which makes it easy to compare the load metrics with the resource utilization metrics to arrive at a conclusion. In fact, any performance testing tool that monitors a Windows machine internally talks to Perfmon to collect the resource utilization details.
There are many licensed tools in the market that are used for post-production monitoring. These tools capture web server traffic details and provide online traffic reports. A very popular tool in this category is WebTrends; many organizations use it to follow the traffic trends of an application running in production. Other tools, like HP OpenView, run on production servers and monitor resource utilization levels, raise alarms to indicate heavy usage and provide easy bottleneck isolation capabilities. Due to the cost involved, most small organizations don't opt for these tools, but post-production monitoring data is of great use when designing realistic performance tests. In summary, server monitoring:
· Allows you to analyze and isolate performance problems.
· Helps you understand the resource utilization of the server and make the best use of it.
· Supports capacity planning based on resource utilization levels.
· Provides server performance details offline (by creating an alert that sends a mail/message when resource utilization reaches a threshold value).

Tips to define the Performance Test Strategy

Best practices in defining performance test strategy
 
Most applications need Load tests, Stress tests and Stability (Endurance) tests to be planned before moving to production. Data-voluminous applications might place more emphasis on Volume testing, and applications such as auction sites might emphasize spike tests with sudden load spikes. For most other applications, it is common to plan at least 3 cycles of load tests, followed by a stress test and an endurance test run (of at least 12 hours). (This is the general case; based on the application context there will certainly be changes to this approach.)

For any application, irrespective of time availability, it is always recommended to start with a Baseline test. A Baseline test is a one-user test that is run to verify the correctness of the test scripts, and it also helps in checking whether the application meets its SLAs (Service Level Agreements) at 1-user load. These values can be used as a benchmark for newer versions to compare performance improvements.
Next comes the Benchmark test at 15-20% of the target load. It helps in verifying the correctness of the test scripts and tests the readiness of the system before running the target load tests.
It is a very good practice to always start with Baseline and Benchmark tests on any application before conducting the scheduled load tests; in fact, this is Scott Barber's recommended approach.
When it comes to the scheduled Load Tests, always plan to run at least 3 rounds. Even if you use a slow ramp-up, it is advisable to run 3 individual load tests at 50%, 75% and 100% of the target load. (The load levels should be defined based on the system's performance rather than simply fixed at 50%, 75% and 100% of the target load.)

Test Scenario - Use a slow ramp-up, followed by a stable period of at least an hour, and then a ramp-down. During this stable period, the target user load should perform various operations on the system with realistic think times. All the metrics measured should correspond only to the stable period, not to the ramp-up/ramp-down periods. Don't draw conclusions about a transaction's response time based on just 1 or 2 iterations; the server should be monitored for at least 5 iterations at the same load level before concluding the response time metrics, because there could be some reason for an unusually high or low response time at any single point in time. That is why you should watch the server's performance for at least 5 iterations at the same load level (during the stable period) and use the 90th percentile response time to report the response time metrics.

Test Metrics - Look at the 90th percentile response time and the standard deviation, of course along with hits/sec and the other server resource monitors. The higher the standard deviation (above 1), the more bursty the results, and it is recommended to rerun the test.
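Both metrics are easy to compute from a list of measured response times with Python's standard library; the sample data below is purely illustrative.

```python
import statistics

# Response times (seconds) collected during the stable period; the
# values here are made up for illustration.
response_times = [1.2, 1.4, 1.1, 1.3, 2.8, 1.2, 1.5, 1.3, 1.6, 1.4]

# statistics.quantiles with n=10 returns the 9 decile cut points;
# the last one is the 90th percentile.
p90 = statistics.quantiles(response_times, n=10)[-1]
stdev = statistics.stdev(response_times)

print(round(p90, 2))  # 2.68
```

Note how the single slow sample (2.8 s) pulls up both the 90th percentile and the standard deviation, which is exactly the burstiness the text warns about.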

Load Tests should always be followed by Stress Tests - based on the load test results, slowly increase the server load step by step and find the server's breaking point. For this test, realistic think time settings and cache settings are not required, as the objective is to find the server's breaking point and how it fails.

Endurance (Stability) Tests need to be run for at least 10-12 hours in order to identify memory bottlenecks. They need not be run at peak load; average load levels are sufficient.