Friday, July 15, 2011

Response Time Vs Queue Length Vs Server Utilization

1. Is the system response time directly or inversely proportional to server utilization?
2. Will there be queues (for CPU/Memory/Disk) if utilization is less than 100%?


Strictly speaking, there is no direct correlation between system response time and server utilization. Response time can increase even when the server is lightly utilized. (To illustrate this, look at the ATM example available on the PEA site - http://www.pea-online.com/resources.htm).

Most of us misunderstand this relationship. The increase in response time during high load is caused by queuing of requests, and the user request arrival pattern is what drives that queuing. When requests arrive in an ad hoc, bursty pattern, long queues form at the various service centers, which leads to high response time.

Response Time - What are the various components of Response time?

Waiting time (Queuing time) + Processing time = Response Time.

For a given transaction, the server's processing time stays roughly the same irrespective of load. For example, the server might take 2 seconds of processing for a transaction whether 1 user or 1,000 users are on the system. But the waiting time will be much higher at 1,000 users, which drastically increases the overall response time. So response time increases when waiting time increases, and waiting time increases when queues grow long.

Also, don't assume that queues form only when utilization crosses 100%. Queues can form even when utilization is well below that. So everything boils down to knowing the arrival pattern of requests to the system - it determines when the system will get loaded and when it will break.
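To see this in numbers, here is a minimal single-server queue simulation in Java - just a sketch with made-up service times and arrival patterns, not a model of any real workload. Both arrival patterns keep the server at roughly 50% utilization, yet the bursty pattern builds queues and inflates the average response time:

import java.util.ArrayList;
import java.util.List;

// Minimal single-server queue simulation: same average utilization (~50%),
// different arrival patterns. All values are illustrative.
public class ArrivalPatternDemo {

    // Each request needs a fixed amount of processing; arrivals are timestamps in seconds.
    static double averageResponseTime(List<Double> arrivals, double serviceTime) {
        double serverFreeAt = 0.0;    // time at which the server finishes its current request
        double totalResponse = 0.0;
        for (double arrival : arrivals) {
            double start = Math.max(arrival, serverFreeAt); // wait if the server is busy
            double finish = start + serviceTime;
            totalResponse += finish - arrival;              // waiting time + processing time
            serverFreeAt = finish;
        }
        return totalResponse / arrivals.size();
    }

    public static void main(String[] args) {
        double serviceTime = 1.0;   // 1 second of processing per request
        int requests = 100;

        // Evenly spaced: one request every 2 seconds -> ~50% utilization, no queuing.
        List<Double> even = new ArrayList<>();
        for (int i = 0; i < requests; i++) even.add(i * 2.0);

        // Bursty: 10 requests arrive together every 20 seconds -> same average rate,
        // same ~50% utilization, but queues form inside every burst.
        List<Double> bursty = new ArrayList<>();
        for (int i = 0; i < requests; i++) bursty.add((i / 10) * 20.0);

        System.out.printf("Even arrivals  : avg response = %.2f s%n",
                averageResponseTime(even, serviceTime));
        System.out.printf("Bursty arrivals: avg response = %.2f s%n",
                averageResponseTime(bursty, serviceTime));
    }
}

Both runs keep the server busy only about half the time, but the bursty run's average response time comes out several times higher - the entire difference is waiting time in the queue.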

Hence, knowing the user arrival pattern of a system is far more important than setting arbitrary goals for a load/stress test. Analyzing the arrival pattern of an application helps in setting realistic goals. Gathering that data is largely an organizational logistics exercise, and most organizations skip it and pay for it later.

Symptoms for Application & Web Server Bottlenecks

Symptoms for Application server bottleneck
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Increased 'Server Time' in the response time breakdown
2. One or more page components of a transaction take more time even though the corresponding DB queries have low execution times.
3. Static files have low response times whereas dynamic content (servlets, JSPs, etc.) takes more time (see the timing sketch after this list).
4. Network delay is negligible.
5. The home page still displays within a few seconds during the stress period (as it is fetched from the web server).
6. Hits/sec and throughput remain low.
7. The CPU/Memory/Disk of the application server shows bottleneck symptoms.
8. The number of HTTP/HTTPS connections established does not increase proportionally with the load.
9. The number of new connections established is very high and the number of reused connections is very low.
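A quick way to check symptom 3 from the client side is to time a static file against a dynamic page of the same application. The sketch below uses plain HttpURLConnection; the two URLs are placeholders to be replaced with your own static and dynamic resources.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Rough client-side timing of a static vs. a dynamic resource (placeholder URLs).
public class StaticVsDynamicTiming {

    static long timeRequest(String url) throws Exception {
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) { /* drain the response */ }
        }
        return (System.nanoTime() - start) / 1_000_000; // milliseconds
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URLs - replace with a real static file and dynamic page.
        String staticUrl  = "http://myapp.example.com/images/logo.gif";
        String dynamicUrl = "http://myapp.example.com/app/accountSummary.do";

        System.out.println("Static  : " + timeRequest(staticUrl) + " ms");
        System.out.println("Dynamic : " + timeRequest(dynamicUrl) + " ms");
    }
}

If the static file stays fast under load while the dynamic page keeps slowing down, the delay is most likely behind the web server, i.e. in the application server or further back.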


Symptoms for Web server bottleneck
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Increased 'Server Time' in the response time breakdown
2. One or more page components of a transaction take more time even though the corresponding DB queries have low execution times.
3. Static files have higher response times than dynamic content (servlets, JSPs, etc.).
4. Network delay is negligible.
5. The home page takes more time to display.
6. Hits/sec on the web server is very low.
7. The CPU/Memory/Disk of the web server shows bottleneck symptoms.

Code & Application server related performance issues in J2EE


In a J2EE environment, there are some common code-related and application server-related problems. These include:

Code related problems:
~~~~~~~~~~~~~~~~~~
1. Slow Methods
a. Consistently Slow Methods
b. Intermittently Slow Methods
2. Synchronization Problems
3. Memory Problems
4. Coding practices, such as using exceptions as a means to transfer control in the application (see the sketch after this list)
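As a hedged illustration of point 4, the sketch below shows the "exception as control flow" anti-pattern: throwing and catching an exception to signal a perfectly normal outcome (here a cache miss, with hypothetical names). Filling in a stack trace on every throw is costly, so under load this shows up as CPU time burnt inside exception handling.

import java.util.HashMap;
import java.util.Map;

// Illustrative only: using an exception where a normal return value would do.
public class ExceptionControlFlow {

    private static final Map<String, String> CACHE = new HashMap<>();

    // Anti-pattern: a cache miss is an expected outcome, not an error.
    static String lookupOrThrow(String key) {
        String value = CACHE.get(key);
        if (value == null) {
            throw new IllegalStateException("cache miss: " + key); // expensive under load
        }
        return value;
    }

    // Preferred: express the expected "not found" case in the return value.
    static String lookupOrNull(String key) {
        return CACHE.get(key);
    }

    public static void main(String[] args) {
        // Caller forced to use try/catch as an if/else:
        try {
            lookupOrThrow("user42");
        } catch (IllegalStateException expectedMiss) {
            // ...fall back to loading the value - this branch runs constantly
        }

        // The same decision without the exception overhead:
        if (lookupOrNull("user42") == null) {
            // ...load the value
        }
    }
}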

Application server configuration problems:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. JDBC Connection Pool size
2. JVM Heap size
3. Thread Pool size
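These three settings interact, so the snippet below is only a hedged starting-point sketch, not tuning guidance: it reads what the JVM actually has and derives first-guess pool sizes from it. The multipliers and the example -Xms/-Xmx values are assumptions to be validated with load tests and against your own application server's documentation.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// First-guess sizing sketch. A typical fixed-heap launch might look like:
//   java -Xms1024m -Xmx1024m ...   (illustrative values, not a recommendation)
public class PoolSizingSketch {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        // Illustrative rules of thumb, not vendor guidance:
        int workerThreads = cpus * 4;          // thread pool for request processing
        int jdbcConnections = workerThreads;   // rarely useful to exceed the thread count

        System.out.println("CPUs                : " + cpus);
        System.out.println("Max heap (MB)       : " + maxHeapMb);
        System.out.println("Worker thread guess : " + workerThreads);
        System.out.println("JDBC pool guess     : " + jdbcConnections);

        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        workers.shutdown();   // created only to show the sizing being applied
    }
}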

CPU Bottleneck Symptoms


Symptoms for CPU bottlenecks include the following:

The Processor(_Total)\% Processor Time counter (which measures the total utilization of the processor by all running processes) will be high. If the server typically runs at around 70% or 80% processor utilization, that is normally a good sign: the machine is handling its load effectively and is not under-utilized. Average processor utilization of around 20% or 30%, on the other hand, suggests that the machine is under-utilized and may be a good candidate for server consolidation using Virtual Server or VMware.

To break down % Processor Time further, monitor the Processor(_Total)\% Privileged Time and Processor(_Total)\% User Time counters, which show processor utilization for kernel-mode and user-mode processes respectively. If kernel-mode utilization is high, the machine is likely underpowered: it is too busy handling basic OS housekeeping to run other applications effectively. If user-mode utilization is high, the server may be running too many roles; either beef up the hardware by adding another processor or migrate an application or role to another box.

A System\Processor Queue Length (an indication of how many threads are waiting for execution) that is consistently greater than 2 on a single-processor machine is a clear indication of a processor bottleneck. Also look at related counters such as ASP\Requests Queued or ASP.NET\Requests Queued.
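These counters can also be sampled from a script instead of the Performance Monitor GUI. Below is a small Windows-only sketch in Java that shells out to the standard typeperf tool for the two counters discussed above; counter paths can vary by OS version and locale, so treat the strings as assumptions to verify on your own server.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Windows-only sketch: sample CPU counters via the built-in typeperf tool.
public class CpuCounterSample {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "typeperf",
                "\\Processor(_Total)\\% Processor Time",
                "\\System\\Processor Queue Length",
                "-si", "1",     // sample every second
                "-sc", "10");   // collect 10 samples, then exit
        pb.redirectErrorStream(true);
        Process p = pb.start();

        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // CSV: timestamp, % processor time, queue length
            }
        }
        p.waitFor();
    }
}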

Disk Bottleneck Symptoms


A disk bottleneck can significantly impact response time for applications running on your system.

Watch the Physical Disk(instance)\Disk Transfers/sec counter for each physical disk; if it consistently goes above 25 disk I/Os per second, you've got poor response time for your disk.

Also track Physical Disk(instance)\% Idle Time, which measures the percentage of time the disk was idle during the measurement interval. If this counter falls below 20%, read/write requests are likely queuing up for a disk that cannot service them in a timely fashion. In that case it's time to upgrade to faster disks or scale out the application to better handle the load.

Look at the Physical Disk(instance)\Avg. Disk Queue Length and Physical Disk(instance)\Current Disk Queue Length counters to get more detail on the queued-up requests.
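If you are sampling these counters with typeperf as in the CPU sketch above, a few lines of parsing are enough to apply the rules of thumb from this section. The CSV line and the column order below are illustrative assumptions (they depend on the order in which the counters were requested).

// Sketch: apply the rule-of-thumb disk thresholds to one typeperf CSV sample line,
// assuming the counters were requested as
// "\PhysicalDisk(_Total)\Disk Transfers/sec" then "\PhysicalDisk(_Total)\% Idle Time".
public class DiskThresholdCheck {
    public static void main(String[] args) {
        // Example CSV line in typeperf's output format (values are made up):
        String sample = "\"07/15/2011 10:32:01\",\"41.7\",\"12.3\"";
        String[] fields = sample.replace("\"", "").split(",");

        double transfersPerSec = Double.parseDouble(fields[1]);
        double idleTimePercent = Double.parseDouble(fields[2]);

        if (transfersPerSec > 25) {
            System.out.println("Disk Transfers/sec above 25 - likely poor disk response time");
        }
        if (idleTimePercent < 20) {
            System.out.println("% Idle Time below 20% - requests are probably queuing at the disk");
        }
    }
}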

Network Bottleneck Symptoms


The simplest way to measure effective bandwidth is to determine the rate at which your server sends and receives data. Network bandwidth availability is a function of the organization's network infrastructure. Network capacity is a function of the network cards and interfaces configured on the servers.

Network Interface: Bytes Total/sec : To determine if your network connection is creating a bottleneck, compare the Network Interface: Bytes Total/sec counter to the total bandwidth of your network adapter card. To allow headroom for spikes in traffic, you should usually be using no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection may well be a problem.
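As a worked example of the 50 percent rule (with an assumed 100 Mbps adapter and a made-up counter reading): 100 Mbps is 12,500,000 bytes/sec, so sustained Bytes Total/sec readings above roughly 6,250,000 leave no headroom for spikes. The same arithmetic as a small Java sketch:

// Worked example of the 50% bandwidth rule with an assumed 100 Mbps adapter.
public class BandwidthHeadroom {
    public static void main(String[] args) {
        long linkBitsPerSec = 100_000_000L;               // 100 Mbps NIC (assumption)
        long capacityBytesPerSec = linkBitsPerSec / 8;    // 12,500,000 bytes/sec
        long headroomThreshold = capacityBytesPerSec / 2; // 50% rule -> 6,250,000 bytes/sec

        long observedBytesTotalPerSec = 7_200_000L;       // sample counter reading (made up)

        double utilization = 100.0 * observedBytesTotalPerSec / capacityBytesPerSec;
        System.out.printf("Network utilization: %.1f%% of adapter capacity%n", utilization);
        if (observedBytesTotalPerSec > headroomThreshold) {
            System.out.println("Above 50% of capacity - little headroom left for traffic spikes");
        }
    }
}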

Web Service: Maximum Connections and Web Service: Total Connection Attempts : If you are running other services on the computer that also use the network connection, you should monitor the Web Service: Maximum Connections and Web Service: Total Connection Attempts counters to see if your Web server can use as much of the connection as it needs. Remember to compare these numbers to memory and processor usage figures so that you can be sure that the connection is the problem, not one of the other components.

To determine the throughput and current activity on a server's network cards, you can check the following counters:

· Network Interface\Bytes Received/sec
· Network Interface\Bytes Sent/sec
· Network Interface\Bytes Total/sec
· Network Interface\Current Bandwidth

If the total bytes per second value is more than 50 percent of the total capacity under average load conditions, your server might have problems under peak load conditions. You might want to ensure that operations that take a lot of network bandwidth, such as network backups, are performed on a separate interface card. Keep in mind that you should compare these values in conjunction with Physical Disk\% Disk Time and Processor\% Processor Time. If the disk time and processor time values are low but the network values are very high, there might be a capacity problem. Solve the problem by optimizing the network card settings or by adding an additional network card. Remember, planning is everything: it isn't always as simple as inserting a card and plugging it into the network.

Learn about Performance Testing from Scott Barber's Site

Who said performance testing is simply about simulating a large number of users to test system performance?

There is a lot more to it than that. Words can't explain the worth of the information available on Scott Barber's site.


Everything from how to plan a performance test to how to create the performance test report is covered in a very easy and understandable way. It's amazing.
Thanks to Scott Barber for sharing the information with us.