CPU Bottleneck Symptoms:
Symptoms of a CPU bottleneck include the following. The Processor(_Total)\% Processor Time
counter (which measures the total utilization of your processor by all running
processes) will be high. If the server typically runs at around 70% or 80%
processor utilization, this is normally a good sign: your machine is handling
its load effectively and is not underutilized. An average processor utilization
of around 20% or 30%, on the other hand, suggests that your machine is
underutilized and may be a good candidate for server consolidation using
Virtual Server or VMware.
To break down % Processor Time further, monitor the Processor(_Total)\% Privileged Time
and Processor(_Total)\% User Time counters, which show processor utilization in
kernel mode and user mode, respectively. If kernel-mode utilization is high,
your machine is likely underpowered, as it’s too busy handling basic OS
housekeeping functions to run other applications effectively. If user-mode
utilization is high, your server may be running too many roles, and you should
either upgrade the hardware by adding another processor or migrate an
application or role to another box. A System\Processor Queue Length (the number
of threads waiting for execution) that is consistently greater than 2 on a
single-processor machine is a clear indication of a processor bottleneck. Also
look at other counters, such as ASP\Requests Queued or ASP.NET\Requests Queued.
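The CPU rules of thumb above can be expressed as a small check over counter samples you have already collected (for example, exported from a perfmon log). This is a minimal sketch, not a monitoring tool: the function names are hypothetical, "consistently" is interpreted as "in every sample", treating an average above 80% as "high" is an assumption, and scaling the single-processor queue limit of 2 by the processor count is also an assumption.

```python
from statistics import mean


def classify_cpu_utilization(samples):
    """Classify average % Processor Time (0-100) using the rough bands in the text."""
    avg = mean(samples)
    if avg <= 30:
        return "underutilized"  # ~20-30%: consolidation candidate
    if avg < 70:
        return "moderate"
    if avg <= 80:
        return "well utilized"  # ~70-80%: handling load effectively
    return "high"  # assumption: above ~80% treated as a possible bottleneck


def processor_queue_bottleneck(queue_samples, processors=1, per_cpu_limit=2):
    """True if Processor Queue Length exceeds the limit in every sample."""
    # Assumption: scale the single-processor limit of 2 by the processor count.
    limit = per_cpu_limit * processors
    return len(queue_samples) > 0 and all(q > limit for q in queue_samples)
```

For example, `processor_queue_bottleneck([3, 4, 5])` flags a single-processor box whose queue never drops to 2 or below.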
Tips to find out Application server bottlenecks:
- A sharp increase in application server processing time as the load increases.
- One or more page components take more time while the corresponding DB call takes little execution time.
- Static files have short response times, whereas dynamic content (servlets, JSP, etc.) takes more time.
- Network delay is negligible.
- The home page is displayed within a few seconds even during the stress period (because it is fetched from the web server).
- Hits/sec and throughput remain low.
- The CPU, memory, or disk of the app server shows bottleneck symptoms.
- The number of HTTP/HTTPS connections established doesn’t increase proportionally with the load.
- The number of new connections established is very high while the number of reused connections is very low.
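The last two connection tips lend themselves to a quick calculation over load-test results. This is a minimal sketch under stated assumptions: the function names are hypothetical, "proportionally" is approximated by comparing growth from the first to the last sample, and the `min_ratio` of 0.5 and the 80% new-connection cutoff are illustrative values, not thresholds from the text.

```python
def connections_scale_with_load(users, connections, min_ratio=0.5):
    """Hypothetical check: did connections grow at least min_ratio as fast as load?

    users and connections are parallel samples taken as the load ramps up.
    """
    load_growth = users[-1] / users[0]
    conn_growth = connections[-1] / connections[0]
    return conn_growth >= min_ratio * load_growth


def poor_connection_reuse(new_conns, reused_conns, max_new_fraction=0.8):
    """Hypothetical check: are almost all connections new rather than reused?"""
    total = new_conns + reused_conns
    if total == 0:
        return False  # no connections observed, nothing to judge
    # Assumption: more than 80% new connections counts as "very high".
    return new_conns / total > max_new_fraction
```

A `False` from the first function or a `True` from the second points toward the application server, in line with the tips above.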
Tips to find out Web server bottlenecks:
- An increased ‘Server Time’ component in the response-time breakup.
- One or more page components of a transaction take more time while the DB query has a short execution time.
- Static files have higher response times than dynamic content (servlets, JSP, etc.).
- Network delay is negligible.
- The home page takes longer to display.
- Hits/sec on the web server is very low.
- The CPU, memory, or disk of the web server shows bottleneck symptoms.
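The two tip lists above differ mainly in how static and dynamic response times compare while the DB stays fast. The sketch below turns that contrast into a rough heuristic, assuming network delay has already been ruled out as negligible. The function name is hypothetical, and the rule that DB time must be under half of the dynamic page time (`db_share_limit=0.5`) is an assumption used to stand in for "the DB query is fast".

```python
from statistics import mean


def suspect_tier(static_ms, dynamic_ms, db_ms, db_share_limit=0.5):
    """Rough heuristic: which tier do the response-time samples point to?

    static_ms:  response times for static files (images, CSS, etc.)
    dynamic_ms: response times for dynamic pages (servlets, JSP, etc.)
    db_ms:      execution times of the DB calls behind those pages
    """
    static_avg = mean(static_ms)
    dynamic_avg = mean(dynamic_ms)
    db_avg = mean(db_ms)
    # Assumption: "DB is fast" means DB time is under half of the dynamic page time.
    if db_avg >= db_share_limit * dynamic_avg:
        return "inconclusive"  # the database itself may be the bottleneck
    if static_avg > dynamic_avg:
        return "web server"  # static content slower than dynamic content
    return "application server"  # dynamic content slow, static content fast
```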
Hardware Malfunctioning Symptoms:
- System\Context Switches/sec measures the combined rate at which all processors switch from one thread to another. Context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher-priority thread, or switches between user mode and kernel mode to use a system service. If this counter suddenly starts increasing, it may be an indication of a malfunctioning device, especially if you are seeing a similar jump in the Processor(_Total)\Interrupts/sec counter on your machine.
- You may also want to check the Processor(_Total)\% Privileged Time counter to see whether it shows a similar unexplained increase, as this may indicate a problem with a device driver that is causing an additional hit on kernel-mode processor utilization.
- If Processor(_Total)\Interrupts/sec does not correlate well with System\Context Switches/sec, however, the sudden jump in context switches may instead mean that your application is hitting its scalability limit on this particular machine, and you may need to scale out your application (for example, by clustering) or redesign how it handles user-mode requests. In any case, it’s a good idea to monitor System\Context Switches/sec over a period of time to establish a baseline for this counter, and then create a perfmon alert that triggers when the counter deviates significantly from its observed mean value.
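The baseline-and-alert advice, together with the interrupts/context-switches correlation test, can be sketched as follows. This is an illustration, not a perfmon alert definition: the function names are hypothetical, "deviates significantly" is assumed to mean more than three standard deviations from the baseline mean, and a Pearson correlation of 0.7 is an assumed cutoff for "correlates well".

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, sample standard deviation) of Context Switches/sec samples."""
    return mean(samples), stdev(samples)  # needs at least two samples


def deviates(value, baseline, k=3.0):
    """Assumption: 'significant' means more than k standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample series."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0


def likely_cause(context_switches, interrupts, threshold=0.7):
    """Interrupts tracking context switches suggests a device; otherwise the application."""
    r = pearson(context_switches, interrupts)
    return "device/driver" if r >= threshold else "application scalability"
```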
Memory Bottleneck Symptoms:
When it comes to system memory, there are three things to monitor:
- Cache (hits/misses)
- Memory (Available Bytes, Process\Working Set)
- Paging (Pages Read/sec, Pages Input/sec, Page Faults/sec, % Disk Processing)
Memory\Available Bytes: if this counter is greater than 10% of the physical RAM
in your machine, you probably have more than enough RAM and don’t need to worry.
The Memory\Pages/sec counter indicates the rate at which pages are read from or
written to disk to resolve hard page faults, and it is the primary counter to
watch for signs that your server has insufficient RAM. You can monitor
Process(instance)\Working Set for each process instance to determine which
process is consuming more and more RAM. Process(instance)\Working Set measures
the size of the working set for each process, which indicates the number of
allocated pages the process can address without generating a page fault. A
related counter is Memory\Cache Bytes, which measures the working set for the
system, i.e., the number of allocated pages that kernel threads can address
without generating a page fault. Finally, another corroborating indicator of
insufficient RAM is Memory\Transition Faults/sec, which measures how often
recently trimmed pages on the standby list are re-referenced. If this counter
slowly rises over time, it could also indicate that you’re reaching the point
where you no longer have enough RAM for your server to function well.
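The memory checks above reduce to a ratio test and a trend test. This minimal sketch, with hypothetical function names, applies the 10%-of-RAM rule to Available Bytes and uses a least-squares slope to decide whether Transition Faults/sec, or a process's Working Set, is rising over time. Treating any positive slope as "rising" is an assumption; in practice you would pick a tolerance that fits your sampling interval.

```python
def enough_available_memory(available_bytes, physical_ram_bytes):
    """Available Bytes above 10% of physical RAM: probably enough memory."""
    return available_bytes > 0.10 * physical_ram_bytes


def trend_slope(values):
    """Least-squares slope of evenly spaced samples (change per sample interval)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = (n - 1) / 2
    my = sum(values) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(values))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den


def transition_faults_rising(samples):
    """Assumption: any positive slope counts as a rising trend."""
    return trend_slope(samples) > 0


def fastest_growing_process(working_sets):
    """Given {process name: Working Set samples}, return the fastest-growing process."""
    return max(working_sets, key=lambda name: trend_slope(working_sets[name]))
```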
Disk Bottleneck Symptoms:
A disk bottleneck can significantly impact response time for applications
running on your system. Monitor the Physical Disk(instance)\Disk Transfers/sec
counter for each physical disk; if it goes above 25 disk I/Os per second, your
disk response time is poor. Also track Physical Disk(instance)\% Idle Time,
which measures the percentage of time your hard disk is idle during the
measurement interval; if this counter falls below 20%, read/write requests are
likely queuing up for a disk that is unable to service them in a timely
fashion. In this case it’s time to upgrade your hardware to faster disks or
scale out your application to better handle the load. Look at the Physical
Disk(instance)\Avg. Disk Queue Length and Physical Disk(instance)\Current Disk
Queue Length counters to get more details on the queued-up requests.
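The two disk thresholds in this section can be checked together over per-disk samples. In this sketch the function name is hypothetical, and comparing the average of the samples (rather than any single sample) against the 25 transfers/sec and 20% idle-time limits is an assumption.

```python
from statistics import mean


def disk_findings(transfers_per_sec, idle_time_pct, max_transfers=25, min_idle_pct=20):
    """Return the disk symptoms found in one disk's Disk Transfers/sec and % Idle Time samples."""
    findings = []
    # Assumption: compare averages over the sampling window against the limits.
    if mean(transfers_per_sec) > max_transfers:
        findings.append("high Disk Transfers/sec")
    if mean(idle_time_pct) < min_idle_pct:
        findings.append("low % Idle Time")
    return findings
```

Run it once per physical disk instance; a disk that shows both findings is the first place to look at the queue-length counters.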
Network Performance/Bottlenecks:
The first step is to make sure your network performance is good, and there are
some simple ways to do so. First, check whether you are getting the bandwidth
you are supposed to get; the easiest way is to compare the Current Bandwidth
counter with your expected bandwidth. Also verify the rate at which the server
sends and receives data. Network performance depends on two factors: the
network cards and the interfaces (switches/routers) configured on the servers.
Here are some of the counters to find network bottlenecks:
Network Interface: Current Bandwidth
This counter reports the current bandwidth of the network interface, in bits
per second. Capture this counter value and correlate it with Bytes Received/sec,
Bytes Sent/sec, and Bytes Total/sec. Bytes Total/sec should stay at or below
half of the current bandwidth (after converting bytes to bits); if it does not,
a network bottleneck is likely.
Network Interface: Bytes Total/sec:
To determine if your network connection is creating a
bottleneck, compare the Network Interface: Bytes Total/sec counter to the total
bandwidth of your network adapter card. To allow headroom for spikes in
traffic, you should usually be using no more than 50 percent of capacity. If
this number is very close to the capacity of the connection, and processor and
memory use are moderate, then the connection may well be a problem. To
determine the network utilization (throughput on a server’s network cards), you
can check the following counters:
- Network Interface\Bytes Received/sec
- Network Interface\Bytes Sent/sec
- Network Interface\Bytes Total/sec
- Network Interface\Current Bandwidth
If the Bytes Total/sec value is more than 50 percent of the current bandwidth
under average user load, your server is likely to have problems under peak load
conditions. Make sure you compare the network counter values with Physical
Disk\% Disk Time and Processor\% Processor Time. If the disk time and processor
time values are low but the network values are very high, there might be a
problem with your network. There are two ways to solve this problem:
1. Optimizing the network card settings
2. Adding an additional network card
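The network rules above combine a utilization ratio with a cross-check against disk and processor load. One detail matters when computing the ratio: Network Interface\Current Bandwidth is reported in bits per second, while Bytes Total/sec is in bytes, so the byte count must be multiplied by 8. In this sketch the function names are hypothetical, and treating disk and processor time below 30% as "low" is an assumption.

```python
def network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps):
    """Percent of link capacity in use (Current Bandwidth is in bits per second)."""
    return bytes_total_per_sec * 8 / current_bandwidth_bps * 100


def network_suspect(bytes_total_per_sec, current_bandwidth_bps,
                    disk_time_pct, processor_time_pct,
                    limit_pct=50, low_pct=30):
    """True when the network is busy while disk and CPU are comparatively idle."""
    util = network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps)
    # Assumption: below 30% counts as "low" for % Disk Time and % Processor Time.
    return util > limit_pct and disk_time_pct < low_pct and processor_time_pct < low_pct
```

For example, 62,500,000 bytes/sec on a 1 Gbit/s link is exactly 50% utilization, the headroom limit described above.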