In analyzing the performance of a particular computer system with a given workload, we need to measure the following:
- The capacity of those machines to perform this work
- The rate at which the machines are currently performing it
- The time it takes to complete specific tasks
Most computer performance problems can be analyzed in terms of resources, queues, service requests, and response time. This section defines these basic performance measurement concepts. It describes what they mean and how they are related. Two of the key measures of computer capacity are bandwidth and throughput. Bandwidth is a measure of capacity, the rate at which work can be completed, whereas throughput measures the actual rate at which work requests are completed.
- How busy the various resources of a computer system get is known as their utilization.
- How much work each resource can process at its maximum level of utilization is defined as its capacity.
The key measures of the time it takes to perform specific tasks are queue time, service time, and response time. The term latency is often used in an engineering context to refer to either service time or response time. Response time will be used consistently here to refer to the sum of service time and queue time. In networks, another key measure is round trip time, which is the amount of time it takes to send a message and receive a confirmation message (called an Acknowledgement, or ACK for short) in reply.
Queue Length: When a work request arrives at a busy resource and cannot be serviced immediately, the request is queued. Queued requests are subject to a queue time delay before they are serviced. The number of requests that are delayed waiting for service is known as the queue length.
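Queue length, throughput, and response time are tied together in steady state by Little's law (a standard queueing result, not stated in the text above): the average number of requests present at a resource equals the arrival rate multiplied by the average response time. A minimal sketch, with hypothetical numbers:

```python
# Little's law: N = X * R, where
#   N = average number of requests at the resource (queued + in service)
#   X = arrival rate (requests/second), equal to throughput in steady state
#   R = average response time per request (seconds)

def average_requests_at_resource(arrival_rate: float, response_time: float) -> float:
    """Average number of requests present, by Little's law."""
    return arrival_rate * response_time

# Example: 100 requests/sec arriving, each spending 50 ms at the device
n = average_requests_at_resource(100.0, 0.050)
print(n)  # about 5 requests present on average
```

Note that this counts requests both waiting in the queue and in service; a long measured queue length is therefore a direct symptom of rising response time.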
Bandwidth measures the capacity of a link, bus, channel, interface, or the device itself to transfer data. Bandwidth is usually measured in either bits/second or bytes/second (where there are 8 bits in a data byte). For example, the bandwidth of a 10BaseT Ethernet connection is 10 megabits per second (Mbps). Bandwidth usually refers to the maximum theoretical data transfer rate of a device under ideal operating conditions.
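As a back-of-the-envelope illustration (the file size here is hypothetical; only the 10BaseT rating comes from the text), the ideal transfer time over a link is the data size divided by the link bandwidth, remembering that bandwidth is usually quoted in bits per second:

```python
def ideal_transfer_seconds(file_bytes: int, bandwidth_bits_per_sec: float) -> float:
    """Best-case transfer time at full theoretical bandwidth (no protocol overhead)."""
    return (file_bytes * 8) / bandwidth_bits_per_sec

# Example: a 10 MB file over a 10BaseT Ethernet link (10 megabits/second)
t = ideal_transfer_seconds(10 * 1_000_000, 10 * 1_000_000)
print(t)  # 8.0 seconds at best; observed throughput will be lower
```

Because real links carry protocol overhead and contend with other traffic, measured throughput never reaches this ideal figure.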
Throughput measures the rate at which work requests are completed, from the point of view of some observer. Examples of throughput measurements include the number of reads per second from the disk or file system and the number of instructions per second executed by the processor.
Note: Throughput and bandwidth are very similar. Bandwidth is often construed as the maximum capacity of the system to perform work, whereas throughput is the current observed rate at which that work is being performed.
Utilization measures the fraction of time that a device is busy servicing requests, usually reported as a percent busy. Utilization of a device varies from 0 through 1, where 0 is idle and 1 (or 100 percent) represents utilization of the full bandwidth of the device. It is customary to report that the processor or CPU is 75 percent busy, or the disk is 40 percent busy. It is not possible for a single device to ever be greater than 100 percent busy.
If an application server is currently processing 60 transactions per second with a CPU utilization measured at 20 percent, the server apparently has considerable reserve capacity to process transactions at an even higher rate. On the other hand, a server processing 60 transactions per second running at a CPU utilization of 98 percent is operating at or near its maximum capacity.
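The reserve-capacity reasoning above can be made concrete. Assuming CPU utilization scales roughly linearly with transaction rate (a simplification that ignores every resource other than the CPU), the CPU-limited throughput ceiling is the current rate divided by the current utilization. A minimal sketch:

```python
def estimated_max_rate(current_rate: float, utilization: float) -> float:
    """Rough CPU-limited throughput ceiling, assuming linear scaling."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return current_rate / utilization

# 60 transactions/sec at 20% CPU: large implied headroom (~300 tx/sec)
print(estimated_max_rate(60, 0.20))
# 60 transactions/sec at 98% CPU: near saturation (~61 tx/sec)
print(estimated_max_rate(60, 0.98))
```

In practice the ceiling is lower than this estimate, because queue time delays grow rapidly as utilization approaches 100 percent.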
Service time measures how long it takes to process a specific customer work request.
Response time is the sum of service time and queue time:
Response time = service time + queue time
Response time includes both the device latency and any queuing delays that occur while the request is queued waiting for the device.
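Queue time grows nonlinearly with utilization. As an illustrative model only (the classic single-server M/M/1 approximation, which this text does not itself assume), average response time is service time divided by one minus utilization, so a lightly loaded device responds in roughly its service time while a nearly saturated one is dominated by queue time:

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Average response time under the M/M/1 model: R = S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# A device with a 10 ms service time: response time balloons near saturation.
for u in (0.10, 0.50, 0.90, 0.98):
    r = mm1_response_time(0.010, u)
    print(f"U={u:.0%}  response={r * 1000:.1f} ms  queue time={(r - 0.010) * 1000:.1f} ms")
```

At 50 percent utilization the queue time already equals the service time; at 98 percent the same device takes roughly fifty times its service time to respond.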
When a work request arrives at a shared resource that is already busy servicing another request, the operating system queues the request, and queue time begins to accumulate until service starts. The one aspect of resource sharing that is not totally transparent to programs executing under Windows Server 2003 is its potential performance impact. Queuing delays occur because multiple applications attempt to access shared resources in parallel; when contention for a resource is significant, the delays become visible and performance suffers.
Tip: Consider the time you spend waiting in line in your car at a tollbooth. The amount of time it takes you to pay the toll is often insignificant compared to the time you spend waiting in line. The amount of time spent waiting in line to have your order taken and filled at a fast-food restaurant during the busy lunchtime period is often significantly longer than the time it takes to process your order. Similarly, queuing delays at an especially busy shared computer resource can be prolonged. It is important to monitor the queues at shared resources closely to identify periods when excessive queue time delays are occurring.
Not all computer resources are shared on Windows Server 2003, which means that these unshared devices have no queue time delays. Input devices like the mouse and keyboard, for example, are managed by the operating system so that they are accessible by only one application at a time. These devices are buffered to match the speed of the people operating them, because they can generate interrupts faster than the application with the current input focus can process the requests. Instead of queuing these requests, however, the operating system device driver routines for the keyboard and mouse discard extraneous interrupts. The effect is that little or no queue time delay is associated with these devices.
One of the most effective methods of tuning performance is to systematically identify bottlenecked resources and then work to remove or relieve them. When the throughput of a particular system reaches its effective capacity limits, the system is said to be bottlenecked. The resource bottleneck is the component that is functioning at its capacity limit. The bottlenecked resource can also be understood as the resource with the fastest growing queue as the number of users increases.
A balanced system is one in which no resource saturates before any other as the load increases, and all resource queues grow at the same rate. In a balanced system, queue time delays are minimized across all resources, leading to performance that is optimal for a given configuration and workload.
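Given per-resource utilization measurements, a first-pass bottleneck candidate is simply the resource closest to saturation. A minimal sketch (the resource names and numbers below are hypothetical; real counters would come from a tool such as Performance Monitor):

```python
def find_bottleneck(utilizations: dict[str, float]) -> str:
    """Return the resource with the highest measured utilization."""
    return max(utilizations, key=utilizations.get)

# Hypothetical utilization measurements taken under load
measured = {"CPU": 0.45, "Disk": 0.92, "Network": 0.30, "Memory": 0.55}
print(find_bottleneck(measured))  # Disk is the saturation candidate here
```

Utilization alone is only a first approximation; confirming a bottleneck also means checking which resource's queue grows fastest as load increases, per the definition above.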
Once you identify a bottlenecked resource, you should follow a systematic approach to relieve that limit on performance and permit more work to get done. You might consider these approaches, for example:
- Optimizing the application so that it runs more efficiently (that is, consumes less of the resource's bandwidth).
- Upgrading the component of the system that is functioning at or near its effective bandwidth limits so that it runs faster.
- Balancing the application across multiple resources by adding more processors, disks, network segments, and so on, and processing it in parallel.
I hope you’ve enjoyed the first part in our series of blog posts on Windows Performance Monitoring Concepts. In later parts, we’ll focus on in-depth procedures for reviewing bottlenecks.