Microsoft Windows Experts

Windows Server 2003 (x86) Role based Performance Tuning October 16, 2011

The three significant roles that I most often encounter servers in are:

  •  File Server
  •  Domain Controller
  •  Terminal Server


There are obviously many other roles, such as IIS, Exchange and Hyper-V host, but I think each of these is vastly outnumbered by the roles above, so those are the ones I am going to cover. Hopefully, with the details of what the changes do, you can determine whether to test altering them on your other servers.


And yes, we are focusing mainly on x86 (32-bit) servers here. Once you go 64-bit, a lot of these changes become irrelevant, as the ceiling is raised implicitly by the extra address space.


Get your priorities right


On the context menu of My Computer, click Properties

On the System Properties window presented, select the Advanced tab, click the Settings button under Performance

On the Performance Options window presented, select the Advanced tab

Here you will see Adjust for best performance of:

– Programs

– Background services


What this setting influences is the quantum used for thread execution – how much time they get to run on a processor without interruption from threads at the same or lower priority.


For programs to appear more responsive to the user, a shorter quantum is preferred, so more context switching occurs.

Server services prefer to run without being bothered with so many context switches, so prefer a longer quantum.


A Terminal Server hosts user sessions and has many processes directly accessed by interactive users, so should have the Programs radio button selected.

A file server or DC on the other hand has little direct user interaction, so we want to extend the quantum and optimize for Background services.


The other radio button selection relating to Memory Usage toggles LargeSystemCache off (tune for programs) and on (tune for system cache) – the default is enabled on Windows Server SKUs, but again Terminal Servers can be considered “multiple user desktop” servers and so would prefer to have the workstation default, to tune for programs instead.
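
To put some rough numbers on the quantum trade-off, here is a small arithmetic sketch. The clock tick length and the tick counts per quantum are assumptions chosen for illustration, not values read from a live system, and the function names are made up for this example.

```python
# Illustrative arithmetic only: the tick length and tick counts below are
# assumptions for the example, not values read from a real system.
CLOCK_TICK_MS = 15          # assumed clock interval
SHORT_QUANTUM_TICKS = 2     # assumed quantum when optimizing for Programs
LONG_QUANTUM_TICKS = 12     # assumed quantum for Background services


def quantum_expirations(cpu_work_ms, quantum_ms):
    """How many full quantums a CPU-bound thread uses up for this much work."""
    return cpu_work_ms // quantum_ms


def worst_case_wait_ms(ready_threads_ahead, quantum_ms):
    """Longest wait if every same-priority thread ahead runs a full quantum."""
    return ready_threads_ahead * quantum_ms


short_ms = CLOCK_TICK_MS * SHORT_QUANTUM_TICKS
long_ms = CLOCK_TICK_MS * LONG_QUANTUM_TICKS

for label, quantum in (("Programs", short_ms), ("Background services", long_ms)):
    print(label,
          quantum_expirations(1800, quantum),
          worst_case_wait_ms(4, quantum))
```

Under these assumptions the shorter quantum means more expirations for the same amount of work (more context switching), but a thread queued behind four others waits far less before it gets a turn, which is what makes interactive sessions feel responsive.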


Dipping in the pool

For all roles, it can be useful to make the Memory Manager more aggressive about trimming paged pool allocations. By default trimming starts at the 80% watermark, but this can leave the server unable to satisfy requests before it gets round to cleaning up. Reducing the watermark to 60% makes the housekeeping kick in earlier:

Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management

Name: PoolUsageMaximum


Data: 60 (decimal)
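
As a quick worked example of what moving the watermark means in absolute terms, the sketch below uses a hypothetical 250 MB paged pool. The real size is decided at startup and will differ per server, and the function name is just for illustration.

```python
def trim_threshold_mb(paged_pool_max_mb, pool_usage_maximum_pct):
    """Pool usage (in MB) at which trimming starts for a given watermark."""
    return paged_pool_max_mb * pool_usage_maximum_pct / 100


POOL_MB = 250  # hypothetical paged pool size, not a measured value

for pct in (80, 60):
    threshold = trim_threshold_mb(POOL_MB, pct)
    headroom = POOL_MB - threshold
    print(pct, threshold, headroom)
```

With these assumed numbers, the 60% watermark starts trimming 50 MB earlier and leaves twice as much headroom for new allocations while the cleanup runs.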


For Terminal Servers it is useful to have a paged pool that is as big as possible. An algorithm at startup determines the size of the paged pool region, but we do have the option to indicate that we would like it to be given preference (at the cost of Page Table Entries, PTEs):

Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management

Name: PagedPoolSize


Data: ffffffff (hexadecimal)

(This is the same setting that we recommend making if you are getting Srv 2020 events after trying the more aggressive trimming tweak above.)


Giving to the givers (File Server & DC specific)

When it comes to file servers and DCs specifically, we want the Server (LanmanServer) service to get some love, as they will be receiving many SMB connections. This can be done through some registry tweaks:

Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

Name: MaxWorkItems


Data: 65535 (decimal)

65,535 is the maximum you can set, and this value specifies the number of receive buffers that the Server service can allocate at any time – the default is a calculation made based on system resources during startup, so we are influencing this decision to suit our needs.


These values set the minimum and maximum number of preallocated connection objects respectively:

Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

Name: MinFreeConnections


Data: 128 (decimal)


Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

Name: MaxFreeConnections


Data: 1024 (decimal)

(These are the same settings that we recommend making if you are getting Srv 2022 events.)
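
If you prefer to import the three Server service values rather than type them into regedit, a minimal sketch like the following renders them as .reg file text. The helper name `render_reg_file` is made up for this example, and it only handles DWORD values. Note that .reg files store DWORDs as eight hexadecimal digits, so the decimal data above has to be converted.

```python
def render_reg_file(key_path, dword_values):
    """Render DWORD values under a single key as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {value}")
        # .reg files expect DWORD data as 8 lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


LANMAN_SERVER_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\LanmanServer\Parameters"
)

reg_text = render_reg_file(LANMAN_SERVER_KEY, {
    "MaxWorkItems": 65535,
    "MinFreeConnections": 128,
    "MaxFreeConnections": 1024,
})
print(reg_text)
```

Review the generated text before importing it, and test on a non-production server first.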


Terminal station (Terminal Server specific)

Terminal Servers act and should be treated more as “very busy clients” than servers – think about the probability of concurrent AD user logons, roaming or mandatory profile copying, files opened across the network, applications making connections to mail or database servers, and so on.


Resultant Set of Policy (RSoP) is useful for troubleshooting, but it can impact performance during “normal” operation, so it can be turned off by enabling the following group policy:

Computer Configuration / Administrative Templates / System / Group Policy / Turn off Resultant Set of Policy


Post-SP1 hotfix from KB319440 (rolled into SP2) gives control of buffering group policy reads which can improve logon times if concurrent logons are causing blocking operations when users are trying to access the same policies:

Path: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon

Name: BufferPolicyReads


Data: 1


There is a Workstation (LanManWorkstation) service tweak which increases the number of concurrent outbound network calls:

Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters

Name: MaxCmds


Data: 2048 (decimal)


Also network related, this tweak makes Explorer more responsive by cutting down on the (metadata) information queries made when browsing network shares, especially those with many, many files or folders:

Path: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer

Name: NoRemoteRecursiveEvents


Data: 1

Name: NoRemoteChangeNotify


Data: 1

(This tweak can be pushed out to clients in a large environment as it applies to Explorer more than the concurrent user nature of Terminal Services.)


This is a very brief start at looking at what performance gains you might see on busy servers, or environments with slow/latent networks, or file servers with hundreds of thousands of files being browsed by multiple users.


Any of the registry values can be looked up on MSDN or TechNet if you’re interested in the “official” descriptions of what they do.

And as always, note any changes you make to server configurations and back up beforehand.
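
To pull the role-based recommendations from this post together in one place, here is a small lookup sketch. The structure and names (`ROLE_PROFILES`, `settings_for`) are my own for illustration. It lists value names and data only, so you still need the registry paths given earlier for each value.

```python
# Value recommended for every role in this post.
COMMON = {"PoolUsageMaximum": 60}

# Server (LanmanServer) service values for file servers and DCs.
SERVER_SERVICE = {
    "MaxWorkItems": 65535,
    "MinFreeConnections": 128,
    "MaxFreeConnections": 1024,
}

# Role name -> (Performance Options choice, role-specific registry values).
ROLE_PROFILES = {
    "file server": ("Background services", SERVER_SERVICE),
    "domain controller": ("Background services", SERVER_SERVICE),
    "terminal server": ("Programs", {
        "PagedPoolSize": 0xFFFFFFFF,
        "BufferPolicyReads": 1,
        "MaxCmds": 2048,
        "NoRemoteRecursiveEvents": 1,
        "NoRemoteChangeNotify": 1,
    }),
}


def settings_for(role):
    """Return (performance option, registry values) for a role name."""
    optimize_for, role_values = ROLE_PROFILES[role.lower()]
    return optimize_for, {**COMMON, **role_values}


option, values = settings_for("Terminal Server")
print(option, sorted(values))
```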


Windows XP Virtual Machines exhibiting degraded performance & high CPU utilization with Hyper-V & VDI April 16, 2011

I came across an issue where a customer had a VDI deployment of Windows XP machines running on Windows Server 2008 R2 Hyper-V servers. The problem was that while performance of the virtualized desktops was fine on one of the Hyper-V machines, it was not so good on the other.

In checking the performance of the Windows XP VMs, it was found that CPU usage was significantly higher on the ones running on the problem server than on the other. Perfmon showed the higher CPU usage, but was not conclusive as to a specific process causing the issue.

As we were looking for a performance difference between VMs running on the two different Hyper-V hosts, we collected Performance Monitor logs for the various Hyper-V counters. From these, we saw that the counter Hypervisor Virtual Processor\APIC TPR Accesses/sec was a flat line on the good performing server, but had a lot of ups and downs on the problem server.

TPR stands for Task Priority Register. It turns out that Windows XP and prior operating systems access the APIC TPR very frequently, which puts a bit more overhead on the hypervisor. Essentially, when the IRQL is raised, the OS immediately accesses the processor’s local APIC to set the interrupt mask. This prevents it from being pre-empted by a lower IRQL interrupt. Then, when the IRQL needs to be lowered, it has to access the local APIC again.

However, in Windows Server 2003 and later operating systems, we implement a concept known as Lazy IRQL. With this, when the IRQL is raised, the HAL keeps a note of the raised IRQL within a structure of its own. Now, if the processor gets a lower priority interrupt, the OS checks this against the locally stored IRQL and only then accesses the APIC to set the interrupt mask. In the newer versions of Windows, the duration for which an Interrupt Service Routine (ISR) actually needs to run is very low. However, this is not the case with Windows XP and earlier operating systems: an APIC TPR access must occur for every IRQL raise and also for the subsequent lowering. This additional overhead is small on an individual scale, but can add up quickly, and was in fact what was causing the VMs to be slower.

Now, this explains the poor performance of the Windows XP VMs on one of the servers, but what about the other server that had good performance? Both of these servers were superficially identical, and were even running the same model of Intel processor.

Well, it turns out that Intel has a technology called vTPR that is designed to work around this type of issue. However, this technology is not present in all of their CPU models. Even though the processors in both servers were the same model, the stepping revision was not. The server exhibiting the performance issue had a stepping revision of B3, whereas the other server had a stepping of G0. It turns out that the processor with the stepping of B3 did not implement vTPR, but the one with stepping G0 did.

So, how do we work around the initial issue? Unfortunately there is not a simple software trick to get around this. The solution to this issue would be to either replace the processor with one which implements vTPR, or to upgrade the guest operating systems to something which supports Lazy IRQL; I recommend Windows 7. More info on Intel vTPR can be found in this document.
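
To make the difference between the Windows XP behaviour and Lazy IRQL more concrete, here is a deliberately simplified model that just counts TPR accesses for a stream of events. It is a sketch of the idea as described in this post, not of the actual HAL code. The event format, the IRQL numbers and the function names are all invented for the example.

```python
def eager_tpr_accesses(events):
    """XP-style: every IRQL raise and every lower touches the APIC TPR."""
    return sum(1 for kind, _ in events if kind in ("raise", "lower"))


def lazy_tpr_accesses(events):
    """Lazy IRQL: only touch the TPR when a masked interrupt really shows up."""
    software_irql = 0   # IRQL the HAL remembers in its own structure
    hardware_tpr = 0    # value actually programmed into the APIC
    accesses = 0
    for kind, level in events:
        if kind == "raise":
            software_irql = level          # just a note, no APIC access
        elif kind == "lower":
            software_irql = level
            if hardware_tpr > level:       # TPR was programmed, so undo it
                hardware_tpr = level
                accesses += 1
        elif kind == "interrupt":
            # An interrupt that should have been masked got through because
            # the TPR is stale: program the TPR now so it stays masked.
            if hardware_tpr < level <= software_irql:
                hardware_tpr = software_irql
                accesses += 1
    return accesses


# 1000 short raise/lower pairs, with a lower priority interrupt arriving
# during every 100th one.
events = []
for i in range(1000):
    events.append(("raise", 2))
    if i % 100 == 0:
        events.append(("interrupt", 1))
    events.append(("lower", 0))

print(eager_tpr_accesses(events), lazy_tpr_accesses(events))
```

In this toy workload the eager approach performs 2000 TPR accesses against 20 for the lazy one. Under a hypervisor without vTPR, each of those accesses carries extra overhead, which is why the gap matters so much for XP guests.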


Windows Performance Monitoring Concepts | Perfmon January 26, 2011

In analyzing the performance of a particular computer system with a given workload, we need to measure the following:

  • The capacity of those machines to perform this work
  • The rate at which the machines are currently performing it
  • The time it takes to complete specific tasks

Most computer performance problems can be analyzed in terms of resources, queues, service requests, and response time. This section defines these basic performance measurement concepts. It describes what they mean and how they are related. Two of the key measures of computer capacity are bandwidth and throughput. Bandwidth is a measure of capacity, which is the rate at which work can be completed, whereas throughput measures the actual rate at which work requests are completed.

  • How busy the various resources of a computer system get is known as their utilization.
  • How much work each resource can process at its maximum level of utilization is defined as its capacity.
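
As a small worked example of how these measures relate, the sketch below uses made-up numbers for a single disk. The figures and function names are assumptions for illustration only, not measurements from a real system.

```python
def throughput(completed_requests, elapsed_s):
    """Actual rate at which work requests were completed (requests/second)."""
    return completed_requests / elapsed_s


def utilization(busy_s, elapsed_s):
    """Fraction of the measurement interval the resource was busy."""
    return busy_s / elapsed_s


# Hypothetical disk: able to complete 200 requests/second at most (its
# bandwidth), observed for 60 seconds.
BANDWIDTH_PER_S = 200
ELAPSED_S = 60
COMPLETED = 6000
BUSY_S = 30

rate = throughput(COMPLETED, ELAPSED_S)        # actual requests/second
busy = utilization(BUSY_S, ELAPSED_S)          # fraction of time busy
share_of_capacity = rate / BANDWIDTH_PER_S     # throughput vs. bandwidth
service_time_s = BUSY_S / COMPLETED            # mean time per request

print(rate, busy, share_of_capacity, service_time_s)
```

Here the disk completes 100 requests per second against a bandwidth of 200, so it is running at half its capacity, which matches the 50% utilization derived from its busy time.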

Introduction to the new Sysinternals tool: RAMMap January 23, 2011

RAMMap is available from Sysinternals. It allows us to examine detailed memory usage information in a way that is easily accessible.

Each tab has its own wealth of data, but I’ll be focusing on Use Counts and File Summary tab as they offer the information I think most people will be interested in.

Use Counts


How to Install Windows Performance Analyzer (WPA)

Installation Overview

Windows Performance Analyzer (WPA) is distributed as part of the Windows Performance Toolkit (WPT). WPT is installed as part of the entire SDK installation in the following versions of the SDK:

  • Microsoft Windows SDK for Windows 7 and .NET Framework 3.5 Service Pack 1
  • Microsoft Windows SDK for Windows Server 2008 and .NET Framework 3.5

Starting with the Microsoft Windows SDK for Windows 7 and .NET Framework 4, WPT can also be installed without requiring a full SDK installation. To install only WPT, follow these steps:

  1. From the Windows Performance Analysis Developers Center, download the Windows SDK for Windows 7 and .NET Framework 4 (or a later version of the SDK).
  2. When the Windows SDK Wizard starts, click Next until you reach the Installation Options page.
  3. On the Installation Options page, clear all options and then select Windows Performance Toolkit from the Common Utilities option.
  4. Click Next to continue with the installation of the selected SDK components.

For more information about how to acquire WPT, see Windows Performance Analysis Developers Center.

WPA Installation Files

The following installation files are in an .msi format:


Contains the WPA binary files for x86-based systems.


Contains the WPA binary files for x64-based systems.


Contains the WPA binary files for Itanium-based systems.

These installation files are located in the bin subdirectory of the SDK when WPT is installed from the following versions of the SDK:

  • Windows SDK for Windows 7 and .NET Framework 3.5 Service Pack 1
  • Windows SDK for Windows Server 2008 and .NET Framework 3.5

Starting with the Windows SDK for Windows 7 and .NET Framework 4, these installation files are located in the Redist\Windows Performance Toolkit subdirectory of the SDK.

WPA Installation Instructions

The installation can be performed by double-clicking the appropriate .msi file or by running the installation file manually. For information on running the file manually, see the online MSDN documentation.

By default WPA installs in the \Program Files\Microsoft Windows Performance Analyzer directory. This path is automatically added to the system PATH variable. If you choose to install WPA in a folder other than the default folder, the system PATH variable must include your WPA executable directory.

WPA Installation on Windows XP SP2 and Windows Server 2003 SP1

WPA can be installed and used on Windows XP SP2 and Windows Server 2003 SP1 to gather trace information. Note that the stackwalk function is not available in these environments, because in Windows XP SP2 and Windows Server 2003 SP1 the required event gathering capabilities are not available. Furthermore, all operations that require trace decoding must be done on Vista or Windows Server 2008. This includes viewing traces in the Windows Performance Analyzer tool (Xperfview.exe).

In order to capture trace information on Windows XP SP2 or Windows Server 2003 SP1 take the following steps:

  1. From the Windows Performance Analyzer directory on a Windows Vista or Windows Server 2008 machine, copy Xperf.exe and Perfctrl.dll to a directory that is in the PATH environment variable of the Windows XP SP2 or Windows Server 2003 target system.
  2. On the Windows XP SP2 or Windows Server 2003 target system, run the trace using standard WPA syntax.
  3. Copy the “etl” files to a Windows Vista or Windows Server 2008 system that has a full installation of WPA.


Enabling Stack Walking on x64 Systems

Stackwalking on x64 systems requires the DisablePagingExecutive registry value to be set in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management. For more information, see DisablePagingExecutive.

The following examples show how to set this value by using command scripts:

  • QueryStackwalk64.cmd:
    @REG QUERY "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive
  • TurnOnStackwalk64.cmd:
    @REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f
    @IF NOT %ERRORLEVEL% == 0 echo error: Could not configure system for 64-bit stackwalking.  Please run this script from an elevated administrator console.
    Note  To make these changes effective, you must restart the system.
  • TurnOffStackwalk64.cmd:
    @REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x0 -t REG_DWORD -f
    @IF NOT %ERRORLEVEL% == 0 echo error: Could not remove 64-bit stackwalking configuration.  Please run this script from an elevated administrator console.
    Note  To make these changes effective, you must restart the system.