I came across an issue where a customer had a VDI deployment of Windows XP machines running on Windows Server 2008 R2 Hyper-V servers. The problem was that while performance of the virtualized desktops was fine on one of the Hyper-V machines, it was noticeably worse on the other.
In checking the performance of the Windows XP VMs, we found that CPU usage was significantly higher on the ones running on the problem server than on the other. Perfmon showed the higher CPU usage, but was not conclusive as to a specific process causing the issue.
As we were looking for a performance difference between VMs running on the two different Hyper-V hosts, we collected Performance Monitor logs for the various Hyper-V counters. From these, we saw that the counter Hypervisor Virtual Processor\APIC TPR Accesses/sec was a flat line on the well-performing server, but fluctuated heavily on the problem server.
TPR stands for Task Priority Register. It turns out that Windows XP and earlier operating systems access the APIC TPR very frequently, which puts additional overhead on the hypervisor. Essentially, when the IRQL is raised, the OS immediately accesses the processor’s local APIC to set the interrupt mask. This prevents the running code from being pre-empted by an interrupt at a lower IRQL. When the IRQL needs to be lowered, the OS has to access the local APIC again.
However, in Windows Server 2003 and later operating systems, we implement a concept known as Lazy IRQL. With this, when the IRQL is raised, the HAL keeps a note of the raised IRQL within a structure of its own, without touching the APIC. If the processor then gets a lower priority interrupt, the OS checks it against the locally stored IRQL and only then accesses the APIC to set the interrupt mask. In the newer versions of Windows, the time for which an Interrupt Service Routine (ISR) actually needs to run is very short, so in most cases the IRQL is lowered again before any such interrupt arrives and the APIC is never accessed. This is not the case with Windows XP and earlier operating systems, where an APIC TPR access must occur for every IRQL raise and for every subsequent lowering. This additional overhead is small on an individual scale, but can add up quickly, and was in fact what was causing the VMs to be slower.
This explains the poor performance of the Windows XP VMs on one of the servers, but what about the other server that had good performance? Both of these servers were superficially identical, and were even running the same model of Intel processor.
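Before moving on to the second server, the difference between the two IRQL schemes can be illustrated with a toy model. This is only a sketch for counting TPR accesses, not how the HAL is actually implemented: the class names, the `run` helper, the IRQL values, and the workload (1000 short raise/lower pairs with an interrupt arriving in every 100th one) are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EagerIrql:
    """Toy model of the XP-style scheme: every IRQL change writes the TPR."""
    irql: int = 0
    tpr_writes: int = 0

    def raise_irql(self, new_irql):
        old = self.irql
        self.irql = new_irql
        self.tpr_writes += 1      # mask is programmed immediately
        return old

    def lower_irql(self, old_irql):
        self.irql = old_irql
        self.tpr_writes += 1      # mask is reprogrammed on the way down

    def interrupt(self, level):
        # The TPR already reflects the IRQL, so no extra access is needed.
        return level > self.irql


@dataclass
class LazyIrql:
    """Toy model of Lazy IRQL: the TPR is written only when it matters."""
    irql: int = 0                 # software-tracked IRQL
    tpr: int = 0                  # value actually programmed into the APIC
    tpr_writes: int = 0

    def raise_irql(self, new_irql):
        old = self.irql
        self.irql = new_irql      # just take a note, no APIC access
        return old

    def lower_irql(self, old_irql):
        self.irql = old_irql
        if self.tpr > old_irql:   # undo a mask that was set lazily
            self.tpr = old_irql
            self.tpr_writes += 1

    def interrupt(self, level):
        if level <= self.tpr:
            return False          # already masked in hardware
        if level <= self.irql:
            self.tpr = self.irql  # only now program the mask
            self.tpr_writes += 1
            return False          # interrupt is deferred
        return True               # interrupt is serviced


def run(model, sections=1000, interrupt_every=100):
    """Run short raise/lower pairs; occasionally a lower-priority interrupt arrives."""
    for i in range(sections):
        old = model.raise_irql(2)
        if i % interrupt_every == 0:
            model.interrupt(1)
        model.lower_irql(old)
    return model.tpr_writes


print("eager TPR writes:", run(EagerIrql()))
print("lazy TPR writes: ", run(LazyIrql()))
```

With this made-up workload the eager model writes the TPR 2000 times and the lazy model 20 times. The exact numbers mean nothing on their own; the point is that under Lazy IRQL the number of APIC accesses follows the number of interrupts that actually collide with a raised IRQL, not the number of IRQL changes.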
Well, it turns out that Intel has a technology called vTPR that is designed to work around this type of issue. However, this technology is not present in all of their CPU models. Even though the processors in both servers were the same model, the stepping revision was not. The server exhibiting the performance issue had a stepping revision of B3, whereas the other server had a stepping of G0. It turns out that the processor with the stepping of B3 did not implement vTPR, but the one with stepping G0 did.
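Since the model name alone was not enough to tell the two hosts apart, it can help to compare the CPUID stepping number of each host explicitly. The sketch below parses the string that Windows exposes in the `PROCESSOR_IDENTIFIER` environment variable. The helper names and the two sample strings are hypothetical and were not taken from the customer's servers, and the numeric stepping reported here still has to be mapped to Intel's letter designations (such as B3 or G0) using Intel's specification update for the processor in question.

```python
import re


def parse_cpu_identifier(identifier):
    """Extract family, model and stepping from a PROCESSOR_IDENTIFIER string."""
    match = re.search(r"Family (\d+) Model (\d+) Stepping (\d+)", identifier)
    if match is None:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    family, model, stepping = (int(g) for g in match.groups())
    return {"family": family, "model": model, "stepping": stepping}


def same_model_different_stepping(id_a, id_b):
    """True if two hosts report the same family/model but another stepping."""
    a = parse_cpu_identifier(id_a)
    b = parse_cpu_identifier(id_b)
    same_model = (a["family"], a["model"]) == (b["family"], b["model"])
    return same_model and a["stepping"] != b["stepping"]


# Hypothetical sample values, for illustration only.
host_a = "Intel64 Family 6 Model 15 Stepping 7, GenuineIntel"
host_b = "Intel64 Family 6 Model 15 Stepping 11, GenuineIntel"

print(parse_cpu_identifier(host_a))
print(same_model_different_stepping(host_a, host_b))
```

On a real host, the input string could be read with `os.environ.get("PROCESSOR_IDENTIFIER")` on each Hyper-V server and the results compared.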
So, how do we work around the initial issue? Unfortunately, there is no simple software trick to get around this. The solution would be either to replace the processor with one that implements vTPR, or to upgrade the guest operating systems to something that supports Lazy IRQL; I recommend Windows 7. More info on Intel vTPR can be found in this document.