We have an application (Meditech) that users have reported as performing poorly in Citrix. They were able to confirm the poor performance by comparing it against a desktop with the same software installed. I was brought on to investigate these performance differences and why Citrix would be performing so much worse. To measure the performance, the user took me to a screen where holding down a key on the keyboard increments a counter in the software. When holding the key down on the desktop, the counter was quick and incremented at a steady pace; on Citrix it seemed OK at first but slowed as time went on. To illustrate these performance differences, I made a video:
So we definitely have a problem. My first step in troubleshooting was to compare the differences between the desktop and the VM. The VM had server processors, but they were an older generation running at a lower clock speed of 2.6 GHz; the desktop had a newer-generation processor at 3.4 GHz. If the workload is not latency sensitive, the faster processor can make a difference, and CPUMark scored them at 1500 and 2100 (roughly a 40% difference). At first glance this seems like it could explain our timings, but the gap is too drastic: ~5 s vs ~20 s, about a 4× difference. To narrow it down, I took the application to a completely bare server with no other software on it whatsoever and reran the test. It completed in about 5–6 s. The processor it ran on was comparable to the original server's, running at 2.7 GHz instead of 2.6 GHz. The clock speed difference was nowhere near enough to account for the gap, so something else must be consuming those cycles.
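To sanity-check that reasoning, here is the back-of-the-envelope math as a minimal sketch. The benchmark times and CPUMark scores are the rough figures quoted above:

```python
# Back-of-the-envelope check: can the CPU difference alone explain the slowdown?
desktop_cpumark = 2100   # desktop score (approximate, from above)
server_cpumark = 1500    # Citrix server score (approximate)

desktop_time = 5.0       # seconds to finish the counter benchmark on the desktop
citrix_time = 20.0       # seconds observed on the Citrix server

# If the benchmark were purely CPU-bound, the slower processor should take
# roughly (2100 / 1500) = 1.4x as long as the desktop:
expected_time = desktop_time * (desktop_cpumark / server_cpumark)
print(f"expected if CPU-bound: ~{expected_time:.0f}s, observed: {citrix_time:.0f}s")

# The observed run is nearly 3x worse than the CPU difference predicts,
# so something else must be consuming cycles.
unexplained_factor = citrix_time / expected_time
print(f"unexplained slowdown: ~{unexplained_factor:.1f}x")
```

Even granting the full CPUMark gap, the benchmark should have taken about 7 seconds, not 20.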
At this point I ran Procmon against the benchmark and captured the following:
|(PID 6660 is the process that the benchmark runs against)|
The pattern of each keystroke on the Citrix server is immediately apparent, with the initial pattern highlighted:
About 400 ms elapses from each keystroke to when the next one is registered. So what is causing that 400 ms delay? In my experience, unexplained delays are usually CPU related. I then looked at the process activity summary to see if I could find the bottleneck:
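I read the 400 ms gap off the Procmon timestamps by hand, but the same check can be scripted against an exported trace. A minimal sketch, where the timestamps are made-up stand-ins for a Procmon "Time of Day" column:

```python
from datetime import datetime

# Hypothetical time-of-day stamps, one per keystroke event in the trace
events = [
    "10:15:01.0210000",
    "10:15:01.4230000",
    "10:15:01.8250000",
    "10:15:02.2240000",
]

# Truncate to 6 fractional digits (strptime's %f limit), then parse
times = [datetime.strptime(t[:15], "%H:%M:%S.%f") for t in events]

# Delay between consecutive keystrokes
deltas = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print([f"{d * 1000:.0f} ms" for d in deltas])  # each gap lands near 400 ms
```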
Again, we can clearly see the CPU ramping up, which also explains the faster initial iterations of the GUI: the CPU 'ramps' up rather than spiking immediately to its maximum. None of the other graphs show any activity at the time of the benchmark, so suspicion falls squarely on the CPU. Hovering over the graph shows the CPU percentage.
This is a 4-core box, so ~25% equals one full core for a single-threaded application. This points to the application being bottlenecked by the processor, but the difference is still too large to blame the CPU alone. We need to find which part of the process is consuming these resources. Fortunately, Process Explorer (procexp) can help us determine what is going on within a process.
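For reference, the "25% means one saturated core" arithmetic, as a tiny sketch (the helper name is mine, not from any tool):

```python
def cores_in_use(total_pct: float, cores: int) -> float:
    """Convert an all-core CPU percentage into core-equivalents.

    Task Manager-style CPU% is averaged across all logical processors,
    so one fully busy thread on a 4-core box shows as 100 / 4 = 25%.
    """
    return total_pct * cores / 100

print(cores_in_use(25, 4))  # 1.0 -> one fully saturated core
```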
I started a new run and got properties of the process:
MGUI.DLL is consuming all of the CPU. That is a DLL used by the application; clicking 'Stack' gives us the call stack for that thread.
From this, I can tell that ntoskrnl.exe and ntdll.dll are native Microsoft Windows modules and MGUI.DLL belongs to Meditech, but what is ISXHook.dll?
Searching within the process shows that it's used by Imprivata, a single sign-on solution we use to improve user efficiency. It works by 'screen scraping' to determine which fields it needs to populate with user credentials to speed up user logins. Logically, this sounds like it could be causing the delay by screen scraping on every keystroke or registered change. To confirm this, we need to remove Imprivata from the application. Fortunately, it's hooked in by services that can easily be terminated.
|I’m going to terminate everything that says ‘Imprivata Inc.’|
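I killed the processes by hand in Task Manager, but the step could be scripted. A hedged sketch: the process names below are placeholders (the real Imprivata executable names aren't recorded in this writeup), so substitute whatever shows 'Imprivata Inc.' on your box. `taskkill /F /IM` is the standard Windows command for force-terminating a process by image name.

```python
import subprocess
import sys

# Placeholder names -- replace with the actual executables that Task Manager
# lists under 'Imprivata Inc.' in your environment.
PROCESS_NAMES = ["ExampleImprivataAgent.exe"]

def kill_commands(names):
    """Build a taskkill command line for each process image name."""
    return [["taskkill", "/F", "/IM", name] for name in names]

if __name__ == "__main__" and sys.platform == "win32":
    for cmd in kill_commands(PROCESS_NAMES):
        # /F forces termination; check=False because a missing process
        # makes taskkill exit nonzero, which is fine here
        subprocess.run(cmd, check=False)
```

Note this only removes the hooks for the current session; the services restart on reboot, which makes it a safe diagnostic step rather than a permanent change.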
With the processes terminated I reran my test. Using Process Explorer and opening the process properties, peak CPU usage immediately dropped from ~25% to ~15%:
And getting Stack information showed a much cleaner stack:
In conclusion, I may need to investigate Imprivata to determine whether I can reduce its polling rate, or find some way of having it stop polling the CSMagic process *after* the accelerated login. Its current settings (which, sadly, I'm not familiar with) are not acceptable and cause a significant slowdown. Fortunately, the root cause has been determined and we can work towards a full resolution.