Performance

Examining Logon Durations with Control Up – Profile Load Time

2016-09-15

Following up on my last post about an invalid AD Home Directory attribute, I decided to examine in more detail what is causing the slowness at logon.

60s

52 seconds for Profile Load time.  This is caused by this:

bad_attribute

I created a share on the Home Folder server ‘WSAPVSEQ07’ and set the home directory to it.  I then simulated a ‘server migration’ by shutting this box down.  This means the server name still resolves, because it’s still present in DNS, but the server itself does not respond to pings.

ping

But why does it take so long?

If I do a packet capture on the server I’m trying to launch my application from, and set it to trace the ip.addr of the ‘powered off’ server, here’s what we see:

slow-logon-packet-capture

The logon process attempts to query the server at the ‘28.9’ second mark and stops at the ‘73.9’ second mark.  A total of 45 seconds.  This is all lumped into the ‘User Profile Load Time’.  So what’s happening here?

We can see the initial connection attempt occurs at 28.9 seconds, then 31.9 seconds, then finally 37.9 seconds.  The time span is 3s between the first and second try, then 6s between the second and third try.  This is explained here.

In Windows 7 and Windows Server 2008 R2, the TCP maximum SYN retransmission value is set to 2, and is not configurable. Because of the 3-second limit of the initial time-out value, the TCP three-way handshake is limited to a 21-second timeframe (3 seconds + 2*3 seconds + 4*3 seconds = 21 seconds).
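To make the arithmetic in that quote concrete, here is a minimal Python sketch of the retransmission schedule. It assumes the timeout simply doubles after each unanswered SYN, which is what the 3 + 6 + 12 formula implies. The function name and its parameters are made up for illustration and are not Windows settings.

```python
def syn_schedule(initial_rto, max_retransmissions):
    """Return (send_offsets, give_up_after) for one TCP connection attempt.

    send_offsets holds the seconds (relative to the first SYN) at which
    each SYN is sent; give_up_after is when the handshake is abandoned.
    Assumption: the wait doubles after every unanswered SYN.
    """
    offsets = [0.0]          # the first SYN goes out immediately
    wait = float(initial_rto)
    elapsed = 0.0
    for _ in range(max_retransmissions):
        elapsed += wait      # wait out the current timeout...
        offsets.append(elapsed)  # ...then retransmit the SYN
        wait *= 2            # and double the timeout for the next round
    # After the last retransmission the stack waits one more (doubled) timeout.
    return offsets, elapsed + wait


offsets, give_up_after = syn_schedule(initial_rto=3, max_retransmissions=2)
print(offsets)        # seconds at which each SYN is sent
print(give_up_after)  # seconds until the attempt is abandoned
```

With a 3 second initial timeout and 2 retransmissions, the SYNs go out at 0, 3 and 9 seconds and the attempt is abandoned after 21 seconds, which lines up with the 3s and 6s gaps seen in the capture.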

Is this what we are seeing?  It’s close.  We are seeing the first two items for sure, but then instead of a ‘3rd’ attempt, it starts over with the same formula.

= 45 seconds.

According to the KB article, we are seeing the Max SYN retransmissions (2) for each SYN sent.  This article contains a hotfix we can install to change the value of the Max SYN retransmissions, but the minimum is 2, which is what it’s set to anyway.  However, there is an additional hotfix to modify the 3 second time period.

The lowest I’ve found we can reduce the 3 second time period to is 100ms.

values

This reduces the logon time to:

faster_logon

19 seconds.

What does this packet capture look like?  Like this:
lower_syn_values

Even with the ‘Initial RTO’ set to 100ms, Windows has a MinRTO value of 300ms:

minrto

After the initial ‘attempt’ there is a 10-12 second delay.

Setting the MinRTO to the minimum 20ms

minrto_20

Reduces our logon time further now because our SYN packets are now about 200ms apart:

minrto_200ms

We are now at 16 seconds, with 13 seconds spent on the profile, of which 12 seconds was spent timing out this connection.

16s-logon

 

Would you implement this?  I would strongly recommend against it.  SYN retransmissions were designed so blips in the network can be overcome.  Unfortunately, I know of no way to get around the ‘Home Directory responds to DNS but not to ping’ timeout.  The following group policies have no effect:

gpo1

gpo2

 

And I suspect the reason they have no effect is that the server name still resolves in DNS, so the SYN sequence takes place and blocks regardless of these GPO settings.  Undoubtedly, it’s because this comes into play:
gpo_explain

Since our user has a home directory the preference is to ‘always wait for the network to be initialized before logging the user on’.  There is no override.

Is there a way to detect a dead home directory server during a logon?  Outside of long logons I don’t see any event logging or anything else that would point to this being the issue without having to do a packet capture in an isolated environment.

 

 


ControlUp – Dissecting Logon Times a step further (invalid Home Directory impact)

2016-09-07

Continuing on from my previous post, we were still having certain users with logons in the dozens of seconds to minutes.  I wanted to find out why and see if there is anything further that could be done.

60second_profile

 

After identifying a user with a long logon with ControlUp I ran the ‘Analyze Logon Duration’ script:

51-1second_profile

 

Jeez, 59.4 seconds to log on with 51.2 seconds of that spent on the User Profile portion.  What is going on?  I turned to Process Monitor to capture the logon process:

screen-shot-2016-09-07-at-8-24-18-pm

Well, there appears to be a 1 minute gap between the cmd.exe command from when WinLogon.exe starts it.  The stage it ‘freezes’ at is “Please wait for the user profile service”.

 

Since there is no data recorded by Process Monitor I tested by deleting the user’s profile.  It made no difference, still 60 seconds.  But since I now know it’s not the user profile, it must be something else.  Experience has taught me to blame the user object and look for network paths.  50 seconds or so just *feels* like a network timeout.  So I examined the user’s AD object:

screen-shot-2016-09-07-at-8-46-19-pm

 

Well well well, we have a path.  Is it valid?

screen-shot-2016-09-07-at-8-49-42-pm

 

 

It is not valid.  So is my suspicion correct?  I removed the Home Directory path and relaunched:

without_homedir_logon_time

Well that’s much, much better!

So now I want ControlUp to identify this potential issue.  Unfortunately, I don’t really see any events I can key in on that says ‘Attempting to map home drive’.  But what we can do is pull that AD attribute and test to see if it’s valid and let us know if it’s not.  This is the output I now generate:

new_script

 

I revised the messaging slightly as I’ve found the ‘Group Policy’ phase can be affected if GPP Drive Maps reference the home directory attribute as well.
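The home directory check described above could look something like the following Python sketch. It is not the ControlUp script itself. It assumes the home directory is a UNC path and treats a failed TCP connection to port 445 (SMB) within a short timeout as ‘server unreachable’; the function names and the 2 second timeout are hypothetical choices for illustration.

```python
import socket


def unc_server(path):
    """Return the server part of a UNC path, or None if the path is not UNC."""
    if not path.startswith("\\\\"):
        return None
    parts = path[2:].split("\\")
    return parts[0] or None


def smb_port_open(server, timeout=2.0, port=445):
    """True if a TCP connection to the SMB port succeeds within the timeout."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


def check_home_directory(path, timeout=2.0):
    """Classify a home directory attribute without waiting for a full logon."""
    server = unc_server(path)
    if server is None:
        return "not a UNC path"
    if smb_port_open(server, timeout):
        return "server reachable"
    return "server unreachable"
```

A short, explicit timeout is the point of the design: the check fails in a couple of seconds instead of waiting out the full SYN retransmission window that the logon itself has to sit through.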

 

So I took my previous script and updated it further.  This time with a check for valid home directories.  I also added some window sizing information to give greater width for the response as ‘Interim Delay’ was getting truncated when there were long printer names.  Here is the further updated script:


Citrix XenApp – Graphical Artifacts

2016-07-22

In our Citrix XenApp 6.5 environment we started having a couple applications encounter an issue where they would experience some serious graphical artifacts.  What was supposed to look like this:

1

Would look like this:

2

 

Here’s a short video demonstrating this issue:

Or sometimes it would show the windows *behind* the artifacted image.  That is, instead of the ‘White’ you see in my image, the application behind it shows through.

When investigating this we found there were a few symptoms that indicated we were going to experience these artifacts.

  1. The window would become ‘frosted’ or ‘ghosted’ (as seen in Spy++ or AutoIt Window Info)
    4
  2. The application would switch to ‘Not Responding’
    3
  3. If you completed the task ‘Edit’ quickly there would be no artifacting (time was important)
  4. When ‘timing’ the switch from ‘normal’ to frosted or ghosted window it would be around 5-7 seconds.

So what’s going on here?

For this particular instance, the application is launching MSPaint with some modified properties.  It sets Paint to ‘Always On Top’, which in itself isn’t an issue, but then it purposefully locks the UI so you must complete the drawing and close paint before continuing.  This is how the vendor designed this application to operate with this workflow.

And what’s Windows doing?  It turns out, Windows is trying to alert you that your program is non-responsive!  Windows has a built in feature to ‘Frost/Ghost’ the window of a non-responsive UI to prevent you from entering input that won’t be received.  The ghosting effect is time sensitive!  So that explains why, if we opened and closed our document quickly, there would be no artifacting, but if we manipulated it for some time the artifacts would appear.  The time limit for monitoring unresponsiveness is 5 seconds.  DWM.exe is the process responsible for creating the ‘Frost’ window, and when responsiveness returns it appears to do a poor job of telling the application to repaint all affected windows.

Microsoft recommends a couple of ‘fixes’ which are really programmatic ways to ‘disable’ the ghosting feature.  The two methods Microsoft suggests are to create a NoGhost application compatibility fix or to have the programmer use ‘DisableProcessWindowsGhosting’.  But there is a 3rd method.

The 5 second time limit is programmable.  If we extend the timeout we don’t need to configure ‘NoGhost’ compatibility fixes for each app or go back to the vendor.  The timer is global and affects every application and window.  Unfortunately, I know of no way to permanently disable it, but we can set a high enough value to prevent it from appearing.

So what do we have to do to ‘resolve’ this?

My preferred choice was to use Group Policy Preferences (Registry) and set a new value for HKCU\Control Panel\Desktop /v HungAppTimeout /t REG_SZ /d 120000

This sets the timeout to 2 minutes as opposed to 5 seconds.  Now when our program is used we get this result:

No More Artifacts.

When I was investigating this I found I could get the artifacting to occur in both ICA and RDP but not when on the console.  The frustrating thing about this issue is that it was not consistent.  Because of the 5 second default time limit, the program(s) we had that would ‘artifact’ would sometimes complete their UI locking job faster than 5 seconds, but sometimes not.  This led to reports of artifacting occurring more often ‘during the peak work hours’ when the application/server/user load was at its highest.  This makes sense, as the higher load undoubtedly led to everything being slower (the database, the server, etc.), so the application waited longer and exceeded the timeout.  I did find through the course of troubleshooting that the issue seemed to ‘go away’ when I tried to replicate it after hours, which is frustrating as it left only a slice of time during peak hours to try and resolve it.

Fortunately, after implementing the HungAppTimeout registry key the artifacts for several applications ‘went away’.

Lastly, contrary to this article you do NOT need to restart for this value to take effect.  WinLogon.exe reads the HungAppTimeout value and then configures DWM accordingly when your profile loads.  So for this value to take effect you only need to log on with this value already residing in your user’s registry hive.
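For a quick test outside of Group Policy Preferences, the same value can be written with reg.exe. The Python sketch below only builds and prints the command line for the value described above; the helper name is hypothetical, and running the command is left to the reader on a test machine.

```python
import subprocess


def hung_app_timeout_command(timeout_ms):
    """Build the reg.exe arguments that set HungAppTimeout for the current user."""
    return [
        "reg", "add", r"HKCU\Control Panel\Desktop",
        "/v", "HungAppTimeout",
        "/t", "REG_SZ",
        "/d", str(int(timeout_ms)),
        "/f",  # overwrite an existing value without prompting
    ]


command = hung_app_timeout_command(120000)  # 120000 ms = 2 minutes
# Print instead of executing; on Windows the list could be passed to
# subprocess.run(command, check=True) to apply it.
print(subprocess.list2cmdline(command))
```

Because the value lives under HKCU, it has to be set per user, which is why a user-side Group Policy Preference is a natural fit for rolling it out.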

5


Citrix XenApp 6.5 – IMA errors galore, mfcom won’t start

2016-07-04

I’ve seen this happen a few times now where the “Citrix Independent Management Architecture” (aka IMAService) won’t start, erroring with various errors:

All of these errors appear to be caused by incorrect permissions configured on the Citrix registry keys.  Why did these keys get their permissions reset?  I’m unsure.  I DID just install Citrix UPM 5.4 which may reset the keys?

Here is how you fix the permissions (at least, everything I could possibly find):
1) Download SetACL.exe
2) Save this file to ‘CitrixRegPerms.txt’:

You may need to identify the local SID for ‘NETWORKSERVICE’.  In my example the value is:

 

You may need to replace the SID for NetworkService in my file above with your own.

Lastly this script will ‘fix’ the incorrect permissions:

Done.

 


Citrix PVS 7.7 VHD to VHDX performance difference

2016-06-13

I was asked to justify upgrading our PVS vDisks to VHDX from VHD.  There are a few ‘feature’ / technical reasons:

  1. Use of native tools to mount/compress VHDX from Windows Server 2012.  VHD files created with 16MB block sizes require a custom Citrix tool which does not compress.
  2. VHDX is the format ‘forward’.
  3. VHDX is supposed to perform better.

Test setup: VHD file is Citrix PVS 7.1SP2, VHDX file is a clone of the VHD with the tools upgraded to 7.7.

So I know VHDX is supposed to perform better but I was curious by how much.  Apparently, the modification Citrix made to the VHD format to a 16MB block size is ‘4K’ aligned as well.

fsutil fsinfo ntfsinfo C: reports the following for the different vDisks:

VHD_fsinfo

16MB block size VHD file

VHDX_fsinfo

32MB block size VHDX file

I set my target devices to the different vDisks and set the ‘Cache to RAM’ feature to 4096MB.  Ideally, all writes should be to RAM but this will still tax the filesystem.

And what is the performance between the two?  I used the DiskSpd utility from Microsoft to measure the differences.

VHD DiskSpd Test


VHDX DiskSpd Test


Summary


The VHDX format appears to be around 7.5% faster in our setup.

The boot speed (the amount of time it takes the vDisk to power up and start the ‘Citrix PVS Device Service’) is even more dramatic:

VHD Boot Speed


 

VHDX Boot Speed


How much of this is the tools vs the format?  I’m not sure, I didn’t have the time to reverse image and upgrade a VHD to 7.7.  Regardless, the combination of upgrading to 7.7 from 7.1SP2 AND the VHDX format brought a dramatic boot time improvement and consistently faster disk speed.


EventID 6003 – The winlogon notification subscriber was unavailable to handle a critical notification event.

2016-05-31

We updated one of our Citrix XenApp servers and this message started flooding our Application event log:
“The winlogon notification subscriber <TrustedInstaller> was unavailable to handle a critical notification event.”

So what’s going on here?

Examining the registry on a ‘good’ working system and the ‘bad’ system revealed the following:

2

Good TrustedInstaller – No Error 6003

 

1

Bad TrustedInstaller – Error 6003

How did that value get there?

It turns out we installed Internet Explorer 11 with our patch cycle — but that in and of itself did not cause our issue.  Additional components for IE 11 were installed as well:
3

“Microsoft Windows English Spelling Package” and “Microsoft Windows English Hyphenation Package”

Neither of these packages is present on any of the ‘working’ systems.  I tested to determine which of them placed the registry key there…

It turns out both of them do.  If you uninstall *either* package it will remove the ‘CreateSession’ value.  Since these packages are not in our standard build we are removing both.


WSUS clients fail with WARNING: Exceeded max server round trips: 0x80244010

2016-03-14

J C Hornbeck/Joe Tindale touched on this topic on a Microsoft blog post here.

In that post he touches on what’s actually happening:

Cause: The error, 0x80244010, means WU_E_PT_EXCEEDED_MAX_SERVER_TRIPS and happens when a client has exceeded the number of trips allowed to a WSUS server.  We have defined the maximum number of trips as 200 within code and it cannot reconfigured.  A “trip” to the server consist of the client going to the server and saying give me all updates within a certain scope.  The server will give the client a certain number of updates within this trip based on the size of the update metadata.  The server can send 200k worth of update metadata in a single trip so it’s possible that 10 small updates will fit in that single trip.  Other larger updates may require a single trip for each update if they exceed the 200k limit.  Due to the way Office updates are published you are more likely to see this error if you’re syncing Office updates since their metadata is typically larger in size.
I’ve bolded the more important information.  This is hardcoded and cannot be reconfigured.  This, to me, is a bit ridiculous that it can’t be reconfigured.
I have a WSUS client where we ‘reset’ the Windows Update client.  After resetting the client we were getting an error “WARNING: Exceeded max server round trips: 0x80244010”.  We would try multiple times but this error wouldn’t go away and prevented us from running Windows Update on this system.  So I started to investigate.  The first thing I did was finish reading that blog entry.  Hornbeck continues:
The client takes these new updates as they trickle down and inserts them into a small database to cache them for future use.  So during the first client synchronization with WSUS the client may get 75% of the available updates, put them into the database, and then fail at some point due to the number of max trips being exceeded.  The good news is the second synchronization cycle will not need to start from the beginning since the client has already cached 75% of the updates into its database.  The second cycle will pick up where it left off and most likely finish getting all the updates from the server.  There have been a few rare cases where a third scan cycle is needed but more often than not two is sufficient.
Again, I have bolded and underlined the important parts.
I started my investigation by trying to replicate the problem.  I started up Windows Update and ‘checked for updates’.
Ok, no problem.  So I checked again.
Well, Hornbeck/Tindale did say it may take a couple passes.  Let’s try again.
I’m getting worried now…  Let’s try a fourth time.
Hmmm…   At this point I wanted to better understand what Windows Update was doing.  I originally installed Wireshark to trace the conversation but it was difficult and time consuming to try and count the traffic back and forth to the WSUS server.  So I reverted my system and installed Fiddler2 on it.
From the video you can actually see the traffic from the WSUS server.  The request for the ‘Updates’ starts at item 3 and completes at 203.  Exactly 200 round trips.
Since my previous Windows Update attempts at the WSUS server failed after a few tries I thought I would trace the traffic with Fiddler for the multiple attempts.  My logic was I wanted to know if the traffic was ‘looping’; repeating itself and getting to the limit preventing updates.  Or, would each send/receive be unique and thus, simply, more is needed?
The first bit of the first run.  If the second run has identical or near identical traffic ‘packet sizes’ I would be concerned it’s looping…
I reset Fiddler and started the second run.  Completely different!
When I started the second run I was happy to see it was a completely different result.  I cleared Fiddler for a 3rd time, ran the ‘Check for Updates’ until it timed out again, and cleared Fiddler again.  I then thought to just let Fiddler capture everything.  There really is no need to clear it each time.  I monitored the Fiddler output looking for loops or patterns.  The update check timed out a fourth time (as before).  There was no looping I could see.
Finally, on the fifth run:
Updates!  We have updates!
In the end, when the Microsoft blog post was written (2008) there probably were only enough updates that two or three passes would go through all of them.  As time has gone on and more and more updates have been deployed to systems, this hardcoded maximum is doing a huge disservice.  Our Windows 2008R2 SP1 systems require FIVE passes of clicking/waiting/clicking/waiting/etc.
A natural solution to this is to expose the “max server round trips” variable and allow it to be programmed by the organization according to their needs.  The present state of this issue is unnecessarily confusing and arbitrarily limited.
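As a back-of-the-envelope illustration of why the number of passes keeps growing, the following Python sketch models the behaviour described in the quoted blog post: each scan is cut off after 200 round trips, and the next scan resumes from what was already cached. This is a simplified model, not the Windows Update Agent’s actual logic, and the trip counts used are hypothetical.

```python
import math

MAX_ROUND_TRIPS = 200  # hardcoded limit quoted from the Microsoft blog post


def passes_needed(total_round_trips, max_trips=MAX_ROUND_TRIPS):
    """Estimate how many 'Check for updates' passes a client needs.

    Assumes every pass makes full progress until it hits the trip limit
    and that nothing already cached has to be fetched again.
    """
    if total_round_trips <= 0:
        return 0
    return math.ceil(total_round_trips / max_trips)


# Hypothetical example: a client that needs 900 round trips in total.
print(passes_needed(900))
```

Under this model a client that only succeeds on its fifth pass needs somewhere between 801 and 1000 round trips worth of metadata, and every additional 200 trips adds one more manual retry.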

Help! My Citrix application is running slow (Meditech, Citrix, Imprivata)

2015-12-19

We have an application (Meditech) that users have reported as performing poorly in Citrix.  They were able to confirm the poor performance by comparing it to a desktop that had the same software installed.  I was brought on to try and understand these performance differences and why Citrix would be performing so much worse.  To measure the performance, the user took me to a screen where they held down a key on the keyboard to increment a counter in the software.  When holding the key down on the desktop the counter was quick and incremented at a steady pace, on Citrix it seemed OK at first but then slowed as time went on.   To illustrate these performance differences, I made a video:

So we definitely have a problem.  My first step in troubleshooting this problem was to compare the differences between the desktop and the VM.  The VM had server processors but they were older and slower, at 2.6GHz.  The desktop had newer generation processors at 3.4GHz.  If the processing is NOT latency sensitive then the faster processors can make a difference, and CPUMark had them at 1500 and 2100 (~40% difference).  At first glance, this seems like it could explain the difference in our timings, but the gap is still too drastic at ~5s vs ~20s, roughly a 4x difference.  To try and narrow the difference I took the application to a completely bare server with no software on it whatsoever and reran the test.  It completed in about 5-6s.  The processor it ran on was comparable to the original server processor but ran at 2.7GHz instead.  That extra clock speed is nowhere near enough to make up the difference; something else must be consuming those cycles.

At this point I procmon’ed the benchmark and came up with the following:

(PID 6660 is the process that the benchmark runs against)

Very obviously, the pattern of each key stroke on the Citrix server is present with the initial pattern highlighted:

About 400ms from each key stroke to when the next one is registered.  So what is causing the 400ms delay?  From my experience, unexplained delays are CPU related.  I then looked at the process activity summary to see if I could find the bottleneck:

Again, very obviously we are seeing the CPU ramp up, which can also explain the faster initial iterations of the GUI, as the CPU ‘ramps’ up and doesn’t spike to its maximum.  None of the other graphs show any activity at the time of the benchmark so the CPU is the prime suspect.  When hovering over the graph we see the CPU percentage.

This is a 4 core box, so ~25% equals one full core for a single threaded application.  Again, this points to the application being bottlenecked by the processor, but again, the difference is too large to consider just the CPU at this point.  We need to find what part of the process is consuming these resources.  Fortunately, ProcExp (process explorer) can help us determine what is going on within a process.
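The ‘25% equals one full core’ reasoning is easy to get backwards, so here is a tiny Python sketch of the conversion. The function name is made up for illustration; it just rescales a whole-machine CPU percentage into ‘cores kept busy’.

```python
def cores_in_use(total_cpu_percent, logical_cores):
    """Convert a whole-machine CPU percentage into the number of busy cores."""
    return total_cpu_percent * logical_cores / 100.0


# On a 4 core box, a process shown at 25% is saturating one core.
print(cores_in_use(25, 4))
# The same process shown at 15% would be using a bit over half of one core.
print(cores_in_use(15, 4))
```

For a single threaded application the result can never exceed 1.0, so a reading pinned at 100 / cores percent means the thread is CPU bound even though the machine as a whole looks mostly idle.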

I started a new run and got properties of the process:

MGUI.DLL is consuming all of the CPU.  That is a DLL utilized by the application; clicking on ‘Stack’ gives us the hierarchy of commands being utilized by that thread.

From this, I can understand that ntoskrnl.exe, ntdll.dll are native Microsoft Windows functions, MGUI.DLL is utilized by Meditech, but what is ISXHook.dll?

Doing a search within the process shows that it’s utilized by Imprivata, a single sign-on solution we utilize to try and increase user efficiency.  It works by ‘screen scraping’ to determine fields that it needs to populate with user credentials to try and speed up user logins.  Logically, this sounds like it could be causing the delay by screen scraping every time a key is stroked or a change is registered.  To confirm this, we need to remove Imprivata from the application.  Fortunately, it’s hooked in by services that can easily be terminated.

I’m going to terminate everything that says ‘Imprivata Inc.’

With the processes terminated I reran my test.  Using Process Explorer and getting properties on the process, immediately CPU usage went from 25% down to 15% at a peak:

And getting Stack information showed a much cleaner stack:

In conclusion, I may need to investigate Imprivata to determine if I can reduce its polling rate or find some way of allowing it to ‘stop’ polling the CSMagic process *after* the accelerated login.  Its current settings (which I’m not familiar with, sadly) are not acceptable and cause a significant slowdown.  Fortunately, the root cause has been determined and we can work towards a full resolution.


Citrix Receiver 4+ rants and “Your apps are not available at this time. Please try again in a few minutes or contact your help desk with this information”

2015-10-28

The environment I’m working in is a mix of XenApp 6.5, 5.0 and Presentation Server 4.5.  The 6.5 and 5.0 farm also have a nearly identical test farm.  We’ve been migrating the applications off the Presentation Server farms and are moving them to XenApp 6.5.  At this point in the migration we have around 5-10 applications left on 4.5 to move, with around 400-450 on the 6.5 farms and probably around 20-30 or so on the 5.0 farms.  These farms utilize a Citrix Webinterface 5.4.2 frontend for web interface and PNA.

We have standardized the environment on mostly Citrix Receiver 3.3 and some 3.4.  Time has marched on and we’ve been tasked with getting Receiver 4+ working for the Windows 7/8/10 rollout.  We were not able to do so with the earlier versions of Receiver 4 because they lacked features like sorting icons into custom folders on the desktop and Start Menu.  This feature came in around 4.2.  In our environment access to applications is granted through group membership, and Citrix PNA allowed the user to ‘roam’ from computer to computer while only displaying the applications they have access to, as opposed to a bunch of applications they do not have access to with the onus on them to pick and choose the correct applications.

So we started work on planning this migration and have started with the latest greatest (as of today) Citrix Receiver 4.3.  We have been able to come close to simulating all the features of the Enterprise editions of 3.3 and 3.4.  Namely:

  • Citrix Receiver automatically connects and populates a folder (MyApps) in the Start Menu and on the desktop.
  • No self-service.  Applications are automatically presented to you and defined by Group Membership.
  • Single sign-on.  Receiver will take your Windows logon credentials to use for authentication.

We do have some outstanding items.  We’ve set the client to have all applications as ‘MANDATORY’ which does populate the applications in the MyApps folders; but applications marked as ‘Create shortcut on the desktop’ in the applications properties in AppCenter are not created.

Anyways, onto the problem.  Now that we have Receiver 4.3 setup, SSON working, PNA working, we logged onto our system and watched the applications populate.

Slowly.

Really Slowly.

Eventually a dialog popped up.

Then another, and another, and another.

And these dialogs are completely custom!  They are NOT native Windows dialogs!

So if you have multiple ones of them, sometimes clicking the X (close button) or OK doesn’t work because the dialog appears ‘modal’ and you need to click the button on the ‘active’ window.  But you generally don’t know which one that is so you have to go through and select each dialog from the task bar and try clicking ok until you magically get lucky and select the one that has priority.  Then you do it all over again as the next primary window *may not be the one on top*.

1 minute 24 seconds to populate 470 applications

Just for giggles, how fast does Receiver 3.3 populate the same list?

8 seconds.  And all the icons show up.

Alright, so you’ve passed that point and are now looking at your applications.  But they are missing icons!

But not all applications are missing their icons…  Only some.

So let’s find out what’s consuming all this time and maybe, just maybe, we’ll solve our “Your apps are not available” error message.

First thing we need to do is enable Citrix Receiver Logging:

Next is to exit and restart receiver and logs will start to generate.  They are located here:
%USERPROFILE%\AppData\Local\Citrix\Receiver
%USERPROFILE%\AppData\Local\Citrix\AuthManager
%USERPROFILE%\AppData\Local\Citrix\SelfService

The most important log tends to be the ‘SelfService.txt’ log.  If you search that log for the “Your apps are not available” error message it pops up in locations like this:

So this dialog popped up for an application called ‘BMTServe’.  And what does BMTServe look like?

Generic icon!

But BMTServe was not the only application that encountered this dialog.  From my video it popped up numerous times.  Searching the SelfService.txt file for ‘Your apps’ and looking for the application it references points to an application with a blank icon 100% of the time.  Not every application that produces a blank icon causes this prompt, as we literally have ~250 applications with blank icons and the dialog pops up anywhere from 0 to 10 times.  Sometimes it pops up 2 times, sometimes none, sometimes 10 times.
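Searching the log by hand gets tedious across repeated launches, so a small Python sketch of the same search may help. It only does a plain substring match and reports line numbers; the exact format of SelfService.txt is not reproduced here, so the sample lines and the helper name are hypothetical.

```python
PHRASE = "Your apps are not available"


def find_phrase(lines, phrase=PHRASE):
    """Return the 1-based line numbers of every line containing the phrase."""
    return [number for number, line in enumerate(lines, start=1) if phrase in line]


def find_phrase_in_file(path, phrase=PHRASE):
    """Same search against a log file on disk (encoding is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_phrase(handle, phrase)


# Hypothetical log lines, only to show the call.
sample = [
    "icon request for AppOne",
    "dialog shown: Your apps are not available at this time.",
    "icon request for AppTwo",
]
print(find_phrase(sample))
```

Counting the hits per launch would give a quick way to compare runs, since the dialog shows up anywhere from 0 to 10 times.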

So, why are these icons blank?

Citrix Receiver 4.3 seems to only prefer 32bit icons or icons of a particular size.  I haven’t confirmed what exactly yet, but I do know that 8bit 32×32 icons don’t seem to get ‘translated’.  The Citrix logs all but confirm this as well.

I confirmed with Citrix that icons are required to be 32bit and the order they are checked is 48×48, 32×32 then 16×16.
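To find which icons are likely to trip this up before publishing, the icon files can be inspected directly. The Python sketch below reads the directory of a standard .ico file and checks for a 32bit image at 48×48, 32×32 or 16×16. It assumes the published icons are available as .ico files on disk and that the bit depth recorded in the directory entry is accurate (some files leave it at 0); how Receiver itself evaluates icons is not known beyond what Citrix confirmed above.

```python
import struct


def ico_entries(data):
    """Parse an ICO file's directory into (width, height, bits_per_pixel) tuples."""
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    entries = []
    for index in range(count):
        # Each directory entry is 16 bytes and follows the 6 byte header.
        width, height, _colors, _reserved, _planes, bpp, _size, _offset = (
            struct.unpack_from("<BBBBHHII", data, 6 + 16 * index)
        )
        # A stored width or height of 0 means 256 pixels.
        entries.append((width or 256, height or 256, bpp))
    return entries


def has_32bit_icon(entries, sizes=(48, 32, 16)):
    """True if any of the preferred sizes is present as a 32bit image."""
    return any(
        width == size and height == size and bpp == 32
        for size in sizes
        for (width, height, bpp) in entries
    )
```

Running `has_32bit_icon(ico_entries(open(path, "rb").read()))` over a folder of exported icons would give a list of candidates to re-create as 32bit.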

This is how Receiver processes icons that are formatted correctly:

The icons were processed instantly. 00:00:00. But if they are formatted in a way that Receiver decides it needs to ‘reformat’ them:

This call to get an icon took 11.6 seconds!!! If it doesn’t get the icon formatted in the way it wants, it appears that SelfService.exe sets up a queue of icons that it needs and ‘re-requests’ them from the server. Could it be that Receiver is submitting too many queries? The error mentions checking the authmansvr.txt log file. This log file shows the following:

The error appears to start at “CWindowsReceiver::CallARGetConnectedVpnGateway” When this call is successful it returns:

So, I guess it’s possible that trying to re-pull the icon data is causing authmansvr.exe to crash…?  Another crazy thing is I was attempting to automate this process of terminating Receiver and relaunching it to see if I could get a gauge on the frequency of this occurrence and this is what I saw:

Ok, I thought, not so bad.  Just two messages the first couple launches?  It shouldn’t be too much of an issue…  But then I looked at my application folder:

The left is when I get all my apps (and usually the message box); the right is all those ‘successes’.

It appears Receiver removed all applications producing that dialog box.  When I was terminating and relaunching receiver it was ONLY populating 195 applications as opposed to 493 it was supposed to. No wonder I wasn’t getting any messages!  On a hunch, I looked at each of the 195 applications it kept and they all had good icons.  I then took a random sampling of about 30 of the 300 or so applications that it did not keep and none of them had proper icons, all blank.  So another bullet towards icons causing my issue.


Citrix XenApp, OpenGL pass-through and Nvidia GRID cards on Amazon EC2 (G2 Instances)

2015-09-01

I’m attempting to do a Proof of Concept (POC) for a client and one of the ideas was to utilize the Amazon EC2 cloud to provide GPU instances to the users for their applications (Maya, SolidWorks, etc.).  In order to understand how GPU sharing works, I set up my home lab to take advantage of these features first.

Citrix provides documentation on setting up the GPU sharing.  For my test, I’m doing this on a bare metal Citrix server.  Essentially, the notes state that OpenGL is automatically shared and enabled and special steps must be taken for DirectX, OpenCL, CUDA and Windows Server 2012.  To enable GPU sharing for XenApp for these features, the following registry file will enable these:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]
"DirectX"=dword:00000001
"CUDA"=dword:00000001
"OpenCL"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]
"DirectX"=dword:00000001
"CUDA"=dword:00000001
"OpenCL"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Multiple Monitor Hook]
"EnableWPFHook"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Multiple Monitor Hook]
"EnableWPFHook"=dword:00000001
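Since every value is written twice (once under the native key and once under Wow6432Node), the file can also be generated instead of typed by hand.  The following is only a rough Python sketch of that idea; the names SETTINGS, with_wow64 and build_reg are my own placeholders and not anything Citrix ships, and the keys and values are just the ones listed above.

```python
# Hypothetical helper: render the registry settings from this post as .reg text.
# Keys and value names are copied from the registry file above.
SETTINGS = {
    r"SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Graphics Helper": {
        "DirectX": 1,
        "CUDA": 1,
        "OpenCL": 1,
    },
    r"SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Multiple Monitor Hook": {
        "EnableWPFHook": 1,
    },
}


def with_wow64(subkey):
    """Return the 32-bit (Wow6432Node) counterpart of a SOFTWARE subkey."""
    prefix = "SOFTWARE\\"
    if not subkey.startswith(prefix):
        raise ValueError("expected a key under SOFTWARE: " + subkey)
    return prefix + "Wow6432Node\\" + subkey[len(prefix):]


def build_reg(settings):
    """Build .reg text with each key emitted for both registry views."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for subkey, values in settings.items():
        for path in (subkey, with_wow64(subkey)):
            lines.append("[HKEY_LOCAL_MACHINE\\" + path + "]")
            for name, data in values.items():
                # DWORD values are written as 8 hex digits.
                lines.append('"{}"=dword:{:08x}'.format(name, data))
            lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg(SETTINGS))
```

Running the script prints the text to the console; redirect it into a .reg file to import it.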

In addition to this registry file, for Server 2012, the following Group Policy object is required:

  • On Windows Server 2012, Remote Desktop Services (RDS) sessions on the RD Session Host server use the Microsoft Basic Render Driver as the default adapter. To use the GPU in RDS sessions on Windows Server 2012, enable the ‘Use the hardware default graphics adapter for all Remote Desktop Services sessions’ setting in the group policy Local Computer Policy > Computer Configuration > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Remote Session Environment.

My initial setup is a Q87M-E system with an Intel 4771 and onboard graphics.  My system is set up with Windows 2012 R2 and Citrix XenApp 7.6.

Launching an ICA session to the XenApp 7.6 server results in:

 

 

We have OpenGL, DirectX 11, and OpenCL working (the onboard Intel GPUs do not support CUDA).  So we have a full, working implementation of GPU sharing in an ICA session on a XenApp server.

But the onboard Intel graphics card will not get me the performance I want.  I had an Nvidia GTX 670 video card on hand to see if I could get better 3D performance.  I installed that card in the system, installed the video drivers and checked the results.

 

Where did my OpenGL go?  Everything else is working correctly (Direct3D, CUDA, OpenCL), but not OpenGL.  My understanding from Nvidia is that OpenGL should just be ‘passed through’ by Citrix.  I know that it *does* pass through because we, literally, just saw it with the onboard Intel GPU and the Intel drivers.

My next thought was that maybe it had to do with the drivers.  Maybe if I tried the Quadro drivers?  It turns out Nvidia has released special Quadro drivers that enable OpenGL in an RDP session.  Maybe if I modified the INF to add my GTX 670 to these special drivers I could get OpenGL to work?

 

It did not work.  OpenGL remained disabled in RDP/ICA sessions.

Suspecting Nvidia is doing some form of detection that is disabling OpenGL (it’s probably considered a ‘pro’ feature), I acquired a Quadro FX5800.  Using the *same* modified Quadro drivers, these were my results:

 

OpenGL is now working!!

Ok, so, at this point I know how to enable GPU sharing for Citrix XenApp, I know how to check and verify its functionality, and I know that different Nvidia cards can have OpenGL enabled or disabled, but I am not sure if it’s the driver that matters or the hardware.  If it’s the hardware, I’m a bit surprised Intel would incorporate hardware accelerated OpenGL into ICA sessions for their consumer pieces but Nvidia would not for their discrete cards.  To *attempt* to test this I went and got the oldest driver I could find that would support an FX5800:

Sure enough, it works.

My last thought is maybe Nvidia has it hard coded somewhere to check for a string or a specific ‘type’ of video card and, if found, enable OpenGL?

My thinking is that the Nvidia drivers are doing some kind of detection and making a determination between a console session and all others.  If I’m lucky, maybe they only implemented this in their *newer* drivers, maybe after they started the RDS OpenGL acceleration…

To test this theory I went and grabbed the oldest driver I could find for my GTX 670 that would work on Windows 2012 R2: 327.23.

 

Well now…  OpenGL is working.  This is interesting, and it lends evidence to the idea that OpenGL is being disabled in ICA via the driver.  I attempted to find when OpenGL *stopped* working.

331.82 -> Works, and now with OpenGL 4.4

337.88 -> Works

340.52 -> No OpenGL.  This driver (340.52) is the first gaming driver *after* the “OpenGL on RDS release” (340.43).  It appears something on or after the 340.XX branch is disabling OpenGL in ICA sessions.
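To keep track of where the break happens, the results above can be bracketed programmatically.  This is only a small Python sketch; OBSERVATIONS simply restates the GTX 670 results from this post, and parse_version and bracket_regression are placeholder names of my own.

```python
# (driver version, OpenGL available in the ICA session) for the GTX 670,
# exactly as observed in the tests described in this post.
OBSERVATIONS = [
    ("327.23", True),
    ("331.82", True),
    ("337.88", True),
    ("340.52", False),
]


def parse_version(text):
    """Turn '340.52' into (340, 52) so versions sort numerically."""
    major, minor = text.split(".")
    return (int(major), int(minor))


def bracket_regression(observations):
    """Return (last working version, first failing version)."""
    ordered = sorted(observations, key=lambda item: parse_version(item[0]))
    last_good = None
    first_bad = None
    for version, works in ordered:
        if first_bad is not None:
            break  # everything after the first failure is ignored
        if works:
            last_good = version
        else:
            first_bad = version
    return last_good, first_bad


print(bracket_regression(OBSERVATIONS))  # ('337.88', '340.52')
```

The change therefore sits somewhere between 337.88 and 340.52; any driver released between those two would narrow it further, but I did not test one.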

At the same time I was testing my Nvidia gaming GPU in my home lab, I was testing Amazon.  The GPU instances that Amazon provides utilize the Nvidia GRID K520 card as a vGPU.  This card is marketed as a ‘GRID Gaming’ card.  I set up this instance with Citrix XenApp and, at the time, used the latest driver (347.70).  At the time of this testing, this was my 3rd rebuild of this instance, so I went with Server 2008 because my previous 2 builds were 2012 and I was convinced I was doing something wrong.  The OS shouldn’t matter, but I’m noting it here.

347.70 -> No OpenGL (just like the gaming card):

Knowing that downgrading the gaming card’s driver worked, I installed the oldest driver I could for the K520:

320.59 -> OpenGL Works!

Just like the gaming card.  I suspect the K520 will have the same issue as the GTX 670, and that any driver after 340.XX will disable OpenGL in an ICA session.  Unfortunately, the GRID K520 appears to have only 3 drivers to choose from: 320.59, 335.35, and 347.70.  To finish this testing I will test with 335.35:

OpenGL Works!  So it appears driver 340 and newer will disable OpenGL for ICA sessions across various types of Nvidia GPUs, but not Quadros.

If you want OpenGL to work on Amazon EC2 instances, you must use a driver older than 340 (at the time of this writing; hopefully Nvidia corrects this oversight for all cards, consumer and not).
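As a rough illustration of that rule, here is a short Python sketch that picks the newest driver below the cutoff from a list of available versions.  The cutoff of 340 is only what my testing suggests, not a number documented by Nvidia, and CUTOFF_MAJOR, K520_DRIVERS and newest_usable are placeholder names.

```python
# Assumption from the tests in this post: drivers with a major version of
# 340 or higher lose OpenGL in ICA sessions on non-Quadro cards.
CUTOFF_MAJOR = 340

# The three drivers that appeared to be available for the GRID K520.
K520_DRIVERS = ["320.59", "335.35", "347.70"]


def version_key(text):
    """Turn '335.35' into (335, 35) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def newest_usable(available, cutoff_major=CUTOFF_MAJOR):
    """Return the newest version below the cutoff, or None if there is none."""
    usable = [v for v in available if version_key(v)[0] < cutoff_major]
    if not usable:
        return None
    return max(usable, key=version_key)


print(newest_usable(K520_DRIVERS))  # 335.35
```

For the K520 this lands on 335.35, which is the newest driver I saw OpenGL working with on EC2.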
