
Citrix Workspace Environment Manager – First Impressions

2017-03-02

Citrix Workspace Environment Manager (WEM) can be used as a replacement for Active Directory (AD) Group Policy Preferences (GPP).  It does not deal with machine policies, however, so AD Group Policy Objects (GPOs) are still required to apply policies to machines.  But WEM's goal isn't to manipulate machine policies; it is to improve user logon times by replacing the user-policy portion of an AD GPO.  A GPO has two different engines to apply settings: a Registry policy engine and the engine that drives "Client Side Extensions" (CSEs).  The biggest time consumer in a GPO is processing the logic or the actions of a CSE.  I'll look at each engine and what they mean for WEM.

Registry Extension

The first is the 'Registry' policy engine.  This engine is confusingly called the "Registry Extension", as opposed to the CSE named "Group Policy Registry".  The "Registry Extension" engine applies registry settings configured via 'Administrative Templates'.

These settings are 'dumb' in that no logic processing is required.  When a setting is Enabled or Disabled, whatever key carries that value is applied immediately.  This engine processes very, very quickly, so migrating these policy settings would bring minimal or no improvement to logon times (unless you have a ton of GPOs applying and network latency becomes your primary blocker).

If you use ControlUp to Analyze GPO Extension Load Times it will display the Registry Extension and the Group Policy Registry CSE:

Client Side Extensions

CSEs, however, allow you to put complex logic and actions within a GPO that require processing to determine whether a setting should be applied, or how it should be applied.  One of the most powerful of these is the Registry CSE.  This CSE allows you to apply registry settings with Boolean logic and can be filtered on a huge number of variables.

All of this logic is stored in an XML document that is pulled when the Group Policy Object is processed.  This file is located in "C:\ProgramData\Microsoft\Group Policy\History\GUID OF GPO\SID OF USER\Preferences\Registry".
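If you're curious how much of this logic your environment carries, you can inspect those cached XML files directly.  A quick sketch (paths per the location above; adjust if your History folder differs):

```powershell
# List the cached GPP Registry.xml files and count the <Registry> items in each.
# The History path is per-GPO-GUID and per-user-SID, so wildcards cover them all.
$history = 'C:\ProgramData\Microsoft\Group Policy\History'
Get-ChildItem -Path $history -Recurse -Filter 'Registry.xml' -ErrorAction SilentlyContinue |
    ForEach-Object {
        [xml]$xml = Get-Content $_.FullName
        [pscustomobject]@{
            File      = $_.FullName
            ItemCount = ($xml.SelectNodes('//Registry')).Count
        }
    }
```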

Parsing and executing the Boolean logic takes time.  This is where we would hope WEM can make things faster.  In our existing environment, processing all of this consumes the majority of our logon time:

Migrating Group Policy Preferences to WEM

Taking some of our Registry Preferences, we'll look at what is required to migrate them into WEM.

Basic settings (e.g., 'Always applied')

“Visual Effects”

These settings have no filters and are applied to all users.  To migrate them to WEM I’ve exported these values and set them into a registry file:
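For reference, exporting a key to a .reg file can be done with reg.exe.  The Visual Effects key below is an assumption on my part, so substitute whatever keys your preference item actually sets:

```powershell
# Export the user's Visual Effects settings to a .reg file that WEM can import.
# HKCU\...\VisualEffects is an example key; replace with the keys from your GPP item.
reg export "HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\VisualEffects" C:\Temp\VisualEffects.reg /y
```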

Switching to WEM, I selected 'Actions', then 'Registry Entries', and imported the registry file.

An interesting side note: it appears the import excluded the REG_BINARY value.  However, you can create the REG_BINARY via the GUI:

To set the Registry Entries I created a filter condition called “Always True”

And then created a rule “Always True”

We have a user group that encompasses all of our Citrix users, which I added under 'Configure Users'.  Then, during the assignment of the registry keys, I selected the 'Always True' filter:

And now these registry keys have been migrated to WEM.  It would be nice to 'group' these keys like you can with a collection in Group Policy Preferences.  Without that ability, the name of the action becomes really important, as it's the only way you can filter:

Next I'll look at replacing Group Policy Preferences that contain some Boolean logic.


Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Process Monitor

2017-01-31

Non-Persistent Citrix PVS Target Devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components play a big role in acting as the boot disk: they send the disk contents as UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance.  Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Process Monitor

For Process Monitor you must extract the boot driver and inject the process monitor executable itself into the image.

To extract the boot driver, simply launch Process Monitor and, under the Options menu, select 'Enable Boot Logging'.

Then browse to your C:\Windows\System32\Drivers folder and, with "Show Hidden Files" enabled, copy out Procmon23.sys.

It might be a good idea to disable boot logging if you did it on your personal system now 🙂

 

Now we need to inject the following registry entry into our image:
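A sketch of doing that offline against the mounted vDisk: load the image's SYSTEM hive and create a boot-start service entry for the driver we copied in.  The hive path, service name, and values below are my assumptions of what 'Enable Boot Logging' configures, so compare them against a machine where you enabled it through the GUI:

```powershell
# Load the SYSTEM hive from the mounted vDisk (assumed mounted as E:)
reg load HKLM\PVSIMAGE E:\Windows\System32\config\SYSTEM

# Create a boot-start kernel driver service pointing at the Procmon driver copied to
# E:\Windows\System32\Drivers\Procmon23.sys. ControlSet001 is assumed to be the active
# control set of the offline hive; verify the values against a live boot-logging setup.
$key = 'HKLM:\PVSIMAGE\ControlSet001\Services\PROCMON23'
New-Item -Path $key -Force | Out-Null
New-ItemProperty -Path $key -Name Type         -PropertyType DWord  -Value 1 -Force | Out-Null  # kernel driver
New-ItemProperty -Path $key -Name Start        -PropertyType DWord  -Value 0 -Force | Out-Null  # boot start
New-ItemProperty -Path $key -Name ErrorControl -PropertyType DWord  -Value 1 -Force | Out-Null
New-ItemProperty -Path $key -Name ImagePath    -PropertyType String -Value 'System32\Drivers\Procmon23.sys' -Force | Out-Null
New-ItemProperty -Path $key -Name Group        -PropertyType String -Value 'FSFilter Activity Monitor' -Force | Out-Null

[gc]::Collect()   # release the hive handle before unloading
reg unload HKLM\PVSIMAGE
```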

Here are the steps in action:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use the Windows Performance Toolkit for boot tracing Citrix PVS Target Devices, click here.


Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Windows Performance Toolkit

2017-01-31

Non-Persistent Citrix PVS Target Devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components play a big role in acting as the boot disk: they send the disk contents as UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance.  Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Windows Performance Toolkit

The Windows Performance Toolkit must either be installed on the image, or you can copy the files from an existing install into your image at the following path:

To offline inject, simply mount your vDisk image and copy the files there:

 

The portion of it that we are interested in is "xbootmgr.exe" (i.e., boot logging).  In order to enable boot logging we need to inject the following registry key into our PVS image:
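As a rough sketch of the idea, one way to arm this offline is to copy the toolkit into the mounted vDisk and add a RunOnce value that launches an xbootmgr boot trace after the next logon (xbootmgr then reboots the device and records the boot that follows).  The drive letter, install path, and the RunOnce approach itself are my assumptions here, not necessarily the exact key used in this walkthrough:

```powershell
# Copy the Windows Performance Toolkit into the mounted vDisk (assumed mounted as E:)
robocopy "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit" `
         "E:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit" /E

# Load the offline SOFTWARE hive and add a RunOnce value that starts a boot trace.
reg load HKLM\PVSIMAGE E:\Windows\System32\config\SOFTWARE
$runOnce = 'HKLM:\PVSIMAGE\Microsoft\Windows\CurrentVersion\RunOnce'
$cmd = '"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xbootmgr.exe" -trace boot -resultPath C:\Temp'
New-Item -Path $runOnce -Force | Out-Null
New-ItemProperty -Path $runOnce -Name 'BootTrace' -Value $cmd -PropertyType String -Force | Out-Null
[gc]::Collect()   # release the hive handle before unloading
reg unload HKLM\PVSIMAGE
```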

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Process Monitor for boot tracing Citrix PVS Target Devices, click here.


Let's Make PVS Target Device Booting Great Again (Part 2)

2017-01-05

Continuing on from Part 1, we are looking to optimize the PVS boot process to be as fast as it possibly can be.  In Part 1 we implemented Jumbo Frames across both the PVS target device and the PVS server and discovered that Jumbo Frames only applies to the portion where BNIStack kicks in.

In this part we are going to examine the option "I/O burst size (KB)".  This policy is explained in the help file:

I/O burst size — The number of bytes that will be transmitted in a single read/write transaction before an ACK is sent from the server or device. The larger the IO burst, the faster the throughput to an individual device, but the more stress placed on the server and network infrastructure. Also, larger IO Bursts increase the likelihood of lost packets and costly retries. Smaller IO bursts reduce single client network throughput, but also reduce server load. Smaller IO bursts also reduce the likelihood of retries. IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed.

What are these ACKs and can we see them?  We can.  They are UDP packets sent back from the target device to the PVS server.  If you open Procmon on the PVS server and start up a target device, an ACK looks like so:

These highlighted 48-byte UDP Receive packets?  They are the ACKs.

And if we enable the disk view with the network view:

 

With each 32KB read of the hard disk we send out 24 packets, 23 at 1464 bytes and 1 at 440 bytes.  Add them all together and we get 34,112 Bytes of data.  This implies an overall overhead of 1344 bytes per sequence of reads or 56 bytes per packet.  I confirmed it’s a per-packet overhead by looking at a different read event at a different size:

If we look at the first read event (8,192) we can see there are 6 packets, 5 at 1464 and one at 1208, totaling 8528 bytes of traffic.  8528 – 8192 = 336 bytes of overhead / 6 packets = 56 bytes.

The same happens with the 16,384 byte read next in the list.  12 packets, 11 at 1464 and one at 952, totaling 17,056.  17056 – 16384 = 672 bytes of overhead / 12 packets = 56 bytes.

So it’s consistent.  For every packet at the standard 1506 MTU you are losing 3.8% to overhead.  But there is secretly more overhead than just that.  For every read there is a 48 byte ACK overhead on top.  Admittedly, it’s not much; but it’s present.

And how does this look with Jumbo Frames?

For a 32KB read we satisfied the request in 4 packets.  3 x 8972 bytes and 1 at 6076 bytes totalling 32,992 bytes of transmitted data.  Subtracting the transmitted data from what is really required 32,992-32,768 = 224 bytes of overhead or…  56 bytes per packet 🙂

This amounts to a measly 0.6% of overhead when using jumbo frames (an immediate 3% gain!).

But what about this 32KB value?  What happens if we adjust it larger (or smaller)?

Well, there is a limitation that handicaps us…  even if we use Jumbo Frames.  It is stated here:

IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed

Because Jumbo Frames don’t occur until after the BNIStack kicks in, we are limited to working out this math at the 1506 MTU size.

The caveat is that the calculation isn't actually based on the MTU of 1506, or even the 1464-byte UDP payload; it's based on the data that fits within each packet after the 56 bytes of per-packet overhead, which is 1408 bytes.  Doing the math in reverse gives us 1408 x 32 = 45,056 bytes.  This equals a clean 44K (45,056 / 1024) maximum size.  Setting I/O Burst to 44K, the target device still boots.  Counting the packets, there are 32 packets.
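A quick back-of-the-napkin check of that limit, using the per-packet overhead measured earlier:

```powershell
# Maximum I/O burst = 32 packets x usable payload per packet (UDP payload minus PVS overhead)
$udpPayload  = 1464      # bytes of UDP payload per packet at the 1506 MTU
$pvsOverhead = 56        # per-packet overhead measured above
$maxBurst    = 32 * ($udpPayload - $pvsOverhead)
"{0} bytes = {1} KB" -f $maxBurst, ($maxBurst / 1024)   # 45056 bytes = 44 KB
```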

So if we up the IO/Burst by 1K to 45K (45*1024 = 46,080 bytes) will it still boot?

It does not boot.  This enforces a hard limit of 44K for I/O Burst until the 1st stage supports a larger MTU size.  I have only explored EFI booting, so I suppose it’s possible another boot method allows for larger MTU?

The reads themselves are split now, hitting the ‘version’ and the ‘base’ with the base being 25,600 + 20,480 for the version (46,080 bytes).  I believe this is normal for versioning though.

So what’s the recommendation here?

Good question.  Citrix defaults to a 32K I/O Burst Size.  If we break down the operation of the burst we have 4 portions:

  1. Hard drive read time
  2. Packet send time
  3. Acknowledgement of receipt
  4. Turnaround time from receipt to next packet send

The times that I have for each portion at a 32K size appear to be (in milliseconds):

  1. 0.3
  2. 0.5
  3. 0.2
  4. 0.4

A total time of ~1.4ms per read transaction at 32K.

For 44K I have the following:

  1. 0.1
  2. 0.4
  3. 0.1
  4. 0.4

For a total time of ~1.0ms per read transaction at 44K.

I suspect the 0.4ms difference could be well within a margin of error of my hand based counting.  I took my numbers from a random sampling of 3 transactions, and averaged.  I cannot guarantee they were at the same spot of the boot process.

However, it appears the difference between them is close to negligible.  The question that must be posed is: what's the cost of a 'retry' on a missed or faulty UDP packet?  From the evidence I have it should be fairly small, but I haven't figured out a way to test or detect what the turnaround time of a 'retry' is yet.

Citrix has a utility that gives you some information on what kind of gain you might get.  It’s called ‘Stream Console’ and it’s available in the Provisioning Services folder:

 

With 4K I/O burst it does not display any packets sent larger because they are limited to that size

 

8K I/O Burst Size. Notice how many 8K sectors are read over 4K?

 

16K I/O Burst Size

 

What I did to compare the differences in performance between all the I/O Burst Size options is simply try each size 3 times and take the boot-time results as posted by the StatusTray utility.  The unfortunate thing about the Status Tray is that its time/throughput calculations are rounded to the second.  This means the throughput figure isn't entirely accurate, as a second is a LARGE value when you're talking about the difference between 8 and 9 seconds.  If you are just under or over whatever the rounding threshold is, it'll change your results once we start getting to these numbers.  But I'll present my results anyway:

To me, the higher the value of I/O Burst Size, the better the performance.

Again, caveats are that I do not know what the impact of a retry is, but if reading from the disk and resending the packet takes ~1ms then I imagine the ‘cost’ of a retry is very low, even with the larger sizes.  However, if your environment has longer disk reads, high latency, and a poor network with dropped or lost packets then it’s possible, I suppose, that higher I/O burst is not for you.

But I hope most PVS environments are better designed than that, and you actually don't have to worry about it.  🙂


Let's Make PVS Target Device Booting Great Again (Part 1)

2016-12-30

Some discussions have swirled recently about implementing VDI.  One of the challenges with VDI is slow boot times, which necessitate keeping machines pre-powered on: a pool of machines sits there consuming server resources until a logon request comes in and more machines are powered on to meet demand.  But what if your boot time were measured in seconds?  Something so low you could keep the 'pool' of standby machines to 1 or 2, or even none!

I’m interested in investigating if this is possible.   I previously looked at this as a curiosity and achieved some good results:

 

However, that was a non-domain Server 2012 R2 fresh out of the box.  I tweaked my infrastructure a bit by storing the vDisk on a RAM Disk with Jumbo Frames (9k) to supercharge it somewhat.

Today, I’m going to investigate this again with PVS 7.12, UEFI, Windows 10, on a domain.  I’ll show how I investigated booting performance and see what we can do to improve it.

The first thing I’m going to do is install Windows 10, join it to the domain and create a vDisk.

Done.  Because I don't have SCVMM set up in my home lab, I had to muck my way through enabling UEFI HDD boot.  I went into the PVS folder (C:\ProgramData\Citrix\Provisioning Services) and copied the BDMTemplate_uefi.vhd to my Hyper-V Target Device folder.

I then edited my Hyper-V Target Device (Gen2) and added the VHD:

I then mounted the VHD and modified the PVSBOOT.INI file so it pointed to my PVS server:

 

 

I then created my target device in the PVS console:

 

And voilà!  It booted.

 

And out of the gate we are getting 8 second boot times.  At this point I don’t have it set with a RAM drive or anything so this is pretty stock, albeit on really fast hardware.  My throughput is crushing my previous speed record, so if I can reduce the amount of bytes read (it’s literally bytes read/time = throughput) I can improve the speed of my boot time.  On the flip side, I can try to increase my throughput but that’s a bit harder.

However, there are some tricks I can try.

I have Jumbo Frames enabled across my network.  At this stage I do not have them set but we can enable them to see if it helps.

To verify their operation I’m going to trace the boot operation from the PVS server using procmon:

We can clearly see the UDP packet size is capping out at 1464 bytes, making it 1464 + 8-byte UDP header + 20-byte IP header = 1492 bytes.  So I enabled Jumbo Frames.

Under Server Properties in the PVS console I adjusted the MTU to match the NIC:

 

You then need to restart the PVS services for it to take effect.

I then made a new vDisk version and enabled Jumbo Frames in the OS of the target device.  I did a quick ping test to validate that Jumbo Frames are passing correctly.
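The usual check is a don't-fragment ping sized to the jumbo payload (9000-byte MTU minus 28 bytes of IP/ICMP headers); the address below is just my PVS server as an example:

```powershell
# If this succeeds, 9000-byte frames are passing end to end without fragmentation.
ping 192.168.1.88 -f -l 8972
```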

I then started procmon on the PVS server, set the target device to boot…

and…

 

1464-sized UDP packets.  A little smaller than the 9000 bytes or so it's supposed to be.  Scrolling down a little further, however, shows:

 

Notice the amount of UDP packets sent in the smaller frame size?

 

Approximately 24 packets until it gets a “Receive” notification to send the next batch of packets.  These 24 packets account for ~34,112 bytes of data per sequence.  Total time for each batch of packets is 4-6ms.

If we follow through to when the jumbo frames kick in we see the following:

This is a bit harder to read because the MIO (Multiple Input Output) kicks in here and so there are actually two threads executing the read operations as opposed to the single thread above.

Regardless, I think I've hit on a portion that is executing more-or-less sequentially.  The total amount of data being passed in these sequences is ~32,992 bytes, but the time to execute them is 1-2ms!  We have essentially doubled the performance of these reads.

So why is the data being sent like this?  Again, procmon brings some visibility here:

Each "UDP Receive" packet is a validation that the data received was good, and it instructs the Stream Process to read and send the next portion of the file on the disk.  If we move to the jumbo frame portion of the boot process we can see the IO goes all over the place in size and in where the reads occur:

So, again, jumbo frames are a big help here, as all requests under 8K can be serviced in 1 packet, and there are usually MORE requests under 8K than above.  Fortunately, Procmon can give us some numbers to illustrate this.  I started and stopped the procmon trace for a run of a network boot with Jumbo Frames and one without:

Standard MTU (1506)

 

Jumbo Frame MTU (9014)

 

The number we are really after is the 192.168.1.88:6905 one.  The total number of events is solidly cut in half, with the number of sends about 1/3 less!  It was fast enough that it was able to process double the amount of data, in bytes sent to the target device and bytes received from the target device!

Does this help our throughput?  Yes, it does:

 

“But Trentent!  That doesn’t show the massive gains you are spewing!  It’s only 4MB/s more in Through-put!”

And you are correct.  So why aren't we seeing more gains?  The issue lies with how PVS boots.  It boots in two stages.  If you were familiar with PVS on Hyper-V from a year ago or more, you are probably aware of this issue.  Essentially, PVS breaks the boot into a first stage (the bootloader stage) which starts in a lower-performance mode (standard MTU).  Once the BNIStack loads, it kicks into Jumbo Packet mode with the loading of the synthetic NIC driver.  The benefit from Jumbo Frames doesn't occur until this stage.  So when does Jumbo Frames kick in?  You can see it in Event Viewer.

From everything I see with Procmon, first stage boot ends on that first Ntfs event.  So out of the original 8 seconds, 4 is spent on first stage boot where Jumbo Packets are not enabled.  Everything after there is impacted (positively).  So for our 4 seconds “standard MTU” boot, bringing that down by a second is a 25% improvement!  Not small potatoes.

I intend to do more investigation into what I can do to improve boot performance for PVS target devices so stay tuned!  🙂


Citrix Netscaler 11.1 Unified Gateway and a non-working Citrix HTML5 Receiver

2016-12-06

We set up a Citrix Unified Gateway for a proof of concept and were having an issue getting the HTML5 Receiver to connect for external connections.  It was presenting an error message: "Citrix Receiver cannot connect to the server".  We followed this documentation.  It states, for our use case:

What would probably help is having a proxy that can parse out all websocket traffic and convert to ICA/CGP traffic without any need of changes to XA/XD. Netscaler Gateway does exactly this… …NetScaler Gateway also doesn’t need any special configuration for listening over websockets…

Connections via NetScaler Gateway:

…When a gateway is involved, the connections are similar in all Receivers. Here, Gateway acts as WebSocket proxy and in-turn opens ICA/CGP/SSL native socket connections to backend XenApp and XenDesktop. …

…So using a NetScaler Gateway would help here for ease of deployment. Connection to gateway is SSL/TLS and gateway to XenApp/XenDesktop is ICA/CGP….

And additional documentation here.

WebSocket connections are also disabled by default on NetScaler Gateway. For remote users accessing their desktops and applications through NetScaler Gateway, you must create an HTTP profile with WebSocket connections enabled and either bind this to the NetScaler Gateway virtual server or apply the profile globally. For more information about creating HTTP profiles, see HTTP Configurations.

Ok.  So we did the following:

  1. We enabled WebSocket connections on Netscaler via the HTTP Profiles
  2. We configured Storefront with HTML5 Receiver and configured it for talking to the Netscaler.

And then we tried launching our application:

We started our investigation.  The first thing we did was test to see if HTML5 Receiver works at all.  We configured and enabled websockets on our XenApp servers and then logged into the Storefront server directly, and internally.  We were able to launch applications without issue.

The second thing we did was enable logging for HTML5 receiver:

To view Citrix Receiver for HTML5 logs

To assist with troubleshooting issues, you can view Citrix Receiver for HTML5 logs generated during a session.

  1. Log on to the Citrix Receiver for Web site.
  2. In another browser tab or window, navigate to siteurl/Clients/HTML5Client/src/ViewLog.html, where siteurl is the URL of the Citrix Receiver for Web site, typically http://server.domain/Citrix/StoreWeb.
  3. On the logging page, click Start Logging.
  4. On the Citrix Receiver for Web site, access a desktop or application using Citrix Receiver for HTML5.

    The log file generated for the Citrix Receiver for HTML5 session is shown on the logging page. You can also download the log file for further analysis.

This was the log file it generated:

The “Close with code=1006” seemed to imply it was a “websocket” issue from google searches.

 

The last few events prior to the error are “websocket” doing…  something.

I proceeded to spin up a home lab with XenApp and a Netscaler configured for HTML5 Receiver and tried connecting.  It worked flawlessly via the Netscaler.  I enabled logging and took another look:

There are a lot of differences, but focusing on the point of failure on our enterprise NetScaler, we can see it seems to retry, or try different indexes (3 in total: 0, 1 and 2).

So there is a lot of evidence that websockets are the culprit.  We tried removing the Netscaler from the connection picture by connecting directly to Storefront, and HTML5 receiver works.  We have configured both Netscaler and Storefront with what we think is a correct configuration.  And still we are getting a failure.

I opened up a call to Citrix.

It was a fairly frustrating experience.  I had techs ask me to go to "Program Files\Citrix\Receiver" and get the receiver version (hint, hint, this does not exist with HTML5).  I captured packets of the failure "in motion" and they told me, "it's not connecting to your XenApp server".  Yup.  That's the problem.

It seems that HTML5 is either so new (it's not now), so simple (it's not really), or the techs are just poorly trained.  I reiterated to them: "why does it make 3 websocket connections on the 'bad' netscaler?  Why does the 'good' netscaler appear to connect the first time without issue?"  I felt the techs ignored and beat around the bush regarding websockets, with more focus put on the "Storefront console".  Storefront itself was NOT logging ANYTHING to the event logs.  Apparently this is weird for a Storefront failure.  I suspected Storefront was operating correctly, and I was getting frustrated that we weren't focusing on what I suspected was the problem (websockets).  So I put the case on hold so I could focus on doing the troubleshooting myself instead of going around in circles on setting HTML5 to "always use" or "use only when native receiver is not detected".

Reviewing the documentation for the umpteenth time this “troubleshooting connections” tidbit came out:

Troubleshooting Connections:

In cases where you are not able to connect some of the following points might help in finding out the problem. They also can be used while opening support case or seeking help in forums:

1) Logging: Basic connection related logs are logged by Receiver for HTML5 and Receiver for Chrome.

2) Browser console logs: Browsers would show errors related to certificates or network related failures here.

  • Receiver for HTML5: Open developer tools for HDX session browser tab. Tip: In Windows, use F12 shortcut in address bar of session page.

  • Receiver for Chrome: Go to chrome://inspect, click on Apps on left side. Click on inspect for Citrix Receiver pages (Main.html, SessionWindow.html etc)

The browser may show a log?  I wish I had thought of that earlier.  And I wish Citrix would have put that in the actual "Receiver for HTML5" documentation as opposed to burying it in a blog article.

So I opened the Console in Chrome, launched my application and reviewed the results:

We finally have some human readable information.

Websocket connections are failing during the handshake “Unexpected response code: 302”

What the heck does 302 mean?  I installed Fiddler and did another launch with Fiddler tracing:

 

I highlighted the area where it tells us it's attempting to connect with websockets.  We can see in the response packet that we are getting redirected; that's what '302' means.  I then found a website that lets you test your server to ensure websockets are working.  I tried it on our 'bad' netscaler:

 

Hitting ‘Connect’ left nothing in the log.  However, when I tried it with my ‘good’ netscaler…

 

It works!  So we can test websockets without having to launch and close the application over and over…

 

So we started to investigate the Netscaler.  We found numerous policies doing URL or content redirection that could act on a packet formulated like this.  We then compared our Netscaler to the one in my home lab and did find one subtle difference:

The one on the left shows a rule for HTTP.REQ.URL.CONTAINS_ANY("aaa_path") where the one on the right just shows "is_vpn_url".  Investigating further, it turned out our team had been trying to get AAA authentication working and this was an option set during a troubleshooting stage.  Apparently, it was forgotten or overlooked when that issue was resolved (it was not applicable and could be removed).  So we set it back to "is_vpn_url" and retried…

It worked!  I tried the ‘websockets.org’ test and it connected now!  Looking in the Chrome console showed:

 

Success!  It doesn’t pause on the websocket connection and the console logging shows some interesting information.  Fiddler, with the working connection, now displayed the following:

Look!  A handshake response!

 

So, to review what we learned:

  1. Connections via Netscaler to HTML5 receiver do NOT require an SSL connection on each target XenApp device (though that is possible).
  2. Connections via Netscaler work over the standard ports (2598/1494) and do not require any special configuration on your XenApp server.
  3. You can use 'http://www.websocket.org/echo.html' to test your Netscaler to ensure websockets are open and working (a PowerShell equivalent is sketched after this list).
  4. Fiddler can tell you verbose information about your websocket connections and their contents.
  5. The web browser's Javascript console is perfect for looking at verbose messages in HTML5.
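On point 3, the same handshake test can be done from PowerShell without a browser.  A minimal sketch (requires Windows 8 / Server 2012 or newer for ClientWebSocket support; the gateway URL is a placeholder):

```powershell
# Attempt a WebSocket upgrade against the gateway; State 'Open' means the handshake
# completed, while an exception (e.g. a 302 response) means the upgrade was rejected.
$ws  = New-Object System.Net.WebSockets.ClientWebSocket
$cts = New-Object System.Threading.CancellationTokenSource
try {
    $ws.ConnectAsync([Uri]'wss://gateway.example.com/', $cts.Token).Wait()
    "Handshake result: $($ws.State)"
} catch {
    "Handshake failed: $($_.Exception.InnerException.Message)"
}
```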

 

And with that, we are working and happy.  Good night!


AppV5 – Citrix User Profile Manager exclusions

2016-10-21

The Citrix User Profile Manager (UPM) needs a little configuration tweaking to work with App-V.  Specifically, it requires:

 You must exclude the following item using Profile management exclusions:
 Profile Management\File system\Exclusion list\directories:
  • AppData\Local\Microsoft\AppV
 If you don’t exclude these items, App-V applications work the first time users access them but they fail, with an error, on subsequent logons.

But what happens when you *don’t* exclude this directory?

We upgraded our Citrix UPM to 5.4.1 and in that process we moved from setting our inclusions/exclusions via the .ini file to using Group Policy.  The original thought was that simply adding our exclusions would append them to the existing list of default inclusions/exclusions, which already has this directory set.  This line of thinking was incorrect.  Citrix's documentation states:

Important: If you use Group Policy rather than the .ini file (or you are rolling out a Group Policy deployment after a successful test with the .ini file), note that, unlike the installed .ini file, no items are included or excluded by default in the .adm or .admx file. This means you must add the default items manually to the file.

When we enabled the Group Policy exclusion list and set a path (for something unrelated to App-V), that path became the ONLY item being excluded, the App-V directory was no longer excluded, and we hit the issue described by Citrix.  Our application would launch the first time, or, oddly, just for that user on that specific server.  When they launched it again on another server it would fail until their user profile was deleted from the profile share.
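If you want to verify what a worker is actually excluding once the GPO applies, the list lands in the UPM policy registry key.  A rough sketch (the key path is from memory for UPM 5.x, so verify it against your version):

```powershell
# List the directory exclusions UPM received via Group Policy and flag the App-V one if missing.
$key = 'HKLM:\SOFTWARE\Policies\Citrix\UserProfileManager\SyncExclusionListDir'
$props = Get-ItemProperty -Path $key
$exclusions = $props.PSObject.Properties |
    Where-Object { $_.Name -notmatch '^PS' } |     # drop the provider's PS* metadata properties
    ForEach-Object { $_.Value }
$exclusions
if ($exclusions -notcontains 'AppData\Local\Microsoft\AppV') {
    Write-Warning 'AppData\Local\Microsoft\AppV is missing from the exclusion list!'
}
```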

I set up App-V 5 debug logging and traced what this failure looked like when our user tried to start an App-V application:

The lesson?

If you are using a profile manager, ensure your exclusions for App-V are applied correctly!  If you miss them you may run into this weird behaviour.


Examining Logon Durations with Control Up – Profile Load Time

2016-09-15

Touching on my last point about an invalid AD Home Directory attribute, I decided to examine it in more detail to see what is causing the slowness at logon.

[Screenshot: 60 second logon duration]

52 seconds for Profile Load time.  This is caused by this:

[Screenshot: the Home Folder attribute on the AD user object]

The Home Folder server is 'WSAPVSEQ07'.  I created that share and set the home directory to it, then simulated a 'server migration' by shutting the box down.  This means the server name still resolves in DNS, but the server itself no longer responds:

[Screenshot: ping result for the decommissioned server]

But why does it take so long?

If I do a packet capture on the server I’m trying to launch my application from, and set it to trace the ip.addr of the ‘powered off’ server, here’s what we see:

[Screenshot: packet capture during the slow logon]

The logon process attempts to query the server at the ‘28.9’ second mark and stops at the ‘73.9’ second mark.  A total of 45 seconds.  This is all lumped into the ‘User Profile Load Time’.  So what’s happening here?

We can see the initial attempt to connect occurs at 28.9 seconds, then 31.9 seconds, then finally 37.9 seconds.  The time span is 3s between the first and second try, then 6s between the second and third try.  This is explained here.

In Windows 7 and Windows Server 2008 R2, the TCP maximum SYN retransmission value is set to 2, and is not configurable. Because of the 3-second limit of the initial time-out value, the TCP three-way handshake is limited to a 21-second timeframe (3 seconds + 2*3 seconds + 4*3 seconds = 21 seconds).

Is this what we are seeing?  It's close.  We are seeing the first two items for sure, but then instead of a '3rd' attempt, the whole sequence starts over using the same formula, for a total of 45 seconds.

According to the KB article, we are seeing the max SYN retransmissions (2) for each SYN sent.  The article contains a hotfix we can install to change the max SYN retransmissions value, but its minimum is 2, which is what it's set to anyway.  However, there is an additional hotfix to modify the 3-second time period.

The minimum I’ve found is we can reduce the 3 second time period to 100ms.

[Screenshot: registry values]

This reduces the logon time to:

[Screenshot: faster logon duration]

19 seconds.

What does this packet capture look like?  Like this:
[Screenshot: packet capture with the lower SYN timer values]

Even with the ‘Initial RTO’ set to 100ms, Windows has a MinRTO value of 300ms:

[Screenshot: MinRTO value]

After the initial ‘attempt’ there is a 10-12 second delay.

Setting the MinRTO to the minimum 20ms

[Screenshot: MinRTO set to 20ms]

This reduces our logon time further, because our SYN packets are now about 200ms apart:

[Screenshot: SYN packets ~200ms apart]

We are now at 16 seconds, with 13 seconds spent on the profile, of which 12 seconds was spent timing out this connection.

[Screenshot: 16 second logon]
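For reference, on Windows 8 / Server 2012 and newer the same timers are exposed through the NetTCPIP PowerShell module rather than the hotfix-plus-registry route used above.  A sketch for illustration only, given the caveat that follows:

```powershell
# Inspect and lower the retransmission timers for the 'InternetCustom' template.
# InitialRtoMs cannot go below 300 via this cmdlet; MinRtoMs bottoms out at 20.
Get-NetTCPSetting -SettingName InternetCustom | Select-Object SettingName, InitialRtoMs, MinRtoMs
Set-NetTCPSetting -SettingName InternetCustom -InitialRtoMs 300 -MinRtoMs 20
```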

 

Would you implement this?  I would strongly recommend against it.  SYN retransmissions were designed so that blips in the network can be overcome.  Unfortunately, I know of no way to get around the 'Home Directory responds to DNS but not to ping' timeout.  The following group policies have no effect:

[Screenshots: the group policy settings that were tried]

 

And I suspect the reason they have no effect is that the server name still resolves in DNS, so the SYN sequence takes place and blocks regardless of these GPO settings.  Undoubtedly, it's because this comes into play:
[Screenshot: group policy explanation text]

Since our user has a home directory the preference is to ‘always wait for the network to be initialized before logging the user on’.  There is no override.

Is there a way to detect a dead home directory server during a logon?  Outside of long logons, I don't see any event logging or anything that would point us to determine if this is the issue without having to do a packet capture in an isolated environment.

 

 


ControlUp – Dissecting Logon Times a step further (invalid Home Directory impact)

2016-09-07

Continuing on from my previous post, we were still having certain users with logons in the dozens of seconds to minutes.  I wanted to find out why and see if there is anything further that could be done.

[Screenshot: 60 second profile load in ControlUp]

 

After identifying a user with a long logon with ControlUp I ran the ‘Analyze Logon Duration’ script:

[Screenshot: Analyze Logon Duration output showing 51.2 seconds of profile load]

 

Jeez, 59.4 seconds to logon with 51.2 seconds of that spent on the User Profile portion.  What is going on?  I turned to process monitor to capture the logon process:

[Screenshot: Process Monitor trace of the logon]

Well, there appears to be a 1-minute gap before WinLogon.exe starts the cmd.exe command.  The stage it 'freezes' at is "Please wait for the user profile service".

 

Since there is no data recorded by Process Monitor, I tested by deleting the user's profile.  It made no difference, still 60 seconds.  But since I now know it's not the user profile, it must be something else.  Experience has taught me to blame the user object and look for network paths.  50 seconds or so just *feels* like a network timeout.  So I examined the user's AD object:

[Screenshot: the user's AD object]

 

Well well well, we have a path.  Is it valid?

[Screenshot: testing the home directory path]

 

 

It is not valid.  So is my suspicion correct?  I removed the Home Directory path and relaunched:

[Screenshot: logon time without the home directory set]

Well that’s much, much better!

So now I want ControlUp to identify this potential issue.  Unfortunately, I don't really see any events I can key in on that say 'Attempting to map home drive'.  But what we can do is pull that AD attribute and test to see if it's valid, and let us know if it's not.  This is the output I now generate:

[Screenshot: output of the new script]
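The core of that check is simple; a minimal sketch of the idea (this is not the ControlUp script itself, and 'jsmith' is a placeholder):

```powershell
# Pull the user's homeDirectory attribute from AD and verify the path actually responds.
Import-Module ActiveDirectory
$user = Get-ADUser -Identity 'jsmith' -Properties HomeDirectory
if ($user.HomeDirectory -and -not (Test-Path -Path $user.HomeDirectory)) {
    Write-Warning "Home directory '$($user.HomeDirectory)' is unreachable - expect a slow logon."
}
```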

 

I revised the messaging slightly as I’ve found the ‘Group Policy’ phase can be affected if GPP Drive Maps reference the home directory attribute as well.

 

So I took my previous script and updated it further.  This time with a check for valid home directories.  I also added some window sizing information to give greater width for the response as ‘Interim Delay’ was getting truncated when there were long printer names.  Here is the further updated script:


ControlUp – Dissecting Logon Times a step further (Printer loading)

2016-08-31

We have applications that require printers be loaded before the application is started.  This is usually because the application will check for a specific printer or default printer and if one is not set (because it hasn’t mapped into the session) then it’ll throw up a dialog or not start entirely.

So we have this value ‘unchecked’ for some applications:

[Screenshot: the published application setting]

But how does this impact our logon times?

Well… Our organization just underwent a print server migration/upgrade where some print servers were decommissioned and don’t exist.  But some users still have them or references to them on their end points.  We do have policies that only map your default printer, but some users are on a policy to map ‘all’ printers they have on their system.

What’s the impact?

[Screenshot: logon duration when waiting for printers]

Waiting for printers before starting the application…

 

[Screenshot: logon duration without waiting for printers]

Without waiting for printers

16 Seconds?  How is that so?

Well, it turns out waiting for printers and the subsystem components that support them adds a fair amount of time, and worse still are network printers that don't go anywhere anymore.  I've seen these logons wait for a connection before timing out, all the while the user sits there and waits.  The script that comes with ControlUp for analyzing logons is good, but I wanted to know more about why some systems had long logon times when the only clue was Pre-Shell (userinit) taking up all the time.  So I dug into the print logs and found a way to measure their impact.

[Screenshot: modified script output showing per-printer times]

With my modified script we can clearly see waiting for the printers takes ~15.4s, with a few printers taking a few seconds each and the rest around 0.5 seconds.  One thing about this process is that mapping printers is synchronous, so if one stalls, the whole process gets stuck.  All my printers were local except for the 'Generic / Text Only' one, which was a network printer whose server I powered off.  It hung the longest at 5.9 seconds, but I've seen non-existent network mapped printers hang for 150 seconds or so…

To facilitate finding the printers we need to pass the clientName to the server and the Print Service Logs need to be enabled.

You can enable the print service logs on server 2008R2 by executing the following:
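A sketch of the usual way to do that with wevtutil:

```powershell
# Enable the PrintService operational log so printer mapping events are recorded.
wevtutil set-log "Microsoft-Windows-PrintService/Operational" /enabled:true
```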

The ControlUp arguments need to look like this now:

[Screenshot: ControlUp script arguments]

Here is my updated script:

I hope to dig into other startup components and further drill down into what our user launch process looks like.  We wait, and we see 🙂
