
Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Process Monitor

2017-01-31

Non-Persistent Citrix PVS Target Devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components act as the boot disk, streaming the disk contents to the target device as UDP packets over the network.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs I use to measure boot performance: Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon, in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Process Monitor

For Process Monitor, you must extract the boot driver and inject the Process Monitor executable itself into the image.

To extract the boot driver, simply launch Process Monitor and, under the Options menu, select ‘Enable Boot Logging’.

Then browse to your C:\Windows\System32\Drivers folder and, with “Show Hidden Files” enabled, copy out Procmon23.sys.

It might be a good idea to disable boot logging now if you enabled it on your personal system 🙂

 

Now we need to inject the following registry entry into our image:
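The registry entry from the original screenshot isn’t reproduced here, but the general shape is a boot-start driver service entry pointing at Procmon23.sys.  Below is a hedged sketch of injecting one offline into the mounted vDisk’s SYSTEM hive; the drive letter, service name and value data are assumptions based on what procmon’s ‘Enable Boot Logging’ creates on a live system, so verify them against your own machine before sealing the image:

    # Hedged sketch: offline-inject a boot-start service for the Process Monitor boot driver.
    # E: (the mounted vDisk), the PROCMON23 service name and the value data are assumptions.
    $win = 'E:\Windows'
    Copy-Item .\Procmon23.sys "$win\System32\Drivers\"

    reg.exe load HKLM\OfflineSystem "$win\System32\config\SYSTEM"
    $svc = 'HKLM:\OfflineSystem\ControlSet001\Services\PROCMON23'
    New-Item -Path $svc -Force | Out-Null
    New-ItemProperty -Path $svc -Name Type         -PropertyType DWord -Value 1 -Force | Out-Null   # kernel driver
    New-ItemProperty -Path $svc -Name Start        -PropertyType DWord -Value 0 -Force | Out-Null   # boot start
    New-ItemProperty -Path $svc -Name ErrorControl -PropertyType DWord -Value 1 -Force | Out-Null
    New-ItemProperty -Path $svc -Name ImagePath -PropertyType ExpandString -Value 'System32\Drivers\Procmon23.sys' -Force | Out-Null
    [gc]::Collect()                                   # release handles before unloading the hive
    reg.exe unload HKLM\OfflineSystem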

Here are the steps in action:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Windows Performance Toolkit for boot tracing Citrix PVS Target Devices, click here.


Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Windows Performance Toolkit

2017-01-31

Non-Persistent Citrix PVS Target Devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components act as the boot disk, streaming the disk contents to the target device as UDP packets over the network.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs I use to measure boot performance: Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon, in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Windows Performance Toolkit

The Windows Performance Toolkit must either be installed on the image, or you can copy the files from an existing install into your image at the following path:

To offline inject, simply mount your vDisk image and copy the files there:
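If your vDisk is a VHD/VHDX, you can do this with the Hyper-V cmdlets; a minimal sketch, where the vDisk path, the mounted drive letter and the WPT source path (the default ADK install location) are all assumptions:

    # Hedged sketch: mount the vDisk and copy the Windows Performance Toolkit files in.
    $vdisk = 'D:\Store\Win10-Target.vhdx'
    Mount-VHD -Path $vdisk
    $wpt = 'C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit'
    New-Item -ItemType Directory -Path 'E:\Program Files (x86)\Windows Kits\10' -Force | Out-Null
    Copy-Item $wpt -Destination 'E:\Program Files (x86)\Windows Kits\10\' -Recurse
    Dismount-VHD -Path $vdisk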

 

The portion of it that we are interested in is “xbootmgr.exe” (aka boot logging).  In order to enable boot logging we need to inject the following registry key into our PVS image:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Process Monitor for boot tracing Citrix PVS Target Devices, click here.


ControlUp – Script Based Action (SBA) Action Auditing

2017-01-30

ControlUp is a tool we use to monitor our Citrix environment.  Actions are run through ControlUp by multiple people at multiple times, and an easier way to review those actions would be nice.  ControlUp keeps a record of every machine action it executes in the local machine’s event log.  To review these logs I decided, what better way than to use ControlUp itself!

The Script Based Action (SBA):
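The script itself isn’t reproduced here; as a stand-in, below is a minimal sketch of the kind of query such an SBA could run against the remote machine’s event log.  The log name, the provider filter and the meaning of $args[1]/$args[2] are assumptions for illustration, not ControlUp’s actual values:

    # Hedged sketch of an action-auditing query.  $args[0] is the computer name (the 'Name'
    # property); the remaining arguments, the log name and the provider filter are assumptions.
    $computer  = $args[0]
    $daysBack  = [int]$args[1]
    $maxEvents = [int]$args[2]

    Get-WinEvent -ComputerName $computer -MaxEvents $maxEvents -FilterHashtable @{
        LogName   = 'Application'                               # assumed log name
        StartTime = (Get-Date).AddDays(-$daysBack)
    } |
        Where-Object { $_.ProviderName -like '*ControlUp*' } |  # assumed provider filter
        Select-Object TimeCreated, ProviderName, Id, Message |
        Format-Table -AutoSize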

And the steps to create the SBA:

  1. Create a new SBA and name it “ControlUp Action Auditing” and click ‘Next’
  2. Set the ‘Assigned to:’ “Computer” and ‘Execution Context:’ as “ControlUp Console”
  3. Add the script and set the ‘Execution Timeout (seconds)’ to whatever will satisfy querying your remote systems (I set mine to 120).
  4. Setup the variables
    $args[0] = ‘Name’ property
    $args[1]: 
    (note the “pipe” symbol in the ‘Input Validation string’)
    $args[2]: 
  5. Save and finalize the SBA.  And now, here it is in action:

Let’s Make PVS Target Device Booting Great Again (Part 2)

2017-01-05

Continuing on from Part 1, we are looking to optimize the PVS boot process to be as fast as it possibly can be.  In Part 1 we implemented Jumbo Frames across both the PVS target device and the PVS server and discovered that Jumbo Frames only applies to the portion where BNIStack kicks in.

In this part we are going to examine the option “I/O burst size (KB)”.  This policy is explained in the help file:

I/O burst size — The number of bytes that will be transmitted in a single read/write transaction before an ACK is sent from the server or device. The larger the IO burst, the faster the throughput to an individual device, but the more stress placed on the server and network infrastructure. Also, larger IO Bursts increase the likelihood of lost packets and costly retries. Smaller IO bursts reduce single client network throughput, but also reduce server load. Smaller IO bursts also reduce the likelihood of retries. IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed.

What are these ACKs, and can we see them?  We can.  They are UDP packets sent back from the target device to the PVS server.  If you open procmon on the PVS server and start up a target device, an ACK looks like so:

These highlighted 48-byte UDP Receive packets?  They are the ACKs.

And if we enable the disk view with the network view:

 

With each 32KB read of the hard disk we send out 24 packets: 23 at 1464 bytes and 1 at 440 bytes.  Add them all together and we get 34,112 bytes of data.  This implies an overall overhead of 1,344 bytes per sequence of reads, or 56 bytes per packet.  I confirmed it’s a per-packet overhead by looking at a different read event at a different size:

If we look at the first read event (8,192) we can see there are 6 packets, 5 at 1464 and one at 1208, totaling 8,528 bytes of traffic.  8,528 – 8,192 = 336 bytes of overhead / 6 packets = 56 bytes.

The same happens with the 16,384 byte read next in the list.  12 packets, 11 at 1464 and one at 952, totaling 17,056.  17056 – 16384 = 672 bytes of overhead / 12 packets = 56 bytes.

So it’s consistent.  For every packet at the standard 1506 MTU you are losing about 3.8% to overhead.  But there is secretly more overhead than just that: every read also carries a 48-byte ACK on top.  Admittedly, it’s not much, but it’s present.

And how does this look with Jumbo Frames?

For a 32KB read we satisfied the request in 4 packets: 3 at 8972 bytes and 1 at 6076 bytes, totaling 32,992 bytes of transmitted data.  Subtracting what is actually required, 32,992 – 32,768 = 224 bytes of overhead, or…  56 bytes per packet 🙂

This amounts to a measly ~0.6% of overhead when using Jumbo Frames (an immediate ~3% gain!).
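If you want to reproduce the overhead numbers yourself, the arithmetic is simple enough to script; this is just the per-packet calculation from the traces above, nothing PVS-specific:

    # Per-packet overhead from the traces above: 56 bytes of PVS overhead per data packet.
    foreach ($packetSize in 1464, 8972) {
        '{0} byte packets: {1:P1} lost to overhead' -f $packetSize, (56 / $packetSize)
    }
    # 1464 byte packets: ~3.8% lost to overhead (standard MTU)
    # 8972 byte packets: ~0.6% lost to overhead (Jumbo Frames)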

But what about this 32KB value?  What happens if we make it larger (or smaller)?

Well, there is a limitation that handicaps us…  even if we use Jumbo Frames.  It is stated here:

IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed

Because Jumbo Frames don’t occur until after the BNIStack kicks in, we are limited to working out this math at the 1506 MTU size.

The caveat is that the calculation isn’t actually based on the MTU size of 1506, or even the 1464-byte payload that fits within it, but on the disk data each packet carries, which (after the 56 bytes of per-packet overhead we measured earlier) is 1408 bytes.  Doing the math in reverse gives us 1408 x 32 = 45,056 bytes.  This equals a clean 44K (45,056 / 1024) maximum size.  Setting I/O Burst to 44K, the target device still boots.  Counting the packets, there are 32 packets.

So if we up the I/O Burst by 1K to 45K (45 x 1024 = 46,080 bytes), will it still boot?

It does not boot.  This makes 44K a hard limit for I/O Burst Size until the first boot stage supports a larger MTU.  I have only explored UEFI booting, so I suppose it’s possible another boot method allows for a larger MTU?
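To sanity-check that ceiling, here is the packet math worked out in a few lines; this is a sketch of the arithmetic only, assuming (as above) 1,408 bytes of disk data per standard-MTU packet:

    # Packet math behind the 44K ceiling, assuming 1,408 data bytes per standard-MTU packet.
    $dataPerPacket = 1464 - 56          # 1,464-byte payload minus 56 bytes of PVS overhead
    foreach ($burstKB in 32, 44, 45) {
        $packets = [math]::Ceiling(($burstKB * 1024) / $dataPerPacket)
        '{0}K burst -> {1} packets (limit is 32)' -f $burstKB, $packets
    }
    # 32K -> 24 packets, 44K -> 32 packets, 45K -> 33 packets (one over the limit)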

The reads themselves are now split, hitting both the ‘version’ and the ‘base’: 25,600 bytes against the base plus 20,480 against the version (46,080 bytes).  I believe this is normal for versioning though.

So what’s the recommendation here?

Good question.  Citrix defaults to a 32K I/O Burst Size.  If we break the operation of a burst down, we have 4 portions:

  1. Hard drive read time
  2. Packet send time
  3. Acknowledgement of receipt
  4. Turnaround time from receipt to next packet send

The times that I have for each portion at a 32K size appear to be (in milliseconds):

  1. 0.3
  2. 0.5
  3. 0.2
  4. 0.4

A total time of ~1.4ms per read transaction at 32K.

For 44K I have the following:

  1. 0.1
  2. 0.4
  3. 0.1
  4. 0.4

For a total time of ~1.0ms per read transaction at 44K.

I suspect the 0.4ms difference could well be within the margin of error of my hand-based counting.  I took my numbers from a random sampling of 3 transactions and averaged them.  I cannot guarantee they were at the same spot of the boot process.

However, it appears the difference between them is close to negligible.  The question that must be posed is: what’s the cost of a ‘retry’ for a missed or faulty UDP packet?  From the evidence I have it should be fairly small, but I haven’t figured out a way to test or detect what the turnaround time of a ‘retry’ is yet.

Citrix has a utility that gives you some information on what kind of gain you might get.  It’s called ‘Stream Console’ and it’s available in the Provisioning Services folder:

 

With a 4K I/O Burst Size it does not display any larger packet sends because they are limited to that size.

 

8K I/O Burst Size. Notice how many 8K sectors are read over 4K?

 

16K I/O Burst Size

 

To compare the performance of the I/O Burst Size options, I simply tried each size 3 times and took the boot-time results posted by the Status Tray utility.  The unfortunate thing about the Status Tray is that its time/throughput calculations are rounded to the second.  This means the throughput figure isn’t entirely accurate, since a second is a LARGE value when you’re talking about the difference between 8 and 9 seconds; being just under or over the rounding threshold changes your results once we get down to numbers like these.  But I’ll present my results anyway:

To me, the higher the I/O Burst Size, the better the performance.

Again, the caveat is that I do not know what the impact of a retry is, but if reading from the disk and resending the packet takes ~1ms then I imagine the ‘cost’ of a retry is very low, even with the larger sizes.  However, if your environment has longer disk reads, high latency, and a poor network with dropped or lost packets, then it’s possible, I suppose, that a higher I/O Burst Size is not for you.

But I hope most PVS environments are better designed than that and you don’t actually have to worry about it.  🙂


Let’s Make PVS Target Device Booting Great Again (Part 1)

2016-12-30

Some discussions have swirled recently about implementing VDI.  One of the challenges with VDI is slow boot times, which necessitate keeping machines pre-powered on: a pool of machines sits there consuming server resources until a logon request comes in and more machines are powered on to meet the demand…  But what if your boot time were measured in seconds?  Something so low you could keep the ‘pool’ of standby machines at 1 or 2, or even none!

I’m interested in investigating if this is possible.   I previously looked at this as a curiosity and achieved some good results:

 

However, that was a non-domain Server 2012 R2 fresh out of the box.  I tweaked my infrastructure a bit by storing the vDisk on a RAM Disk with Jumbo Frames (9k) to supercharge it somewhat.

Today, I’m going to investigate this again with PVS 7.12, UEFI, Windows 10, on a domain.  I’ll show how I investigated booting performance and see what we can do to improve it.

The first thing I’m going to do is install Windows 10, join it to the domain and create a vDisk.

Done.  Because I don’t have SCVMM set up in my home lab, I had to muck my way through enabling UEFI HDD boot.  I went into the PVS folder (C:\ProgramData\Citrix\Provisioning Services) and copied the BDMTemplate_uefi.vhd to my Hyper-V target device folder.

I then edited my Hyper-V Target Device (Gen2) and added the VHD:

I then mounted the VHD and modified the PVSBOOT.INI file so it pointed to my PVS server:
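For reference, that sequence looks roughly like the following with the Hyper-V cmdlets; the paths and the mounted drive letter are assumptions, and the PVSBOOT.INI contents aren’t reproduced here:

    # Hedged sketch: copy the UEFI boot VHD, mount it, and edit PVSBOOT.INI to point at the PVS server.
    $bdm = 'D:\Hyper-V\Win10-Target\BDMTemplate_uefi.vhd'
    Copy-Item 'C:\ProgramData\Citrix\Provisioning Services\BDMTemplate_uefi.vhd' $bdm
    Mount-VHD -Path $bdm
    notepad.exe 'E:\PVSBOOT.INI'       # assumed to sit at the root of the mounted volume
    Dismount-VHD -Path $bdm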

 

 

I then created my target device in the PVS console:

 

And voilà!  It booted.

 

And right out of the gate we are getting 8-second boot times.  At this point I don’t have it set up with a RAM drive or anything, so this is pretty stock, albeit on really fast hardware.  My throughput is crushing my previous speed record, so if I can reduce the number of bytes read (it’s literally bytes read / time = throughput) I can improve my boot time.  On the flip side, I can try to increase my throughput, but that’s a bit harder.

However, there are some tricks I can try.

I have Jumbo Frames enabled across my network.  At this stage I do not have them set, but we can enable them to see if they help.

To verify their operation I’m going to trace the boot operation from the PVS server using procmon:

We can clearly see the UDP packet size is capping out at 1464 bytes, making it 1464 + 8-byte UDP header + 20-byte IP header = 1492 bytes.  So I enabled Jumbo Frames.

Under Server Properties in the PVS console I adjusted the MTU to match the NIC:

 

You then need to restart the PVS services for it to take effect.

I then made a new vDisk version and enabled Jumbo Frames in the OS of the target device.  I did a quick ping test to validate that Jumbo Frames are passing correctly.
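For reference, the target-side enablement and the validation ping look roughly like this; the adapter name, the advanced-property display name (Hyper-V synthetic NICs usually expose ‘Jumbo Packet’) and the address are assumptions:

    # Hedged sketch: enable Jumbo Frames on the target device NIC, then validate end to end.
    Set-NetAdapterAdvancedProperty -Name 'Ethernet' -DisplayName 'Jumbo Packet' -DisplayValue '9014 Bytes'

    # 8972 = 9000 minus the 20-byte IP header and 8-byte ICMP header;
    # -f (don't fragment) makes the ping fail if jumbo frames aren't passing cleanly.
    ping.exe -f -l 8972 192.168.1.10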

I then started procmon on the PVS server, set the target device to boot…

and…

 

1464-sized UDP packets.  A little smaller than the ~9000 bytes they’re supposed to be.  Scrolling down a little further, however, shows:

 

Notice the number of UDP packets sent at the smaller frame size?

 

Approximately 24 packets until it gets a “Receive” notification to send the next batch of packets.  These 24 packets account for ~34,112 bytes of data per sequence.  Total time for each batch of packets is 4-6ms.

If we follow through to when the jumbo frames kick in we see the following:

This is a bit harder to read because MIO (Multiple Input Output) kicks in here, so there are actually two threads executing the read operations as opposed to the single thread above.

Regardless, I think I’ve hit on a portion that is executing more-or-less sequentially.  The total amount of data being passed in these sequences is ~32,992 bytes, but the time to execute them is 1-2ms!  We have essentially halved the effective latency of our disk reads.

So why is the data being sent like this?  Again, procmon brings some visibility here:

Each “UDP Receive” packet is a confirmation that the data arrived intact and instructs the Stream Process to read and send the next portion of the file on the disk.  If we move to the jumbo frame portion of the boot process, we can see the IO goes all over the place in size and in where the reads occur:

So, again, jumbo frames are a big help here, as every request under 8K can be serviced in 1 packet, and there are usually MORE requests under 8K than above it.  Fortunately, procmon can give us some numbers to illustrate this.  I started and stopped a procmon trace for a network boot with Jumbo Frames and one without:

Standard MTU (1506)

 

Jumbo Frame MTU (9014)

 

The number we are really after is 192.168.1.88:6905.  The total number of events is solidly cut in half, with the number of sends about a third lower!  And it was fast enough to move roughly double the data, in both bytes sent to the target device and bytes received from it!
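If you want to pull these counts yourself rather than eyeballing the trace, procmon’s Tools > Count Occurrences works, or you can export the trace to CSV and tally it.  A small sketch of the latter, where the export path and the endpoint string are assumptions:

    # Hedged sketch: tally a procmon CSV export by operation for the endpoint of interest.
    Import-Csv 'C:\Temp\pvs-boot-trace.csv' |
        Where-Object { $_.Path -like '*192.168.1.88:6905*' } |
        Group-Object Operation |
        Sort-Object Count -Descending |
        Select-Object Count, Name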

Does this help our throughput?  Yes, it does:

 

“But Trentent!  That doesn’t show the massive gains you are spewing!  It’s only 4MB/s more in throughput!”

And you are correct.  So why aren’t we seeing more gains?  The issue lies with how PVS boots.  It boots in two stages.  If you are familiar with PVS on Hyper-V from a year or more ago, you are probably aware of this issue.  Essentially, PVS breaks the boot into a first stage (the bootloader stage) which starts in a lower-performance mode (standard MTU).  Once the BNIStack loads, it kicks into jumbo packet mode with the loading of the synthetic NIC driver.  The benefits of Jumbo Frames don’t occur until this stage.  So when do Jumbo Frames kick in?  You can see it in Event Viewer.

From everything I see with procmon, first-stage boot ends at that first Ntfs event.  So out of the original 8 seconds, 4 are spent in first-stage boot, where jumbo packets are not enabled.  Everything after that point is impacted (positively).  So for the remaining ~4 seconds of second-stage boot, bringing that down by a second is a 25% improvement!  Not small potatoes.

I intend to do more investigation into what I can do to improve boot performance for PVS target devices so stay tuned!  🙂


AppV5 – 0xFD01F25-0x2 and deciphering some of these messages

2016-12-13

We recently had an issue with some users launching an AppV 5 application.  They were getting an error message:

In order to troubleshoot this issue more effectively, I turned on ‘Event Tracing for Windows‘ for the AppV logs and captured the output.  Searching for the error code revealed the following:

 

The first line that shows an ‘error’:

I navigated to that path, and sure enough, a ‘Templates’ folder was not present.

I did a procmon trace during that user’s logon and noticed the folder was never created.  When AppV did not find this folder it, for some reason, threw an error.  Once I created that folder I was able to launch the application without issue.
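The fix, in other words, was nothing more than pre-creating the folder (for example in a logon script); the path below is a placeholder, substitute the ‘Templates’ path from your own trace:

    # Placeholder path -- use the 'Templates' path your own trace points at.
    New-Item -ItemType Directory -Force -Path "$env:APPDATA\Microsoft\Windows\Templates" | Out-Null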

So what does the error message “0xFD01F25-0x2” mean?  Well, the first portion (split at the dash) identifies the AppV component where the issue is occurring.  The second string (0x2) is more interesting because it actually tells us something: Microsoft has these short error codes documented here.

0x2 = the system cannot find the file specified.  It’s actually looking for a folder, but the object didn’t exist and that’s the code it generated.  So if you see that second portion in an AppV error, the standard system error code may give you a more precise clue to what is occurring and how you can fix it.
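You don’t need to memorize that list; either of these will translate the trailing part of the code into the standard Windows error message:

    # Translate the trailing part of an AppV error code into a Win32 error message.
    ([System.ComponentModel.Win32Exception] 0x2).Message   # "The system cannot find the file specified"
    net helpmsg 2                                          # classic command-line equivalent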


Citrix Netscaler 11.1 Unified Gateway and a non-working Citrix HTML5 Receiver

2016-12-06

We set up a Citrix Unified Gateway for a proof of concept and were having an issue getting the HTML5 Receiver to connect for external connections.  It presented an error message: “Citrix Receiver cannot connect to the server”.  We followed this documentation.  It states, for our use case:

What would probably help is having a proxy that can parse out all websocket traffic and convert to ICA/CGP traffic without any need of changes to XA/XD. Netscaler Gateway does exactly this… …NetScaler Gateway also doesn’t need any special configuration for listening over websockets…

Connections via NetScaler Gateway:

…When a gateway is involved, the connections are similar in all Receivers. Here, Gateway acts as WebSocket proxy and in-turn opens ICA/CGP/SSL native socket connections to backend XenApp and XenDesktop. …

…So using a NetScaler Gateway would help here for ease of deployment. Connection to gateway is SSL/TLS and gateway to XenApp/XenDesktop is ICA/CGP….

And additional documentation here.

WebSocket connections are also disabled by default on NetScaler Gateway. For remote users accessing their desktops and applications through NetScaler Gateway, you must create an HTTP profile with WebSocket connections enabled and either bind this to the NetScaler Gateway virtual server or apply the profile globally. For more information about creating HTTP profiles, see HTTP Configurations.

Ok.  So we did the following:

  1. We enabled WebSocket connections on Netscaler via the HTTP Profiles
  2. We configured Storefront with the HTML5 Receiver and configured it to talk to the NetScaler.

And then we tried launching our application:

We started our investigation.  The first thing we did was test whether the HTML5 Receiver worked at all.  We configured and enabled websockets on our XenApp servers and then logged into the Storefront server directly and internally.  We were able to launch applications without issue.

The second thing we did was enable logging for HTML5 receiver:

To view Citrix Receiver for HTML5 logs

To assist with troubleshooting issues, you can view Citrix Receiver for HTML5 logs generated during a session.

  1. Log on to the Citrix Receiver for Web site.
  2. In another browser tab or window, navigate to siteurl/Clients/HTML5Client/src/ViewLog.html, where siteurl is the URL of the Citrix Receiver for Web site, typically http://server.domain/Citrix/StoreWeb.
  3. On the logging page, click Start Logging.
  4. On the Citrix Receiver for Web site, access a desktop or application using Citrix Receiver for HTML5.

    The log file generated for the Citrix Receiver for HTML5 session is shown on the logging page. You can also download the log file for further analysis.

This was the log file it generated:

The “Close with code=1006” seemed, from Google searches, to imply a “websocket” issue.

 

The last few events prior to the error are “websocket” doing…  something.

I proceeded to spin up a home lab with XenApp and a Netscaler configured for HTML5 Receiver and tried connecting.  It worked flawlessly via the Netscaler.  I enabled logging and took another look:

There are a lot of differences, but if we focus on the point of failure on our enterprise NetScaler, we can see it seems to retry, or try different indexes (3 in total: 0, 1 and 2).

So there is a lot of evidence that websockets are the culprit.  We tried removing the NetScaler from the picture by connecting directly to Storefront, and the HTML5 Receiver works.  We have configured both NetScaler and Storefront with what we think is a correct configuration.  And still we are getting a failure.

I opened up a call to Citrix.

It was a fairly frustrating experience.  I had techs ask me to go to “Program Files\Citrix\Receiver” and get the Receiver version (hint, hint: this does not exist with HTML5).  I captured packets of the failure “in motion” and they told me, “it’s not connecting to your XenApp server”.  — Yup.  That’s the problem.

It seems that HTML5 is either so new (it’s not, now), so simple (it’s not, really), or the techs are just poorly trained.  I reiterated to them, “why does it make 3 websocket connections on the ‘bad’ NetScaler?  Why does the ‘good’ NetScaler appear to connect the first time without issue?”  I felt the techs ignored and beat around the bush regarding websockets, with more focus put on the “Storefront console”.  Storefront itself was NOT logging ANYTHING to the event logs; apparently this is unusual for a Storefront failure.  I suspected Storefront was operating correctly, and I was getting frustrated that we weren’t focusing on what I suspected was the problem (websockets).  So I put the case on hold so I could focus on doing the troubleshooting myself instead of going around in circles on setting HTML5 to “always use” or “use only when native Receiver is not detected”.

Reviewing the documentation for the umpteenth time, this “troubleshooting connections” tidbit stood out:

Troubleshooting Connections:

In cases where you are not able to connect some of the following points might help in finding out the problem. They also can be used while opening support case or seeking help in forums:

1) Logging: Basic connection related logs are logged by Receiver for HTML5 and Receiver for Chrome.

2) Browser console logs: Browsers would show errors related to certificates or network related failures here.

  • Receiver for HTML5: Open developer tools for HDX session browser tab. Tip: In Windows, use F12 shortcut in address bar of session page.

  • Receiver for Chrome: Go to chrome://inspect, click on Apps on left side. Click on inspect for Citrix Receiver pages (Main.html, SessionWindow.html etc)

The browser may show a log?  I wish I had thought of that earlier.  And I wish Citrix had put that in the actual “Receiver for HTML5” documentation instead of burying it in a blog article.

So I opened the Console in Chrome, launched my application and reviewed the results:

We finally have some human readable information.

Websocket connections are failing during the handshake “Unexpected response code: 302”

What the heck does 302 mean?  I installed Fiddler and did another launch with Fiddler tracing:

 

I highlighted the area where it tells us it’s attempting to connect with websockets.  We can see in the response packet that we are getting redirected; that’s what “302” means.  I then found a website that lets you test your server to ensure websockets are working.  I tried it against our “bad” NetScaler:

 

Hitting ‘Connect’ left nothing in the log.  However, when I tried it with my ‘good’ netscaler…

 

It works!  So we can test websockets without having to launch and close the application over and over…
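If you’d rather not rely on an external site, you can also drive the same handshake test from PowerShell with the .NET ClientWebSocket class; the gateway URL below is a placeholder:

    # Quick websocket handshake test (requires .NET 4.5+).  The URL is a placeholder.
    $ws  = New-Object System.Net.WebSockets.ClientWebSocket
    $cts = New-Object System.Threading.CancellationTokenSource
    try {
        $ws.ConnectAsync([Uri]'wss://gateway.example.com/', $cts.Token).Wait()
        "Handshake result: $($ws.State)"     # 'Open' means the websocket upgrade succeeded
    } catch {
        "Handshake failed: $($_.Exception.GetBaseException().Message)"
    } finally {
        $ws.Dispose()
    }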

 

So we started to investigate the NetScaler.  We found numerous policies doing URL or content redirection that could apply to a packet formulated like this.  We then compared our NetScaler to the one in my home lab and did find one subtle difference:

The one on the left shows a rule of HTTP.REQ.URL.CONTAINS_ANY(“aaa_path”) where the one on the right just shows “is_vpn_url”.  Investigating further, it turned out our team had been trying to get AAA authentication working, and this option was set during a troubleshooting stage.  Apparently it was forgotten or overlooked when that issue was resolved (it was not applicable and could be removed).  So we set it back to “is_vpn_url” and retried…

It worked!  I tried the ‘websockets.org’ test and it connected now!  Looking in the Chrome console showed:

 

Success!  It doesn’t pause on the websocket connection and the console logging shows some interesting information.  Fiddler, with the working connection, now displayed the following:

Look!  A handshake response!

 

So, to review what we learned:

  1. Connections via NetScaler to the HTML5 Receiver do NOT require an SSL connection on each target XenApp device (though it is possible).
  2. Connections via NetScaler work over the standard ports (2598/1494) and do not require any special configuration on your XenApp server.
  3. You can use ‘http://www.websocket.org/echo.html’ to test your Netscaler to ensure websockets are open and working.
  4. Fiddler can tell you verbose information on your websocket connection and their contents.
  5. The web browser’s JavaScript console is perfect for viewing verbose messages from the HTML5 Receiver.

 

And with that, we are working and happy, Good Night!


AppV 5.1 Sequencer – Not capturing all registry keys – Update

2016-10-31

My previous post touched on an issue we are having with some applications: the AppV sequencer is not capturing all registry keys.  I have been working with Microsoft on this issue for over 2 years but just recently made some headway in getting it addressed.  I have good news and bad news.  The good news is that the root cause of this issue appears to have been found.

It appears that ETW (Event Tracing for Windows) will capture some of the events out of order and the AppV sequencer will then apply that out of order sequence.  The correct sequence of events should be:
RegDeleteKey
RegCreateKey

But in certain scenarios it’s capturing the events as:

RegCreateKey
RegDeleteKey

By capturing the deletion last in the order, the AppV sequencer is effectively being told not to capture the registry key.

Microsoft has corrected this issue in the ‘Anniversary’ edition of Windows 10 (Build 14393+) and sequencing in this OS will capture all the registry keys correctly.

The bad news is that Microsoft is only evaluating backporting the fix to older versions of Windows, specifically Windows 2008 R2.  Windows 2008 R2 is still widely used, and AppV best practice is to sequence on the OS you plan to deploy to; if the older OS sequences unreliably, that complicates the ability to ‘trust’ the product.  The fix still needs to be backported to 2012, 2012 R2 and the related client OSes, so hopefully they get it as well.  The reason I was told 2008 R2 may not get the fix is that it is no longer in mainstream support, but Windows 7 SP1 currently is, which is analogous to 2008 R2.  So hopefully the effort to fix this bug will be backported and we’ll have a solid AppV platform where we can sequence our packages with confidence.


Windows Update – 0x80070308 ERROR_REQUEST_OUT_OF_SEQUENCE

2016-10-31

We are getting this error when trying to install KB3172605, which was re-released in September of this year.  This patch was originally installed in August without issue but the re-release fails to install.  The WindowsUpdate.log reports:

Examining the CBS.LOG for more information has revealed the following:

The error starts here:

One of the nice things about Windows Update is that it kicks off Windows Error Reporting immediately when it detects an error.  So if you run procmon.exe you can find the point where the error occurs and work backwards just a little bit to (usually) find why it crashed.

In this case I ran procmon while I tried installing this update, went to 11:25:10 and did a search for “Windows Error Reporting”


Windows Error Reporting actually kicked in at the check to see whether it is disabled (DisableWerReporting).  The last key read before the crash was:

“HKLM\COMPONENTS\DerivedData\Components\amd64_0e471cf709070f76ea5797942bb36096_31bf3856ad364e35_6.1.7601.23455_none_9b9ebc8fa6659c8e”

If I open that key in Regedit here is what it looks like:


And what another key in that same list looks like:


 

It appears the key that is in error is missing some information.  In order to fix this, I’m going to go to another system and try to grab the values needed for identity and S256H.  I don’t know how these values are generated, but I’m hoping they are derived from the file the key refers to, which *should* mean that a system at the same or similar patch level should have the same files and thus the same generated hashes.
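A sketch of how to compare the two systems, assuming the COMPONENTS hive is available (it lives in C:\Windows\System32\config\COMPONENTS and can be loaded by hand if HKLM\COMPONENTS isn’t already present):

    # Hedged sketch: check the suspect component key for the S256H and identity values,
    # then export it from a working system for hand-editing.
    reg.exe load HKLM\COMPONENTS C:\Windows\System32\config\COMPONENTS 2>$null

    $key = 'HKLM\COMPONENTS\DerivedData\Components\amd64_0e471cf709070f76ea5797942bb36096_31bf3856ad364e35_6.1.7601.23455_none_9b9ebc8fa6659c8e'
    reg.exe query $key /v S256H
    reg.exe query $key /v identity

    reg.exe export $key C:\Temp\working-component.reg   # run this part on the working system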

I went to another system and exported the values.  In order to find the proper key, we find it is referenced by the previous key in the procmon trace:
“amd64_microsoft-windows-b..environment-windows_31bf3856ad364e35_6.1.7601.23455_none_c7bdc8a2bca7…”

The key path was present in both systems because they were at the same patch level, but the key looks different because of how the GUIDs are generated:


Working system – Notice the highlight is different

By tracing back in the working system I exported the key that “c!” is referencing.

Working System has S256H, identity keys

I exported the key from the working system and had to edit it so the GUID matched the broken key’s values.  I took the ‘working’ registry:

And the broken key:

I added the S256H and identity values to the ‘broken’ key:

And then added the last “c!” line by following these next steps:

  1. copied the line out:
    “c!c4ebacc5355..93c10e6175a_31bf3856ad364e35_6.1.7601.23455_8459b72d1e2f2700″=hex:
  2. Pasted the value from the registry “key” underneath with the “broken” value:
  3. Starting at the ellipsis, counted the number of characters from the “55..” to “93” underneath and created the value:
  4. Lastly, removed the ‘_none” from the string:
  5. I could then add “c!” in front and added it to my ‘broken’ registry file:
    I then tried to install the patch:


And it worked!


AppV5 – Citrix User Profile Manager exclusions

2016-10-21

The Citrix User Profile Manager (UPM) needs a little configuration tweaking to work with AppV.  Specifically, it requires:

 You must exclude the following item using Profile management exclusions:
 Profile Management\File system\Exclusion list\directories:
  • AppData\Local\Microsoft\AppV
 If you don’t exclude these items, App-V applications work the first time users access them but they fail, with an error, on subsequent logons.

But what happens when you *don’t* exclude this directory?

We upgraded our Citrix UPM to 5.4.1 and, in that process, moved from setting our inclusions/exclusions via the .ini file to using Group Policy.  The original thought was that simply adding our exclusions would append them to the existing list of default inclusions/exclusions, which already has this directory set.  This line of thinking was incorrect.  Citrix’s documentation states:

Important: If you use Group Policy rather than the .ini file (or you are rolling out a Group Policy deployment after a successful test with the .ini file), note that, unlike the installed .ini file, no items are included or excluded by default in the .adm or .admx file. This means you must add the default items manually to the file.

When we enabled the exclusion list via Group Policy and set a path (for something unrelated to AppV), that path became the ONLY item being excluded, the AppV directory was no longer excluded, and we hit the issue described by Citrix.  Our application would launch the first time, or, oddly, would keep working for that user but only on that specific server.  When they launched it again on another server it would fail until their user profile was deleted from the profile share.
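A quick way to spot this condition is to check whether the AppV cache directory is showing up in the roamed profile store at all; a hedged sketch, where the profile share and the UPM_Profile sub-folder layout are assumptions about your UPM path configuration:

    # Hedged sketch: if the AppV cache directory is being roamed, the exclusion isn't applied.
    $profileStore = '\\fileserver\profiles$'              # assumed profile share
    Get-ChildItem $profileStore -Directory | ForEach-Object {
        $appv = Join-Path $_.FullName 'UPM_Profile\AppData\Local\Microsoft\AppV'
        if (Test-Path $appv) {
            '{0}: AppV folder is being roamed - exclusion missing' -f $_.Name
        }
    }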

I set up AppV5 debug logging and traced what this failure looked like when our user tried to start an AppV application:

The lesson?

If you are using a profile manager, ensure your exclusions for AppV are applied correctly!  If you miss them, you may run into this weird behaviour.
