Registry

User Profile Manager – Unavoidable Delays

2018-06-30
/ /
in Blog
/

I’ve been exploring optimizing logon times and noticed “User Profile Service” always showed up for 1-3 seconds.  I asked why and began my investigation.

The first thing I needed to do is separate the “User Profile Service” into it’s own process.  It’s originally configured to share the same process as other services which makes procmon’ing difficult.

Making this change is easy:

Now that the User Profile Service is running in it’s own process we can use Process Monitor to target that PID.

I logged onto the RDS server with a different account and started my procmon trace.  I then logged into server:

One of the beautiful things about a video like this is we can start to go through frame-by-frame if needed to observe the exact events that are occurring.  Process Monitor also gives us a good overview of what’s happening with the “Process Activity” view:

9,445 file events, 299,668 registry events.  Registry, by far, has the most events occurring on it.  And so we investigate:

  1. On new logins the registry hive is copied from Default User to your profile directory, the hive is mounted and than security permissions are set.

    Setting the initial permissions of the user hive began at 2:14:46.3208182 and finished at 2:14:46.4414112.  Spanning a total time of 121 milliseconds.  Pretty quick but to minimize logon duration it’s worth examining each key in the Default User hive and ensuring you do not have any unnecessary keys.  Each of these keys will have their permissions evaluated and modified.
  2. The Profile Notification system now kicks off.

    The user profile server now goes through each “ProfileNotification” and, if it’s applicable, executes whatever action the module is responsible for.  In my screenshot we can see the User Profile Service alerts the “WSE”.  Each key actually contains the friendly name, giving you a hint about its role:

    It also appears we can measure the duration of each module by the “RegOpenKey” and “RegCloseKey” events tied to that module.

    In my procmon log, the WSE took 512ms, the next module “WinBio” took 1ms, etc.  The big time munchers for my system were:
    WSE: 512ms
    SyncCenter: 260ms
    SHACCT: 14ms
    SettingProfileHandler: 4ms
    GPSvc: 59ms
    GamesUX: 60ms
    DefaultAssociationsProfileHandler: 4450ms (!)
  3. In the previous screenshot we can see the ProfileNotification has two events kicked off that it runs through it’s list of modules: Create and Load.  Load takes 153ms in total, so Create is what is triggering our event.
  4. DefaultAssociationsProfileHandler consumes the majority of the User Profile Service time.  What the heck is it doing?  It appears the Default Association Profile Handler is responsible for creating the associations between several different components and your ability to customize them.  It associates (that I can see):
    ApplicationToasts (eg, popup notifications)
    RegisteredApplications
    File Extensions
    DefaultPrograms
    UrlAssociations
    The GPO “Set Default Associations via XML file” is processed and the above is re-run with the XML file values.
  5. Do we need these associations?

    Honestly…   Maybe.

    However, does this need to be *blocking* the login process?  Probably not.  This could be an option to be run asynchronously with you, as the admin, gambling that any required associations will be set before the user gets the desktop/app…  Or if you have applications that are entirely single purpose that simply read and write to a database somewhere than this is superfluous.

  6. Can we disable it?

    Yes…

    But I’m on the fence if this is a good idea.  To disable it, I’ve found deleting the “DefaultAssociationsProfileHandler” does work, associations are skipped and we logon 1-4 seconds faster.  However, launching a file directly or shortcut with a url handler will prompt you to choose your default program (as should be expected).

I’m exploring this idea.  Deleting the key entirely and using SetUserFTA to set associations.

We have ~400 App-V applications that write/overwrite approximately 800 different registered applications and file extensions into our registry hive (we publish globally — this puts them there).  This is the reason why I started this investigation is some of our servers with lots of AppV applications were reporting longer UserProfileService times and tying it all together, this one module in the User Profile Service appears to be the culprit.  And with Spectre increasing the duration of registry operations 400% this became noticeable real quick in our testing.

Lastly, time is still being consumed on RDS and server platforms by the infuriating garbage services like GamesUX (“Games Explorer”).  It just tweaks a nerve a little bit when I see time being consumed on wasteful processes.

Read More

Meltdown + Spectre – Performance Analysis

2018-04-30
/ /
in Blog
/

Meltdown and Spectre (variant 2) are two vulnerabilities that came out at the same time, however they are vastly different.  Patches for both were released extremely quickly for Microsoft OS’s but because of a variety of issues with Spectre, only Meltdown was truly available to be mitigated.  Spectre (variant 2) mitigation had a problematic release, causing numerous issues for whomever installed the fix, that it had to be recalled and the release delayed weeks.  However, around March/April 2018, the release of the Spectre patch was finalized and the microcode released.

Threat on Performance

Spectre (variant 2), of the two, threatened to degrade performance fairly drastically.  Initial benchmarks made mention that storage was hit particularly hard.  Microsoft made comments that server OS’s could be hit particularly hard.  Even worse, older operating systems would not support CPU features (PCID) that could reduce the performance impact.  Older OS’s suffer more due to the design (at the time) that involved running more code in kernel mode (fonts was signalled out as an example of one of these design decision) than newer OS’s.

As with most things on my blog I am particularly interested in the impact against Citrix/Remote Desktop Services type of workloads.  I wanted to test ON/OFF workloads of the mitigation impacts.

Setup

My setup consists of two ESXi (version 6.0) hosts with identical VM’s on each, hosting identical applications.  I was able to setup 4 of these pairs of hosts.  Each pair of hosts have identical processors.  The one big notable change is one host of each pair has the Spectre and Meltdown patch applied to the ESXi hypervisor.

The operating system of all the VM’s is Windows Server 2008 R2.  Applications are published from Citrix XenApp 6.5.

 

 

This is simply a snapshot of a single point in time to show the metrics of these systems.

Performance Considerations

Performance within a Citrix XenApp farm can be described in two ways.  Capacity and speed.

Speed

Generally, one would test for a “best case” test of the speed aspect of your applications performance.

A simplified view of this is “how fast can the app do X task?”

This task can be anything.  I’ve seen it measured by an automated script flipping through tabs of an application, where each tab pulled data from a database – rendered it – then moved on to the next tab.  The total time to execute these tasks amounted to a number that they used to baseline the performance of this application.

I’ve seen it measured as simply opening an excel document with macros and lots of formulas that pull data and perform calculations and measuring that duration.

The point of each exercise is to generate a baseline that both the app team and the Citrix team can agree to.  I’ve almost never had the baseline equal “real world” workloads, typically the test is an exaggeration of the actual workflow of users (eg, the test exaggerates CPU utilization).  Sometimes this is communicated and understood, other times not, but hopefully it gives you a starting point.

In general, and for Citrix workloads specifically, running the baseline test on a desktop usually produces a reasonable number of, “well… if we don’t put it on Citrix this is what performance will be like so this is our minimum expectation.”  Sometimes this is helpful.

Once you’ve established the speed baseline you can now look at capacity.

Capacity

After establishing some measurable level of performance within the application(s) you should now be able to test capacity.

If possible, start loading up users or test users running the benchmark.  Eventually, you’ll hit a point where the server fails — either because it ran out of resources, performance degraded so much it errors, etc.  If you do it right, you should be able to start to find the curve that intersects descending performance with your “capacity”.

At this point, cost may come into consideration.

Can you afford ANY performance degradation?

If not, than the curve is fairly easy.  At user X we start to see performance degrade so X-1 is our capacity.

If yes, at what point does performance degrade so much that adding users stops making sense?  Using the “without Citrix this is how it performs on the desktop” can be helpful to establish a minimum level of performance that the Citrix solution cannot cross.

Lastly, if you have network bound applications, and you have an appropriately designed Citrix solution where the app servers sit immediately beside the network resources on super-high bandwidth, ultra-low latency you may never experience performance degradation (lucky you!).  However, you may hit resource constraints in these scenarios.  Eg, although performance of the application is dependent on network, the application itself uses 1GB of RAM per instance of the application — you’ll be limited pretty quickly be the amount of RAM you can have in your VM’s.  These cases are generally preferred because the easy answer to increase capacity is *more hardware* but sometimes you can squeeze some more users with software like AppSense or WEM.

Spectre on Performance

So what is the impact Spectre has on performance — speed and/or capacity?

If Spectre simply makes a task take longer, but you can fit the same number of tasks on a given VM/Host/etc. then the impact is only on speed. Example: a task that took 5 seconds at 5% CPU utilization now takes 10 seconds at 5% CPU utilization.  Ideally, the capacity should be identical even though the task now takes twice as long.

If Spectre makes things use *more* resources, but the speed is the same, then the impact is only on capacity.  Example: a task that took 5 seconds at 5% CPU utilization now takes 10% CPU utilization.  In this scenario, the performance should be identical but your capacity is now halved.

The worst case scenario is if the impact is on both, speed and capacity.  In this case, neither are recoverable except you might be able to make up some speed with newer/faster hardware.

I’ve tested to see the impacts of Spectre in my world.  This world consists of Windows 2008 R2 with XenApp 6.5 on hardware that is 6 years old.  I was also able to procure some newer hardware to measure the impact there as well.

Test Setup

Testing was accomplished by taking 2 identically configured ESXi hosts, applying the VMWare ESXi patch with the microcode for Spectre mitigation to one of the hosts, and enabling it in the operating system.  I added identical Citrix VM’s to both hosts and enabled user logins to start generating load.

 

Performance needs to measured at two levels.  At the Windows/VM level, and at the hypervisor/host level.  This is because the Hypervisor may pickup the additional work required for the mitigation that the operating system may not, and also due to the way Windows 2008 R2 does not accurately measure CPU performance.

Windows/VM Level – Speed

I used ControlUp to measure and capture performance information.  ControlUp is able to capture various metrics including average logon duration.  This singular metric includes various system interactions, from using the network by querying Active Directory, pulling files from network shares, disk to store group policies in a cache, CPU processing which policies are applicable, and executables being launched in a sequence.  I believe that measuring logons is a good proxy for understanding the performance impact.  So lets see some numbers:

 

The top 3 results are Spectre enabled machines, the bottom 3 are without the patch.  The results are not good.  We are seeing a 200% speed impact in this metric.

With ControlUp we can drill down further into the impact:

Without Spectre Patch

 

With Spectre Patch

 

The component that took the largest hit is Group Policy.  Again, ControlUp can drill down into this component.

Without Spectre

 

With Spectre

All group policy preference components take a 200% hit.  The Group Policy Preferences functions operate by pulling down an XML file from the SYSVOL store, reading the XML file, than applying whatever resultant set of policies finds applicable.  In order to trace down further to find more differences, I logged into each type of machine, one with Spectre and one without, and started a Process Monitor trace.  Group Policy is applied via the Group Policy service, which a seperate instance of the svchost.exe.  The process can be found via Task Manager:

Setting ProcMon to filter only on that PID we can begin to evaluate the performance.  I relogged in with procmon capturing the logon.

Spectre Patched system on left, no patch on right

Using ProcessMonitor, we can look at the various “Summaries” to see which particular component may be most affected:


We see that 8.45 seconds is spent on the registry, 0.40 seconds on file actions, 1.04 seconds on the ProcessGroupPolicyExRegistry instruction.

The big ticket item is the time spent with the registry.

So how does it compare to a non-spectre system?

 

We see that 1.97 seconds is spent on the registry, 0.33 seconds on file actions, 0.24 seconds on the ProcessGroupPolicyExRegistry instruction.

Here’s a table showing the results:

So it definitely appears we need to look at the registry actions.  One of the cool things about Procmon is you can set a filter on your trace and open up the summaries and it will show you only the objects in the filter.  I set a filter for RegSetValue to see what the impact is for setting values in the registry:

RegSetValue – without spectre applied

 

RegSetValue – with spectre applied

1,079 RegSetValue events and a 4x performance degradation.  Just to test if it is specific to write events I changed the procmon filter to filter on “category” “Read”

 

Registry Reads – Spectre applied

 

Registry Reads – Spectre not applied

We see roughly the same ratio of performance degradation, perhaps a little more so.  As a further test I created a PowerShell script that will just measure creating 1000 registry values and test it on each system:

Spectre Applied

 

Spectre Not Applied

 

A 2.22x reduction in performance.  But this is writing to the HKCU…  which is a much smaller file.  What happens if I force a change on the much larger HKLM?

Spectre Applied

 

Spectre Not Applied

 

Wow.  The size of the registry hive makes a big difference in performance.  We go from 2.22x to 3.42x performance degradation.  So on a minute level, Spectre appears to have a large impact on Registry operations and the larger the hive the worse the impact.  With this information there is a large element of sense as to why Spectre may impact Citrix/RDS more.  Registry operations occur with a high frequency in this world, and logon’s highlight it even more as group policy and the registry are very intertwined.

This actually brings to mind another metric I can measure.  We have a very large AppV package that has a 80MB registry hive that is applied to the SOFTWARE hive when the package is loaded.  The difference in the amount of time (in seconds) loading this package is:

“583.7499291” (not spectre system)
“2398.4593479” (spectre system)

This goes from 9.7 mins to 39.9 minutes.  Another 4x drop in performance and this would be predominately registry related.  So another bullet that registry operations are hit very hard.

Windows/VM Level – Capacity

Does Spectre affect the capacity of our Citrix servers?

I recorded the CPU utilization of several VM’s that mirror each other on hosts that mirror each other with a singular difference.  One set had the Spectre mitigation enabled.  I then took their CPU utilization results:

Red = VM with Spectre, Blue = VM without Spectre

By just glancing at the data we can see that the Spectre VM’s had higher peaks and they appear higher more consistently.  Since “spiky” data is difficult to read, I smoothed out the data using a moving average:

Red = VM with Spectre, Blue = VM without Spectre

We can get a better feel for the separation in CPU utilization between Spectre enabled/disabled.  We are seeing clearly higher utilization.

Lastly, I took all of the results for each hour and produced a graph in an additive model:

This graph gives a feel for the impact during peak hours, and helps smooth out the data a bit further.  I believe what I’m seeing with each of these graphs is a performance hit measured by the VM at 25%-35%.

Host Level – Capacity

Measuring from the host level can give us a much more accurate picture of actual resources consumed.  Windows 2008 R2 isn’t a very accurate counter and so if there are lots of little slices, they can add up.

My apologies for swapping colors.  Raw data:

Blue = Spectre Applied, Red = No Spectre

Very clearly we can see the hosts with Spectre applied consume more CPU resources, even coming close to consuming 100% of the CPU resources on the hosts.  Smoothing out the data using moving averages reveals the gap in performance with more clarity.

 

Showing an hourly “Max CPU” per hour hit gives another visualization of the performance hit.

 

Summary

Windows 2008 R2, for Citrix/RDS workloads will be impacted quite highly.  The impact that I’ve been able to measure appears to be focused on registry-related activities.  Applications that store their settings/values/preferences in registry hives, whether they be the SOFTWARE/SYSTEM/HKCU hive will feel a performance impact.  Logon actions on RDS servers would be particularly impacted because group policies are largely registry related items, thus logon times will increase as it takes longer to process reads and writes.  CPU utilization is higher on both the Windows VM-level and the hypervisor level,  up to 40%.  The impact of speed on the applications and other functions is notable although more difficult to measure.  I was able to measure a ~400% degradation in performance for CPU processing for Group Policy Preferences, but perception is a real thing, so going from 100ms to 400ms may not be noticed.  However, on applications that measure response time, it was found we had a performance impact of 165%.  What took 1000ms now takes 1650ms.

At the time of this writing, I was only able to quantify the performance impact between two of the different hosts.  The Intel Xeon E5-2660 v4 and Intel Xeon E5-2680.

The Intel Xeon E5-2660 v4 has a frequency of 26% less than the older 2680.  In order to overcome this handicap, the processor must have improved at a per-clock rate higher than the 26% frequency loss.  CPUBenchMark had the two processors with a single thread CPU score of 1616 for the Intel Xeon E5-2660 v4 and 1657 for the Intel Xeon E5-2680.  This put them close but after 4 years the 2680 was marginally faster.  This has played out in our testing that the higher frequency processor is faster.  Performance degradation for the two processors came out as such:

 

CPU Performance Hit
Intel Xeon E5-2680 2.70GHz 155%
Intel Xeon E5-2660 v4 2.00GHz 170%

 

This tells that frequency of the processor is more important to mitigate the performance hit.

Keep in mind, these findings are my own.  It’s what I’ve experienced in my environment with the products and operating systems we use.  Newer operating systems are supposed to perform better, but I don’t have the ability to test that currently so I’m sharing these numbers as this is an absolute worst case type of scenario that you might come across.  Ensure you test the impact to understand how your environment will be affected!

Read More

Corrupt Registry Repair with Citrix Provisioning Services

2018-04-02
/ /
in Blog
/

I encountered an interesting issue and worked through a solution with a corrupt registry.  The issue seemed innocuously enough, we upgraded PowerShell to 5.1.  Upon reboot I encountered a bluescreen with code 0xF4:

I have encountered this issue before, but I haven’t recorded my troubleshooting steps until now.

Since this was a PVS target device, the easy method was deleting the version and trying to upgrade PowerShell to 5.1, which resulted in the same BSOD.  So it was easily reproducible.  So I tried it a few more times, because, why not?

I then deleted this version, and booted into the system and looked at Event Viewer for hints of what could be at fault.  Going through it was pretty obvious that it was registry corruption:

Filtering for EventID 5 shows all the attempts of booting to BSOD:

 

I mean, it literally says the Registry was corrupted 🙂

Where can you find this corruption?  With PVS it’s fairly simple but I believe the same process can exist for other systems including physical.  The first step was to mount the vDisk to a VM.

 

Mount the registry hive you suspect with corruption.

Next is to scan the registry for corruption.  Thus far, I’ve only found corruption to be detectable if it’s a key or value that cannot be read.  If the data on a value can be read but contains garbage it’s much harder to detect.  In order to avoid permissions being a problem, I open a PowerShell prompt as SYSTEM using PSEXEC.  If you don’t elevate permissions, some keys maybe restricted from the Admins group and this will be detected as a failure.

Once at this stage, it’s a one-liner to scan the registry:

In my experience, corruption can be detected as “Permission Denied”, “Access Denied”,”Path does not exist” or some such:

 

At this point you can examine the text file to see the last path it explored:

 

At this point you can open regedit (as SYSTEM) and examine the keys within that path.  Clicking through each one revealed the corrupted key:

Attempting to get Permissions on this key reveals it also exists on the ACL level:

“The requested security information is either unavailable or can’t be displayed”.

Deleting the key may fail as well:

At this point you need to evaluate how to manage the corruption.  If you cannot delete the key, rename it, or in some way replace it you may have an option like I did…  You can rename a higher up branch in the tree, go to a existing system with the same keys (with PVS I can go to the previous version and export that tree) and reimport.

Unload the hive and boot up the system –> and you may have a fully working system!

I’ve used this trick here, and on a corrupt COMPONENTS hive in the past.  With the COMPONENTS hive I got lucky I could replace the corrupted keys with ones from a branched vDisk.  Other machines didn’t have the same key in COMPONENTS so I got lucky.

 

Read More

AppV 5 – Raiser’s Edge 7.96 – Run-time error -2147024770 (800707e)

2017-12-15
/ /
in Blog
/

We are in the process of upgrading Blackbaud’s Raiser’s Edge to 7.96 and we encountered an error:

Run-time error ‘-2147024770 (8007007e)’:
Automation error
The specified module could not be found.

This error is giving us a few clues as to what might be happening.  The most obvious message is the “8007007e” which is a standard windows error hex code which translates to:

8007007E = FileNotFoundException

So RE7.exe is not finding a file it’s looking for.  With most AppV packages we can suss out the file it’s missing by using procmon and tracing for “FILE NOT FOUND” in the result field.  Unfortunately, my searching for this message did NOT result in finding a file that wasn’t resolved by another path.  In other words, all files were accounted for.  But the error message very clearly states that a file is missing.  So the next step is to install the application locally and compare the launch differences between the local install and the AppV install.  Again, process monitor makes this easy by using the “loaded modules” option.

The differences I found between a local install of this application and the AppV launch looked like so:

The launches were identical, until the highlighted points.  The local install, which works without issue, has an extra file that gets loaded.  bbcor7.dll.

It appears, somehow, this file is getting loaded and registered dynamically on a local install, but this is not happening with the AppV install.  I don’t see the file get searched for at all with the AppV install and tracing with procmon.  However, executing a regsvr32 /s “C:\Program Files (x86)\Blackbaud\The Raisers Edge 7\DLL\bbcor7.dll” during sequencing does do all the necessary work to register and allow RE7.exe to find and load the file in an AppV bubble.

So, long story short, execute:

While sequencing your AppV package and this should fix this issue.

Here is my entire sequencing script:

 

Read More

Citrix Workspace Environment Manager – Second Impressions

2017-03-08
/ /
in Blog
/

Workspace Environment Manager does not do Boolean logic.  It only does AND statements.

Note: These conditions are AND statements, not OR statements. Adding multiple conditions will require all of them to trigger for the filter to be considered triggered.

So we have to reconsider how we will apply some settings.  An example:

How would you apply this setting without WEM?  The setting is targeted to a specific set of servers that we’ve put into a group, AND the user must be a member of at least 1 out of about 10 groups for it to apply.  And this is what we’ve set.  It seems fairly logical and the the way the targeting editor is setup it’s actually fairly easy to follow.

How do we configure this in WEM?

WEM works by applying settings based on your individual user account or a group containing users.  So each group needs to be added to WEM and then configured individually.  Alternatively, this can change your thinking and you can create groups that you want settings to apply towards.  For instance, in this example I only have two statements I need to be true.

The server must belong to a particular group.
The user must belong to at least one of several groups.

So I started by attempting to get to the result we wanted.

I exported the value as we wanted it from the registry and imported it into WEM:

Now to create our conditions. I want this registry action to only apply if you are on a specific server.  We add our servers into logical groups.  I attempted to set the filter condition to target the server with this specific AD group.  It turns out this was not possible with the builtin filter conditions.

The Active Directory filter conditions are only evaluated against the user and not the machine.  However, there is an option for ‘ComputerName Match’:

Our server naming convention is CTX (for Citrix) APP or silo name (eg, EPIC) ### (3 digit numerical designation) and the letter “T” at the end for a test server or no letter for a production server.  So our naming scheme would look like so:

CTXAPP301 – for a prod server
CTXAPP301T – for a test server

The guide for WEM does specify you can use a wild card “*” to allow arbitrary matches.  So I tested it and found it has limitations.

The wild card appears to only be valid if it’s at the beginning or end of the name.  For instance, setting the match string to:

CTXAPP3*

Will match both the test and prod servers.  However, setting CTXAPP3*T and the wildcard is ‘dropped’ from the search (according to the WEM logs).  So the search result is actually CTXAPP3T.  Which is utterly useless.  So a proper naming scheme is very important.  Unfortunately, this will not work for us unless we manually specify all of our computer names.  For example, we would have to do the following for this to work:

CTXAPP301T, CTXAPP302T, CTXAPP303T, etc.

Although this works, this is not workable.  Adding/removing servers would require us to keep this filter updated fairly constantly.  Is there another option?

Apparently, you can execute a WMI Query.  We can use this to see if the computer is a member of a group.  The command to do so is:

WEM supports variable replacement through a list of HASHTAGS

I modified my command as such:

Executing this command does result in checking to see if the computer is a member of a specific group.  The only caveat is the user account that logs on must have permissions within AD to check group memberships.  Which is usually granted.  And in my case, it is so this command works.

Next on the Group Policy Preference is the ‘AND – OR’.  We have the filter condition now that ensures these values will only be applied if the computer is in a certain group.  Now we need the value to apply if the user is a member of a certain group.

An easy solution might be to create a super-group “HungAppTimeout” or some such and add all the groups I want that value to apply to that group.  Another alternative is we can ‘configure’ each user group with the ‘server must belong to group’ filter and that should satisfy the same requirements.  I chose this route for our evaluation of WEM to avoid creating large swaths of groups simply for the purpose of applying a single registry entry.

Instead of doing the ‘OR’, we select the individual groups that this check would be against and actually specify that we apply this settings to that group.

To do that we add each group to the ‘Configured Users’:

And then for each group, under ‘Assignment’ we apply our setting with the filter:

And now each group has this value applied with the appropriate conditions.

Continuing, we have the following policy:

So this filtering is applied to a collection, not the individual settings.  The filtering checks to see if the computer is a member of a specific group of servers, and whether the user is a member of a specific group.

In order to accomplish this same result I have no choice but to create a parent group for the machines.  Instead of an ‘OR’ we create a new group and add both server groups within it.  This should result in the same effective ‘OR’ statement for the machine check, at least.  Then we apply all the settings to the specific groups so only they get the values applied.

In total we have 154 total individual registry entries we apply.

So how does it compare to Group Policy Preferences?

Stay Tuned 🙂

Read More

Citrix Workspace Environment Manager – First Impressions

2017-03-02
/ /
in Blog
/

Citrix Workspace Environment Manager can be used as a replacement for Active Directory (AD) Group Policy Preferences (GPP).  It does not deal with machine policies, however.  Because of this AD Group Policy Objects (GPO) are still required to apply policies to machines.  However, WEM’s goal isn’t to manipulate machine policies but to improve user logon times by replacing the user policy of an AD GPO.  A GPO has two different engines to apply settings.  A Registry Policy engine and the engine that drives “Client Side Extensions” (CSE).  The biggest time consumer of a GPO is processing the logic of a CSE or the action of the CSE.  I’ll look at each engine and what they mean for WEM.

Registry Extension

The first is the ‘Registry’ policy engine.  This engine is confusingly called the “Registry Extension” as opposed to the CSE “Group Policy Registry”.  The “Registry Extension” is engine applies registry settings set in via ‘Administrative Templates’.

These settings are ‘dumb’ in that there is no logic processing required.  When set to Enabled or Disabled whatever key needs to be set with that value is applied immediately.  Processing of this engine occurs very, very fast so migrating these policy settings would have minimal or no improvement to logon times (unless you have a ton of GPO’s apply and network latency becomes your primary blocker).

If you use ControlUp to Analyze GPO Extension Load Times it will display the Registry Extension and the Group Policy Registry CSE:

Client Side Extensions

However, CSE’s allow you to put complex logic and actions within that require processing to determine if a setting should be applied or how a settings should be applied.  One of the most powerful of these is the Registry CSE.  This CSE allows you to apply registry settings with Boolean logic and can be filtered on a huge number of variables.

All of this logic is stored in a XML document that is pulled when the group policy object is processed.  This file is located in “C:\ProgramData\Microsoft\Group Policy\History\GUID OF GPO\SID OF USER\Preferences\Registry”.

Parsing and executing the Boolean logic takes time.  This is where we would be hopeful that WEM can make this faster.  The processing of all this, in our existing environment consumes the majority of our logon time:

Migrating Group Policy Preferences to WEM

Looking at some of our Registry Preferences we’ll look at what is required to migrate it into WEM.

Basic settings “eg, ‘Always applied’”.

“Visual Effects”

These settings have no filters and are applied to all users.  To migrate them to WEM I’ve exported these values and set them into a registry file:

Switching to WEM I select ‘Actions’ then ‘Registry Entries’ and then I imported the registry file.

An interesting side note, it appears the import excluded the REG_BINARY.  However you can create the REG_BINARY via the GUI:

To set the Registry Entries I created a filter condition called “Always True”

And then created a rule “Always True”

We have a user group that encompasses all of our Citrix users upon which I added in ‘Configure Users’.  Then, during the assignment of the registry keys I selected the ‘Always True’ filter:

And now these registry keys have been migrated to WEM.  It would be nice to ‘Group’ these keys like you can do for a collection in Group Policy Preferences.  Without the ability to do so the name of the action becomes really important as it’s the only way you can filter:

Next I’ll look at replacing Group Policy Preferences that contain some boolean logic.

Read More

Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Process Monitor

2017-01-31
/ /
in Blog
/

Non-Persistent Citrix PVS Target Devices have more complicated boot processes then a standard VM.  This is because the Citrix PVS server components play a big role in acting as the boot disk.  They send UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit, possibly a small one but there is no denying network communication should be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance.  Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Process Monitor

For Process Monitor you must extract the boot driver and inject the process monitor executable itself into the image.

To extract the boot driver simple launch process monitor, under the Options menu, select ‘Enable Boot Logging’

Then browse to your C:\Windows\System32\Drivers folder, and with “Show Hidden Files” enabled, copy out Procmon23.sys

It might be a good idea to disable boot logging if you did it on your personal system now 🙂

 

Now we need to inject the follow registry entry into our image:

Here are the steps in action:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Windows Performance Toolkit for boot tracing Citrix PVS Target Device’s click here.

Read More

Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Windows Performance Toolkit

2017-01-31
/ /
in Blog
/

Non-Persistent Citrix PVS Target Devices have more complicated boot processes then a standard VM.  This is because the Citrix PVS server components play a big role in acting as the boot disk.  They send UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit, possibly a small one but there is no denying network communication should be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance.  Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time because the logging does add some overhead (especially procmon in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Windows Performance Toolkit

For the Windows Performance Toolkit it must be installed on the image or you can copy the files from an install to your image in the following path:

To offline inject, simply mount your vDisk image and copy the files there:

 

Then the portion of it that we are interested in is “xbootmgr.exe” (aka boot logging).  In order to enable boot logging we need to inject the following registry key into our PVS Image:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Process Monitor for boot tracing Citrix PVS Target Device’s click here.

Read More

AppV 5.1 Sequencer – Not capturing all registry keys – Update

2016-10-31
/ /
in Blog
/

My previous post touched on an issue we are having with some applications.  The AppV sequencer is not capturing all registry keys.  I have been working with Microsoft on this issue for over 2 years but just recently got some headway with getting this addressed.  And I have good news and bad news.  The good news is the root cause for this issue appears to have been found.

It appears that ETW (Event Tracing for Windows) will capture some of the events out of order and the AppV sequencer will then apply that out of order sequence.  The correct sequence of events should be:
RegDeleteKey
RegCreateKey

But in certain scenarios it’s capturing the events as:

RegCreateKey
RegDeleteKey

By capturing the deletion last in the order, the AppV sequencer is effectively being told to ‘Not’ capture the registry key.

Microsoft has corrected this issue in the ‘Anniversary’ edition of Windows 10 (Build 14393+) and sequencing in this OS will capture all the registry keys correctly.

The bad news is Microsoft is evaluating backporting the fix to older versions of Windows.  Specifically Windows 2008 R2.  Windows 2008R2 is still widely used and AppV best practice is to sequence on the OS you plan on deploying but if the older OS sequences unreliably this complicates the ability to ‘trust’ the product.  This fix still needs to be backported to 2012, 2012R2 and the related client OS’s, so hopefully they get it as well.  The reason I was told 2008 R2 may not get the fix is that it is no longer in Mainstream support, but Windows 7SP1 currently is, which is analogous to 2008 R2.  So hopefully the effort to fix this bug will be backported and we’ll have a solid AppV platform where we can sequence our packages in confidence.

Read More

Windows Update – 0x80070308 ERROR_REQUEST_OUT_OF_SEQUENCE

2016-10-31
/ /
in Blog
/

We are getting this error when trying to install KB3172605, which was re-released in September of this year.  This patch was originally installed in August without issue but the re-release fails to install.  The WindowsUpdate.log reports:

Examining the CBS.LOG for more information has revealed the following:

The error starts here:

One of the nice things about Windows Update is it kicks up Windows Error Reporting immediately when it detects an error.  So if you run procmon.exe you can find when the error occurs and work backwards only a little bit to, usually, find why it crashed.

In this case I ran procmon while I tried installing this update, went to 11:25:10 and did a search for “Windows Error Reporting”

windowsupdatebug1

The actual Windows Error Reporting kicked in at the check to see if it is disabled (DisableWerReporting).  The last key it read before it crashed was:

“HKLM\COMPONENTS\DerivedData\Components\amd64_0e471cf709070f76ea5797942bb36096_31bf3856ad364e35_6.1.7601.23455_none_9b9ebc8fa6659c8e”

If I open that key in Regedit here is what it looks like:

windowsupdatebug2

And what another key in that same list looks like:

windowsupdatebug3

 

It appears the key that is in error is missing some information.  In order to fix this I’m going to another system and try to grab they values needed for identity and S256H.  I don’t know how these keys are generated but I’m hoping they are generated by the file it’s referring to which *should* mean that a same or similar level patched system should have the same files and thus the same generated hash’s.

I went to another system and exported the values.  In order to find the proper key, we find it is referenced by the previous key in the procmon trace:
“amd64_microsoft-windows-b..environment-windows_31bf3856ad364e35_6.1.7601.23455_none_c7bdc8a2bca7…”

The key path was present in both systems because they were at the same patch level but the look of the key is different because of GUIDs are generated:

windowsupdatebug10

Working system – Notice the highlight is different

By tracing back in the working system I exported the key that “c!” is referencing.

Working System has S256H, identity keys

Working System has S256H, identity keys

I exported out the key from the working system and had to edit it so the GUID matched the broken key’s values.  I took the ‘working’ registry:

And the broken key:

I added the S256H and identity values to the ‘broken’ key:

And then added the last “c!” line by following these next steps:

  1. copied the line out:
    “c!c4ebacc5355..93c10e6175a_31bf3856ad364e35_6.1.7601.23455_8459b72d1e2f2700″=hex:
  2. Pasted the value from the registry “key” underneath with the “broken” value:
  3. Starting at the ellipse, counted the number of characters from the “55..” to “93′ underneath and created the value:
  4. Lastly, removed the ‘_none” from the string:
  5. I could then add “c!” in front and added it to my ‘broken’ registry file:

    I then tried to install the patch:

    windowsupdatebug9

And it worked!

Read More
Page 1 of 212