
Printers and their impact on logon duration

2019-03-05

40 second logons.  100 second logons.  200 second logons.  

Our users had an average logon time of 8-11 seconds.  But some users were hitting 40-200 seconds.  Users were frustrated and calling the help desk.

Why?

I did not want to individually examine every user who had a long logon duration.  There were dozens upon dozens of them, but after sampling 5, the root cause was consistent: printers were causing long logons.

What I found was that users were encountering the long logons on specific workstations.  Further investigation revealed these workstations had globally mapped printers.  Some workstations had dozens of printers.  Some of the printers were long gone, having been moved/re-IP’ed, shut down, or re-provisioned without the entry on the workstation being updated.  Citrix, depending on the Receiver version, attempts to validate the printer’s operation before mapping it, and this would wait for a network timeout (or a crash of Receiver).  When I stopped the print spooler on the local workstation, logons came back down to the proper range.
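If you want to repeat that quick spooler test yourself, it is a one-liner on the workstation (remember to start the spooler again afterwards):

```powershell
# Temporarily stop the local print spooler, re-test the logon, then restore it.
Stop-Service -Name Spooler
# ...log on again and compare the logon duration...
Start-Service -Name Spooler
```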

What I sought to do was create something to make identifying this root cause easier, quicker, and more specific: in other words, was there a particular printer causing the delay?

Citrix Printing Methods

Citrix has two methods of printing.  “Direct connection” and local printing.  Direct connection has the potential for larger impact on logon duration as it executes additional steps and actions on the Citrix server.

To enable “Direct Connection”, Citrix has a policy “Direct connection to print servers”.  By default, this policy is enabled.

With this policy enabled, it changes the behavior of how Citrix connects printers you have mapped locally as network printers.  Citrix has a diagram here:

But I feel this is missing temporal information.  I’ve created a video to highlight the steps.

Direct Connection Process

Why is this important?

Citrix has an option that is an absolute requirement for some applications.  
“Wait for printers to be created (server desktop)” (Virtual Apps and Desktops 7.x) or “Start this application without waiting for printers to be created. (Unchecked)” (Citrix XenApp 6.5).

This policy needs to be enabled for numerous applications that require a specific printer to be present before the application starts.  This is usually due to applications that have pre-defined printers, like label printers, and do a check on app launch.  If you have an application like that, you probably have this feature enabled.

Citrix has many options for managing printers and they can impact your logon process.

Focusing on Direct Connection being enabled, if “Wait for printers to be created (server desktop)” policy is also enabled, care MUST be taken to minimize logon times.

Direct connection has a default behavior that can have a very adverse effect on logon time.  When it connects directly to the print server it will check for and install the print driver on the Citrix server.  The installation activity is visible via Process Monitor.

The DrvInst.exe process is the installation of printer drivers

The installation of the printer driver, by default, will only occur if the driver is inbox or prestaged on the Citrix server.  But the check to see if a print driver needs to be installed occurs every time a session is created.  This behavior can be changed by policy.

Why does this impact logon times?  Driver installation can take a while, especially if there are communication issues between the Citrix server and the print server, or if the print server is under stress and simply slow to respond, or if the driver package that needs to be loaded is especially large, has lots of forms or page formats, multiple trays, etc.  This all adds up.

How can I tell what the impact of printers might be on Logon Duration?

I’ve been working on updating a Script-Based Action (SBA) originally posted by Guy Leech on the Citrix blogs.  This enhancement to the SBA will output more information breaking down what’s consuming your logon times.  Specifically, this tackles how long printers take in the ControlUp column: “Logon Duration – Other”

I’ve created a video showing how to enable the additional logging and how to run the new Script Based Action and what the output looks like:

What’s new in the output?

For the individual printers, direct connection printers now show two lines, the Driver load time and Printer load/connection time. 

The driver load time is titled with Driver and then the UNC path to the printer. 

The Printer and UNC path line shows the actual connection, printer configuration, and establishment time.  All Direct Connection printers will be shown with UNC paths.

Printers mapped using the Citrix Universal Print Driver (UPD), where print jobs get sent back to the client for processing, are shown with their “Friendly Name” and do not have a matching “Driver” component, as no driver loading is required.  In the example output above, “\\printsrv\HP Color LaserJet 5550” is a direct connection printer and “Zebra R110Xi HF (300 dpi)” is a Citrix UPD printer.

Here is an example of the full output:

The “Connect to Printers” is a sub-phase under “Pre-Shell (Userinit)“.  Each “Printer:” and “Driver:” is a sub-phase under “Connect to Printers“.  In this example screenshot, “Pre-Shell (Userinit)” was consuming 117.6 seconds of a 133.2 second logon duration.  Prior to this script that was all that was known.  Now we can see that connecting to printers consumes 97.7 of those 117.6 seconds!

Awesome!  What’s the catch?

In order to generate this information, some more verbose logging is required on the Citrix server side.   I specifically worked to ensure that no 3rd party utilities would be required so we simply need to enable 4 additional features.

Command Line Audit
PrintServer/Operational Logs
Audit Process Creation/Termination

I’ve written a batch script to enable these features:
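The batch script itself isn’t reproduced here, but the features it enables map onto standard commands along these lines (a sketch; run elevated on the Citrix server):

```powershell
# Sketch of the logging the script enables; verify against your own audit policies.
# 1. Audit process creation/termination (Security log)
auditpol /set /subcategory:"Process Creation"    /success:enable
auditpol /set /subcategory:"Process Termination" /success:enable

# 2. Include the command line in 4688 process-creation events ("Command Line Audit")
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit" /v ProcessCreationIncludeCmdLine_Enabled /t REG_DWORD /d 1 /f

# 3. Enable the PrintService operational log
wevtutil sl Microsoft-Windows-PrintService/Operational /e:true
```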

Easy!

Wait a minute!  Windows Server 2008R2 doesn’t have Command Line Auditing!

Good catch!  Windows Server 2008R2 doesn’t have command line auditing, so it will miss the driver load information.  There is no way around this using native tools, but the script could be modified to use a 3rd-party tool like Sysmon.  The output on a 2008R2 system would show the following (with all other features enabled).

Awesome!  Let me at this script!

This script is available on the ControlUp site here.  If you have ControlUp you can simply add it via the “Script Based Actions” button and derive further information on your users’ logon performance!

Last Gotcha

In some instances the “Connect to Printers” phase may start before “Pre-Shell”.  This has been observed and is actually happening!  The operating system attempts to start *some* printer events asynchronously in certain situations, so Connect to Printers may start before userinit, but overall connecting to printers may still block the logon while it tries and completes.

Steve Elgan kindly pointed out that it’s important you size your event logs to ensure the data is present for the users you want to monitor.  If your event logs are sized too small then the data being queried by the script won’t be present.  So ensure you have your Security and the Printer event logs sized large enough to capture all of this data!  For an idea of scale, you can look at your log size and the oldest entry there.  If your log is 1MB in size and your oldest entry is from an hour ago, then sizing your log to 24MB should capture about a day’s worth of information.
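A quick way to sanity-check this (log names as per the defaults above) is to compare each log’s maximum size against its oldest surviving entry, and grow any log that rolls over too quickly:

```powershell
# Compare each log's configured maximum size with its oldest surviving event.
$logs = 'Security', 'Microsoft-Windows-PrintService/Operational'
foreach ($log in $logs) {
    $cfg    = Get-WinEvent -ListLog $log
    $oldest = Get-WinEvent -LogName $log -Oldest -MaxEvents 1
    [pscustomobject]@{
        Log         = $log
        MaxSizeMB   = [math]::Round($cfg.MaximumSizeInBytes / 1MB, 1)
        OldestEvent = $oldest.TimeCreated
    }
}

# Grow a log that rolls over too quickly, e.g. the Security log to 256 MB:
# wevtutil sl Security /ms:268435456
```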

Hope this helps you reduce your long logon durations!


Citrix Logon Simulator’s – Part 2

2019-02-06

In my previous post I was looking at utilizing a Logon Simulator to set up some proactive monitoring of a Citrix environment.  I set some goals for myself:

  1. Minimize the number of VM’s to run the robots
  2. As little resource consumption as possible
  3. Still provide operational alerts
  4. Operate on-premise

I want the footprint of these robots to be tiny.  This must be done on Server Core.

I want to run multiple instances of the logon simulator concurrently.

I need to be able to test “Stores” that do not have “Receiver for Web” sites enabled.

I want it so that if I reboot the robot, it picks up and starts running again.

The Choice.

In order to successfully hit these specified targets I opted to use ControlUp’s Logon Simulator.  It can target the Store service, so it works with our “Receiver for Web”-less stores.  It also has features to generate events that can be used to send out notifications of an application launch failure.

The Setup

In order to achieve my goals I need the following:

  • A Service Account that will be logging onto the Citrix servers
  • The robot (Windows 2019 Server Core)

I installed Server Core 2019 and added it to the domain.

Configure Autologon

I configured Group Policy Preferences to set up AutoLogon for my service account.  This group policy object is set on the OU the robots reside in.

Group Policy Preferences settings to configure Autologon for our service account.

However, I did not include the required “DefaultPassword” registry value with the password.  In order to embed the password in a more secure fashion, I had to manually use Sysinternals Autologon.  This keeps the password from being stored in plain text in the registry, but it does need to be manually executed on each robot.
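For reference, the values the GPP sets here are the standard Winlogon autologon values; a sketch follows (account and domain names are placeholders, and DefaultPassword is deliberately left out because Sysinternals Autologon stores it as an LSA secret instead):

```powershell
# Standard Winlogon autologon values. DefaultPassword is intentionally not set here.
$winlogon = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon'
Set-ItemProperty -Path $winlogon -Name AutoAdminLogon    -Value '1'
Set-ItemProperty -Path $winlogon -Name DefaultUserName   -Value 'svc-logonsim'   # placeholder account
Set-ItemProperty -Path $winlogon -Name DefaultDomainName -Value 'CONTOSO'        # placeholder domain
```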

Configuring Autologon

The account MUST be a regular user and not a member of the “Administrators” group.  This is a requirement of the ControlUp Logon Simulator.

Prerequisites gotcha’s

Because I selected to use pure Server Core, there are some components that require fixing for full compatibility.  This can be alleviated immediately by installing the Feature on Demand (FoD) “Server App Compatibility”, but this would increase memory utilization and consume more disk space on our robot.  If you prefer the easy way out, adding the FoD fixes everything and you can skip the “Fixes” section.  Or just run the Logon Simulator on an operating system with the Desktop Experience.  Otherwise, follow the steps in each of the solutions.

Fixes

Unable to install Citrix Receiver/Workspace App

“TrolleyExpress.exe – System Error”
The code execution cannot proceed because oledlg.dll was not found.  Reinstalling the program may fix this problem.

Solution

Copy oledlg.dll from the SysWow64 in either the install.wim or another “Windows Server 2019 (Desktop Experience)” install and put it in the C:\Windows\SysWow64 folder of your robot.

 

ControlUp Logon Simulator is cropped on smaller displays

Solution

Set the resolution larger in your VM.

 

ControlUp Logon Simulator errors when you attempt to save the configuration

“The logon simulator failed for the following reason: Creating an instance of the COM component with CLSID {C0B4E2F3-BA21-4773-8DBA-335EC946EB8B} from the IClassFactory failed due to the following error: 80040111 ClassFactory cannot supply requested class (Exception from HRESULT: 0x80040111 (CLASS_E_CLASSNOTAVAILABLE)).  An application event log has been written to handle this crash.”

Solution

Copy ExplorerFrame.dll from the SysWow64 in either the install.wim or another “Windows Server 2019 (Desktop Experience)” install and put it in the C:\Windows\SysWow64 folder of your robot.  Add the following registry:

ControlUp Logon Simulator detects admin rights

Admin rights detected The logon simulator should not be run as an administrator, please restart the app as a standard user.

Solution

Run the logon simulator as a standard user.


Configuration

Once you’ve implemented all the fixes, install Citrix Workspace App and ControlUp Logon Simulator with an account that is an administrator.

Configure ControlUpLogonSim.  With the simulator open, enter your Storefront details, ensuring to use the “Store” account as seen in the Storefront console.

 

 

For the “Resource to Launch” ensure the name matches the display name in Storefront:

 

In order to avoid session stealing in the simulator, each application will require a unique user name.  Set up a unique account for each application you are going to test.

 

From here, enter your logon credentials for the account associated with the application.  Run your first test by clicking the green triangle and ensure it works correctly.

 

Now that we have a successful run we set “Repeat Test” to ON and save the configuration.

I then created another application to monitor by renaming the “Resource to Launch” as another application and saved a second configuration.  I saved all my files to a C:\Swinst folder.

 

The point of all of this is to ensure the simulator runs in an automated fashion.  To do so, we need to configure the simulator to “launch” multiple different applications when the operating system starts.  We have already configured autologon and set up our configuration files for each application we want to monitor; now we need to set the monitors to auto-start.

Add the following registry key:

And create a file “C:\Swinst\StartAppMonitors.cmd” with the following contents:
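The original post’s registry key and script contents aren’t shown here, but the general shape is a Run value pointing at the script, and a script that launches one simulator instance per saved configuration.  The install path and configuration argument below are placeholders, so check the ControlUp Logon Simulator documentation for the actual syntax:

```powershell
# Placeholder sketch only: auto-start the launcher script at logon via the Run key.
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run" /v StartAppMonitors /t REG_SZ /d "C:\Swinst\StartAppMonitors.cmd" /f

# C:\Swinst\StartAppMonitors.cmd would then contain something along these lines
# (executable path and per-app configuration files are assumptions):
#   start "" "C:\Program Files\ControlUp Logon Simulator\ControlUpLogonSim.exe" "C:\Swinst\App1-config"
#   start "" "C:\Program Files\ControlUp Logon Simulator\ControlUpLogonSim.exe" "C:\Swinst\App2-config"
```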

And watch the magic fly!

And so the final question, and the point of all this work: how much of our resources does this consume?

 

1.1GB of RAM for the ENTIRE system, a peak CPU consumption of 7%, and the processes required to do the monitoring use almost no CPU and only ~55MB of RAM.  Each Citrix process consumes ~20MB of memory and is the most significant consumer of CPU, but only in the single-digit percentage range.

I anticipate doing some more stress testing to determine the maximum number of monitors I can get on one system, but I’m thrilled with these results.  With this one box I would expect to be able to monitor dozens of applications…  Maybe a hundred?

In the end, this was a fair bit of work to get this setup on Server Core, but I do believe the savings in resource consumption and overhead reduction will pay off.


Citrix Logon Simulator’s – Part 1

2019-02-04

“Help!  I can’t launch my application!”
“Is something wrong with Citrix?”
“Hey man, I heard Citrix was down?”
“Can you help? I need to get this work done!  The deadline is today and I can’t open my app!”

Welcome to the world of a Citrix Administrator.  If an application stops working then the calls flood in and you get pinged a million times.  Is there a way to be proactive about when an application goes down so you maximize your time trying to fix the issue between the failure and first call?

The answer to this is a Citrix Logon Simulator.

There are a few different logon simulators out there including two powershell scripts I wrote for performance testing.  One for testing the “Web” service and one for the “Store” service.  The difference between the two services is found in the name, with “Web” typically appended to the web service (eg, “/Citrix/StoreWeb”) and the “Store” service ending thusly (eg, “/Citrix/Store”).

The “Web” service is the user-facing front end for Storefront.  When you open a browser and go to your Storefront URL you are using the web service.

“Web” service. User logs into Storefront and launches an app using a web browser

Each “Web” service has a corresponding “Store” service.  However, the “Store” service does not require a “Web” service, and you can create Store services without a Web service.  Store services are used when you configure Citrix Receiver/Workspace App to connect to Storefront.  When you launch apps via Citrix Receiver/Workspace App you are using the Store service.  In addition, when thin clients are configured to use Storefront they typically use the Store service.

Using the “Store” service. Note the program is “Citrix Workspace” and not a web browser.

I haven’t found a logon simulator that tests each service, the products out there only test one or the other.

In Summary

Web Service

Testing the web service simulates a user’s experience authenticating and launching an application via a web browser.  These simulators launch a web browser, browse to your URL, log in, find your application, and then click it to launch it.  This makes for an impressive demo of automation, watching the web browser perform actions without a human.

Store Service

Testing the store service simulates a user authenticating and launching an application via a Citrix client.  This is done through API or REST calls; it’s up to the client to generate a GUI from the information returned (if desired).
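As a trivial illustration of the two URL forms (the URLs are placeholders, and this only checks that the endpoints answer over HTTP; it does not authenticate or launch anything):

```powershell
# Quick reachability probe of the Web site and the Store service base URLs.
$urls = 'https://storefront.example.com/Citrix/StoreWeb/',
        'https://storefront.example.com/Citrix/Store/'
foreach ($url in $urls) {
    try {
        $r = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 10
        "{0} -> HTTP {1}" -f $url, $r.StatusCode
    } catch {
        "{0} -> {1}" -f $url, $_.Exception.Message
    }
}
```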

Drawbacks

Web Service Simulators

Executing the logon simulators that use the web service is impressive: you watch the web browser manipulate itself.  However, the requirement of using a web browser is limiting in that executing multiple concurrent applications is not feasible.  Options seemed to range from providing a list of applications to be tested (which are tested sequentially) to expanding the number of VM’s that run the logon simulator.  In addition, it’s possible to create “Store” services that host applications without a corresponding “Web” service.  Simulators that only test the Web service will be unable to do any type of testing for these Stores.

Single VM

If you opt for a single VM to test as many applications as possible, the time between applications increases linearly.  For 10 applications to be tested to validate they are operational, with a 2 minute interval you will only be testing that application every 20 minutes (at best).  If you have 100 applications you’ll be waiting over 3 hours between tests.

Multiple VM’s

Multiple VM’s operate very well with the web service.  But they introduce another wrinkle.  How feasible is it to expand your environment by 100 VM’s to test whether 100 applications are operational?  With an operating system overhead of ~1GB at best, you’ll be consuming at least 100 vCPU and 100GB more of memory.  This is pretty much dedicating a host just for robots.  This may be worth it when compared to the cost of downtime when an application is down…  But this is now becoming a very, very expensive solution.  Hardware costs for hosts, hypervisor licenses, and Windows licenses compound the pain of a multi-VM solution.

Store Service Simulators

In terms of a demo, the store service is less visually impressive.  The automation is done programmatically so you don’t get to see those gratifying “clicks”.  However, there is no browser requirement.  This means you don’t need a full GUI operating system to run a store service logon simulator.  In addition, because the Store service simulator operates via API calls, it’s completely possible to run multiple instances in parallel.  This means there is a very large opportunity to save VM costs by consolidating all the testing onto one, or just a couple of, VM’s while keeping a tight interval for each test.  However, the drawback of the Store service simulator is that additional configuration is required for testing through a NetScaler.  Essentially, if you have not set up the DNS SRV record to allow Store service communication, then a Store service simulator will not work externally.

What’s the plan?

After exploring these considerations, I set out to design something with some goals.

  1. Minimize the number of VM’s to run the robots
  2. As little resource consumption as possible
  3. Still provide operational alerts
  4. Operate on-premise

My next post will explore how to achieve this and the solution I settled on.


Citrix PVS Image Management Design Challenges – Case Study

2018-10-29

Challenge:

Different applications have differing levels of operating system support.  A vendor might only support their app on a specific operating system, or on multiple /older/ operating systems, or the organization hasn’t updated to a new version that supports the new OS; whether because the new version has an upgrade cost the org cannot afford at this time, the people that can assist the upgrade are tied up in other projects, or the org has lowered the priority of the app because of a low user count and no impact to the app’s functionality outside of newer OS support.  Whatever the situation, inevitably, someone will come to the Citrix team and ask that these applications be made available.  And, in all honesty, Citrix has made this entirely possible with the 7.1x product.  The LTSR of XenApp/XenDesktop supports 2008R2 for the next 4 years, and 2012R2 and 2016 will continue to support the latest release until the next LTSR, where 2012R2 will probably receive another 5 year reprieve from the Citrix side.  You probably won’t have Microsoft support, but that doesn’t stop organizations from continuing to run Server 2003 for apps that can’t move beyond that platform.

What does this mean:

Application support requirements now necessitate multiple OS’s being available through Citrix.  The goal is to have the app on the highest supported OS.  For instance, if an app is supported on 2008R2 and 2012R2, have the app run on 2012R2.  By December 2018, this means supplying the organization the opportunity and flexibility to have their app on a choice of “5” different operating systems:
2008R2
2012R2
2016
2019
Windows 10 ERS

All on the XenApp 7.1x platform.

By providing these choices, we should be ensuring maximum application compatibility and support from the vendor, and future proofing.

Current State:

The original design of our environment was a single image, and using layering technologies to add software without causing compatibility issues.  The technology of choice was AppV (version 4.6 as of the original design).

It was simple, it would be effective, and it would be easy to manage.

Using machine group memberships, AppV layers could be customized per machine.  For instance, we could have pools of servers with different personalities.  We had a “Generic” pool upon which 200 or so AppV applications would reside, and if a new application request came down, instead of silo’ing a XenApp VM for 3-10 users, we could put them on the generic pool and grow the generic pool if required.  This is very cost effective, reduces overhead on the hosts by reducing the number of VM’s, reduces the number of uniquely sized VM’s, and overall made management easier.  Utilizing AppV, it became trivial to have multiple applications, even the exact same versions of the applications with different configurations, available for users to run from the same VM.  However, we have large user-based applications with specific performance goals and monitored metrics that are easier to manage when they run in their own silo.  And with AppV we could still use that same image but apply only the specific AppV packages and have the machines fully unique for those users.  And from there it again became trivial to manage the single image.

Utilizing this model has reduced and kept our image count low while maintaining a high degree of flexibility.  We have servers targeted for specific and unique purposes or servers hosting applications that are more general in nature.  By reducing the amount of different VM configurations running on a variety of different hosts, we are able to fairly accurately predict our performance and thus plan accordingly for sizing.

Are we really on one image?

No.  We have 16 of them.

Although one image was the original design goal, required OS features crippled our ability to maintain a low image count.  The worst offenders were applications with specific Internet Explorer version requirements.  Being unable to run multiple versions of IE on a single image meant we had to split our single image.  We ended up splitting it into 3: images with IE9, IE10, and IE11.  To top it off, when we originally went through the design with AppV, AppV 5.0 had just come out and had crippling limitations.  AppV 4.6 was still the preferred platform while AppV 5 matured.  Eventually, we had a couple of applications that would not work on 4.6 due to hitting the package size limit (2GB IIRC) or due to crashing when running within the bubble.  AppV 5.0 at the time was just plain broken and a huge number of our apps outright failed.  Due to these limitations, our decision was to create new images, branching the 3 images even further to a total of 8.

So how did you get to 16?

We have a completely separate test environment.  Once an image was created, it was duplicated for test.  The design was to ensure any application upgrades, component upgrades, Windows Updates, policy changes, troubleshooting, etc would be done in the test environment first before being implemented in prod.  This has worked fairly well, however we encountered something that caused our test environment to not be reflective of production: TIME.

One of the challenges we have faced was the testing of applications or components or tweaks lasting months.  Testing a fix/feature/upgrade usually followed this path:

The challenges in this process occur if the development of fixes/upgrades/enhancements spans multiple image “baking” sessions over a period of weeks or months.  It seems no matter how good you think your documentation is, if the development spans 10’s to 100’s of tweaks over sufficient time, something important will be lost in the chain.  The process of moving the change into production followed this process:

Couldn’t you just copy test into production?

In the original design, test and production were wholly separate and lived apart.  The process of documenting and pushing changes from test into prod actually worked quite well until we started encountering the lengthy development/test change timescales, and the numerous /different/ changes that occur across the different images.  Copying from test into production was done a few times, with painful results each time.  The original image and design were not done to accommodate such a scenario, so the items that needed to be updated from test to prod were numerous and onerous.  Scripts, registry, files, group policies…  during some of these copies, outages were incurred because, again, something was inevitably missed.  At this point, trying to flush out all the different changes that would have to be made across 8 images and then maintaining the list was fairly daunting for our team.  Having outages is a really, really bad thing.

What’s wrong with 16 images?

Our images have branched from each other in various forms going back to that first image created around 2011.  These images all have lineages 7 years old, and with age come problems.  We have been finding that these images have had some corruption in some form or another.  Whether it’s a registry key that’s lost its ACL or is completely corrupted, Windows Updates that fail to install, a file whose content is now garbage as opposed to clean text, or the complete inability to uninstall or upgrade some software packages.  Thus far, none of these problems have been unsolvable, but the fact they are cropping up with more regularity across all of our images is very worrisome.  The time taken to resolve these issues as they occur is getting excessive, and they are becoming more common.  When we looked at just updating these images for XenDesktop 7 we found we were unable to do so because XenApp 6.5 would not uninstall from some of them!

Last word on “Current State”

Being a large environment, we could have several application/components simultaneously under this test/development process; compounding the number of changes.

Because test and production are separate, with the number of changes made during the development stage targeting test servers/components/configs, something inevitably gets lost/forgotten/missed.  Copying test images to production is untenable in this process.  There are so many hard-coded pieces embedded within the test images or configurations that need to be updated that when we *have* had to do a test-to-prod copy, things break, and a lot of effort goes into tracking down these additional hard-coded changes.

All of these issues exist within just a single OS, with a single version of Citrix.  The future will be multi-OS with multiple versions of the Citrix VDA.  Our current design is unfeasible if the various OS’s start encountering similar, unique image requirements.  And it may be naive to think we can avoid it.  But I believe we can put forth a design to make this more manageable, more predictable and more stable.


Meltdown + Spectre – Performance Impact Analysis (Take 2)

2018-09-14

I want to start this off by thanking ControlUp, LoginVSI, and Ryan Ververs-Bijkerk for their assistance in helping me with this post.

Building on my last evaluation of the performance impact of Meltdown and Spectre, I was graciously given a trial of the LoginVSI software product used to simulate user loads and ControlUp’s cloud based analytic tool, ControlUp Insights.

This analysis takes into consideration the differences between operating systems and uses the latest Dell server hardware platform with an Intel Gold 6150 processor at its heart.  This processor contains the full PCID instructions to give the best performance (as of today) with mitigations enabled.  However, only Server 2012R2 and Server 2016 can take advantage of these hardware features.

Test Setup

This test was setup on 4 hosts.  2 of the hosts had VM’s where all the mitigations were enabled and 2 hosts had all of the mitigation features disabled.  I tested live production workloads and simulated user loads from LoginVSI.  The live production workloads were run on XenApp 6.5 on 2008R2 and the simulated workloads were on XenApp 7.15CU2 with 2008R2, 2012R2 and 2016.

Odd host numbers had the mitigations disabled, even host numbers had the mitigations enabled.

I sorted my testing logically in ControlUp by folder.

Real World Production results

The ControlUp Insights cloud product produced graphs and results that were easy and quick to interpret.  These results are for XenApp 6.5, Server 2008R2.

Hosts View

CPU Utilization

The mitigation-disabled “Host 1” had a higher consistency of using less CPU than the mitigation-enabled “Host 2”. The biggest spread in CPU was ~20% on the Intel Gold 6150’s with mitigation enabled to disabled.

IO Utilization

Another interesting result was IO utilization increased by an average of 100 IOPS for mitigation-enabled VM’s.  This means that Meltdown/Spectre also tax the storage subsystem more.  This averaged out to a consistent 12% hit in performance.

User Experience

The logon duration of the VM’s increased from an average of 8 seconds on a mitigation-disabled VM to 14 seconds on a mitigation-enabled VM, a roughly 75% increase.  The biggest jumps in the sub-metrics were Logon Duration (Other), going from 3s to 5s, and Group Policy time, going from 6s to 8s.

Mitigation Disabled

Mitigation Enabled

For applications we have that measure “user interactivity” the reduction in the user experience was 18%.  This meant that an action on a mitigation-enabled VM took an average of 1180ms vs 990ms on a mitigation-disabled VM when measuring actions within the UI.

Honestly, I wish I had had ControlUp Insights earlier when I did my original piece; it provides much more detail in terms of tracking additional metrics and presents it much more cleanly than I did.  Also, once the information was available, it was super quick to look at and compare the various types of results.

Simulated Results

LoginVSI was gracious enough to grant me licenses to their software for this testing.  Their software provides simulated user actions including pauses for coffee and chatting between work like typing/sending emails, reading word or PDF documents, or reviewing presentations.  The suite of software tested by the users tends to be major applications produced by major corporations who have experience producing software.  It is not representative of applications that could be impacted the most by Spectre/Meltdown (generally, applications that are registry event heavy).  Regardless, it is interesting to test with these simulated users as the workload produced by them do fall under the spectrum of “real world”.  As with everything, your mileage will vary and it is important to test and record your before and after impacts.  ControlUp with Insights does an incredible job of this and you can easily compare different periods in time to measure the impacts, or just rely on the machine learning suggestions of their virtual experts to properly size your environment.

Since our production workloads are Windows Server 2008R2 based, I took advantage of the LoginVSI license to test all three available server operating systems: 2008R2, 2012R2, and 2016.  Since newer operating systems are supposed to enable performance-enhancing hardware features that can reduce the impact of these vulnerabilities, I was curious as to *how much*.  I now have this information.

I tested user loads of 500, 300, and 100 users across 2 hosts.  I tested one host with the Spectre/Meltdown mitigations applied and one without.  Each host ran 15 VM’s, with each VM having 30GB RAM and 6 vCPU, for a CPU oversubscription of 2.5:1.  The host spec was a Dell PowerEdge M640 with Intel 6150 Gold processors and 512GB of memory.
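(Working the numbers, assuming a dual-socket M640: 15 VM’s × 6 vCPU = 90 vCPU against 2 × 18 = 36 physical cores, which gives the 2.5:1 ratio.)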

2016 – Hosts View

500 LoginVSI users with this workload, on Server 2016, pegged the hosts’ CPU at 100% on both the Meltdown/Spectre enabled and disabled hosts.  We can still see the gap in CPU utilization between the two hosts.

300 LoginVSI users with this workload, on Server 2016 we see the gap is narrow but still visible.

100 LoginVSI users with this workload, on Server 2016 we see the gap is barely visible, it looks even.

2012R2 – Hosts View

500 LoginVSI users in this workload on Server 2012 R2.  There definitely appears to be a much larger gap between the meltdown enabled and disabled hosts.  And Server 2012R2 non-mitigated doesn’t cap out like Server 2016.

300 LoginVSI users in this workload on Server 2012R2.  The separation between enabled and disabled is still very prominent.

100 LoginVSI users in this workload on Server 2012R2.  Again, the separation is noticeable but appears narrower with lighter loads.

2008R2 – Hosts View

500 LoginVSI users in this workload on Server 2008R2.  There is noticeable additional CPU load on the Meltdown/Spectre host.  More interestingly, it appears overall CPU utilization is lower than on 2012R2 or 2016.

300 LoginVSI users in this workload on Server 2008R2.  The separation between enabled and disabled is still very prominent.

100 LoginVSI users in this workload on Server 2008R2.  I only captured one run and the low utilization makes the difference barely noticeable.

Some interesting results for sure.  I took the data and put it into a pivot table to highlight the CPU differences for each workload against each operating system.

This chart highlights the difference in CPU percentage between mitigation-enabled and disabled systems.  The raw data:

Again, interesting results.  2008R2 seems to have the largest average CPU separation, hitting 14%, followed by 2012R2 at 11%, and then 2016 with a difference of 4%.

One of the things about these results is that they highlight the “headroom” of the operating systems.  2008R2 actually consumes less CPU and so it has more room for separation between the 3 tiers.  On 2016, there is so much time where the CPU was pegged at 100% on both types of host that the difference reads as “0%”.  So although the smaller number on Server 2016 may lead you to believe it’s better, it’s actually not.

This shows it a little more clearly.  With mitigations *enabled*, Server 2008R2 can do 500 users at a lower average CPU load than Server 2016 can do 300 users with mitigations *disabled*.

From the get-go, Server 2016 appears to consume 2x more CPU than Server 2008R2 in all non-capped scenarios with Server 2012 somewhere in between.

When we compare the operating systems against the different user counts we see the impact the operating system choice has on resources. 

Mitigation Disabled:

100 Users
300 Users
500 Users

Mitigation Enabled:

100 Users
300 Users
500 Users

Final Word

Microsoft stated that they expected to see less of an impact from the Spectre/Meltdown mitigations with newer operating systems.  Indeed, this does turn out to be the case.  However, the additional resource cost of the newer operating systems is actually *more* than running 2008R2 or 2012R2 with mitigations enabled.  So if your environment is sized for running Server 2016, you probably have infrastructure that has already been spec’ed for the much heavier OS anyway.  If your infrastructure has been spec’ed for the older OS then you will see a larger impact.  However, if you’ve spec’ed for the larger OS (say for a migration activity) but are running your older OS’s on that hardware, you will see an impact, but it will be less than when you go live with 2016.

Previously I had stated that there are two different, important performance mechanisms to consider; capacity and how fast work actually gets done.  All of these simulated measurements are about capacity.  I hope to see how speed is impacted between the OS’s, but that may have to wait for a future posting. 

Tabulating the LoginVSI simulated results without ControlUp Insights took me weeks to properly format.  I was able to use a trial of ControlUp Insights to look at the real-world impact of our existing applications and workloads.  Had my organization purchased Insights, I would have had this post up a long time ago with even more data, looking at things like the storage subsystems.  Hopefully we acquire this product in the future, and if you want to save yourself and your organization time, energy, and effort getting precise, accurate data that can be compared against scenarios you create: get ControlUp Insights.


I’m going to harp on this, but YOUR WORKLOAD MATTERS MORE than these simulated results.  During this exercise, I was able to determine with ControlUp Insights that one of our applications is so light that we can host 1,000 users from a single host where that host struggled with 200 LoginVSI users.  So WORKLOAD MATTERS.  Just something to keep in mind when reviewing these results.  LoginVSI produces results that can serve as a proxy for what you can expect if you can properly relate these results to your existing workload.  LoginVSI also offers the capability to produce custom workloads tailored to your specific environment or applications so you can gauge impact with much more precision. 


User Profile Manager – Unavoidable Delays

2018-06-30

I’ve been exploring optimizing logon times and noticed “User Profile Service” always showed up for 1-3 seconds.  I asked why and began my investigation.

The first thing I needed to do was separate the “User Profile Service” into its own process.  It’s originally configured to share the same process as other services, which makes procmon’ing difficult.

Making this change is easy:
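The change is most likely just the standard service reconfiguration (ProfSvc is the service name of the User Profile Service):

```powershell
# Move the User Profile Service (ProfSvc) into its own svchost process.
# Use sc.exe explicitly; "sc" alone is a PowerShell alias for Set-Content.
sc.exe config ProfSvc type= own
# A service restart (or reboot) is needed before the change takes effect.
```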

Now that the User Profile Service is running in its own process, we can use Process Monitor to target that PID.

I logged onto the RDS server with a different account and started my procmon trace.  I then logged into the server:

One of the beautiful things about a video like this is we can start to go through frame-by-frame if needed to observe the exact events that are occurring.  Process Monitor also gives us a good overview of what’s happening with the “Process Activity” view:

9,445 file events, 299,668 registry events.  Registry, by far, has the most events occurring on it.  And so we investigate:

  1. On new logins the registry hive is copied from Default User to your profile directory, the hive is mounted, and then security permissions are set.

    Setting the initial permissions of the user hive began at 2:14:46.3208182 and finished at 2:14:46.4414112.  Spanning a total time of 121 milliseconds.  Pretty quick but to minimize logon duration it’s worth examining each key in the Default User hive and ensuring you do not have any unnecessary keys.  Each of these keys will have their permissions evaluated and modified.
  2. The Profile Notification system now kicks off.

    The user profile server now goes through each “ProfileNotification” and, if it’s applicable, executes whatever action the module is responsible for.  In my screenshot we can see the User Profile Service alerts the “WSE”.  Each key actually contains the friendly name, giving you a hint about its role:

    It also appears we can measure the duration of each module by the “RegOpenKey” and “RegCloseKey” events tied to that module.

    In my procmon log, the WSE took 512ms, the next module “WinBio” took 1ms, etc.  The big time munchers for my system were:
    WSE: 512ms
    SyncCenter: 260ms
    SHACCT: 14ms
    SettingProfileHandler: 4ms
    GPSvc: 59ms
    GamesUX: 60ms
    DefaultAssociationsProfileHandler: 4450ms (!)
  3. In the previous screenshot we can see the ProfileNotification has two events that it runs its list of modules through: Create and Load.  Load takes 153ms in total, so Create is what is triggering our delay.
  4. DefaultAssociationsProfileHandler consumes the majority of the User Profile Service time.  What the heck is it doing?  It appears the Default Association Profile Handler is responsible for creating the associations between several different components and your ability to customize them.  It associates (that I can see):
    ApplicationToasts (eg, popup notifications)
    RegisteredApplications
    File Extensions
    DefaultPrograms
    UrlAssociations
    The GPO “Set Default Associations via XML file” is processed and the above is re-run with the XML file values.
  5. Do we need these associations?

    Honestly…   Maybe.

    However, does this need to be *blocking* the login process?  Probably not.  This could be an option run asynchronously, with you, as the admin, gambling that any required associations will be set before the user gets the desktop/app…  Or, if you have applications that are entirely single purpose and simply read and write to a database somewhere, then this is superfluous.

  6. Can we disable it?

    Yes…

    But I’m on the fence as to whether this is a good idea.  To disable it, I’ve found that deleting the “DefaultAssociationsProfileHandler” key does work; associations are skipped and we log on 1-4 seconds faster.  However, launching a file directly, or a shortcut with a URL handler, will prompt you to choose your default program (as should be expected).

I’m exploring this idea: deleting the key entirely and using SetUserFTA to set associations.
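If you do experiment with this, a minimal sketch (the key path is the one visible in the procmon screenshots on my system; verify it on your build, and export it first so you can restore it):

```powershell
# Back up, then remove, the DefaultAssociationsProfileHandler notification module.
# Default-association handling at logon is skipped once this key is gone; test carefully.
$key = 'HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileNotification\DefaultAssociationsProfileHandler'
reg export $key C:\Temp\DefaultAssociationsProfileHandler.reg /y    # adjust the backup path
reg delete $key /f
```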

We have ~400 App-V applications that write/overwrite approximately 800 different registered applications and file extensions into our registry hive (we publish globally — this puts them there).  This is, in fact, why I started this investigation: some of our servers with lots of App-V applications were reporting longer UserProfileService times, and tying it all together, this one module in the User Profile Service appears to be the culprit.  And with Spectre increasing the duration of registry operations by 400%, this became noticeable really quickly in our testing.

Lastly, time is still being consumed on RDS and server platforms by the infuriating garbage services like GamesUX (“Games Explorer”).  It just tweaks a nerve a little bit when I see time being consumed on wasteful processes.


Citrix Provisioning Service – Network Service Starting/Stopping services remotely

2018-05-02

Citrix Provisioning Services has a feature within the “Provisioning Services Console” that allows you to stop/restart/start the streaming service on another server:

 

This feature worked with Server 2008R2 but with 2012R2 and greater it stopped working.  Citrix partially identified the issue here:

 

I was exploring starting and stopping the streaming service on other PVS servers from the Console and I found this information was incorrect.  Adding the NetworkService does NOT enable the streaming service to be stop/started/restarted from other machines.  The reason is the NETWORKSERVICE is a LOCAL account on the machine itself.  When it attempts to reach out and communicate with another system it is translated into a proper SID, which matches the machine account.  Since that SID communicating across the wire does not have access to the service you get a failure.

In order to fix this properly we can either add the machine account of each PVS server to the permissions on each service, OR add all machine accounts into a security group and grant that group permission to manipulate the service on each PVS server.

I created a PowerShell script to make it easy to add a group, user, or machine account to the Streaming Service permissions.  It will also list all the permissions:

An example adding a Group to the permissions to the service:

And now we can start the service remotely:
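(The server name below is a placeholder, and “StreamService” is assumed to be the service name of the Citrix PVS Stream Service; verify it with Get-Service.)

```powershell
# Start the stream service on another PVS server once the group permission is in place.
sc.exe \\PVS02 start StreamService
```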

 

In order to get this working entirely I recommend the following steps:

  1. Create a Group (eg, “CTX.Servers.ProvisioningServiceServer”)
  2. Add all the PVS Machine Accounts into that group
  3. Reboot your PVS server to gain that group membership token
  4. Run the powershell script on each machine to add the group permission to the streaming service:
  5. Done!

And now the script:
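The original script isn’t reproduced here, but a minimal sketch of the same idea, using sc.exe and SDDL, looks roughly like this (service name, group, and the exact rights granted are assumptions to verify in your environment):

```powershell
# Grant a group query/start/stop (and related) rights on the PVS stream service
# by editing its security descriptor.
$svc   = 'StreamService'                                     # assumed PVS Stream Service name
$group = 'DOMAIN\CTX.Servers.ProvisioningServiceServer'      # the group from step 1

# Resolve the group to a SID and build an access-allowed ACE
$sid = (New-Object System.Security.Principal.NTAccount($group)).Translate([System.Security.Principal.SecurityIdentifier]).Value
$ace = "(A;;CCLCSWRPWPDTLOCRRC;;;$sid)"

# Read the current SDDL, splice the ACE into the DACL, and write it back
$sddl = (sc.exe sdshow $svc | Where-Object { $_ }) -join ''
$new  = if ($sddl -match 'S:') { $sddl -replace 'S:', ($ace + 'S:') } else { $sddl + $ace }
sc.exe sdset $svc $new
```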

 


Meltdown + Spectre – Performance Analysis

2018-04-30

Meltdown and Spectre (variant 2) are two vulnerabilities that came out at the same time, however they are vastly different.  Patches for both were released extremely quickly for Microsoft OS’s, but because of a variety of issues with Spectre, only Meltdown was truly available to be mitigated.  The Spectre (variant 2) mitigation had a problematic release, causing so many issues for whoever installed the fix that it had to be recalled and the release delayed by weeks.  However, around March/April 2018, the release of the Spectre patch was finalized and the microcode released.

Threat on Performance

Spectre (variant 2), of the two, threatened to degrade performance fairly drastically.  Initial benchmarks mentioned that storage was hit particularly hard.  Microsoft commented that server OS’s could be hit particularly hard.  Even worse, older operating systems would not support the CPU features (PCID) that could reduce the performance impact.  Older OS’s suffer more due to designs (at the time) that involved running more code in kernel mode (fonts were singled out as an example of one of these design decisions) than newer OS’s.

As with most things on my blog I am particularly interested in the impact against Citrix/Remote Desktop Services type of workloads.  I wanted to test ON/OFF workloads of the mitigation impacts.

Setup

My setup consists of two ESXi (version 6.0) hosts with identical VM’s on each, hosting identical applications.  I was able to set up 4 of these pairs of hosts.  Each pair of hosts has identical processors.  The one big notable change is that one host of each pair has the Spectre and Meltdown patches applied to the ESXi hypervisor.

The operating system of all the VM’s is Windows Server 2008 R2.  Applications are published from Citrix XenApp 6.5.

 

 

This is simply a snapshot of a single point in time to show the metrics of these systems.

Performance Considerations

Performance within a Citrix XenApp farm can be described in two ways.  Capacity and speed.

Speed

Generally, one would test for a “best case” test of the speed aspect of your applications performance.

A simplified view of this is “how fast can the app do X task?”

This task can be anything.  I’ve seen it measured by an automated script flipping through tabs of an application, where each tab pulled data from a database – rendered it – then moved on to the next tab.  The total time to execute these tasks amounted to a number that they used to baseline the performance of this application.

I’ve seen it measured as simply opening an excel document with macros and lots of formulas that pull data and perform calculations and measuring that duration.

The point of each exercise is to generate a baseline that both the app team and the Citrix team can agree to.  I’ve almost never had the baseline equal “real world” workloads, typically the test is an exaggeration of the actual workflow of users (eg, the test exaggerates CPU utilization).  Sometimes this is communicated and understood, other times not, but hopefully it gives you a starting point.

In general, and for Citrix workloads specifically, running the baseline test on a desktop usually produces a reasonable number of, “well… if we don’t put it on Citrix this is what performance will be like so this is our minimum expectation.”  Sometimes this is helpful.

Once you’ve established the speed baseline you can now look at capacity.

Capacity

After establishing some measurable level of performance within the application(s) you should now be able to test capacity.

If possible, start loading up users or test users running the benchmark.  Eventually, you’ll hit a point where the server fails — either because it ran out of resources, performance degraded so much it errors, etc.  If you do it right, you should be able to start to find the curve that intersects descending performance with your “capacity”.

At this point, cost may come into consideration.

Can you afford ANY performance degradation?

If not, then the curve is fairly easy.  At user X we start to see performance degrade, so X-1 is our capacity.

If yes, at what point does performance degrade so much that adding users stops making sense?  Using the “without Citrix this is how it performs on the desktop” can be helpful to establish a minimum level of performance that the Citrix solution cannot cross.

Lastly, if you have network-bound applications, and you have an appropriately designed Citrix solution where the app servers sit immediately beside the network resources on super-high bandwidth, ultra-low latency links, you may never experience performance degradation (lucky you!).  However, you may hit resource constraints in these scenarios.  E.g., although performance of the application depends on the network, the application itself uses 1GB of RAM per instance — you’ll be limited pretty quickly by the amount of RAM you can have in your VM’s.  These cases are generally preferred because the easy answer to increase capacity is *more hardware*, but sometimes you can squeeze in some more users with software like AppSense or WEM.

Spectre on Performance

So what is the impact Spectre has on performance — speed and/or capacity?

If Spectre simply makes a task take longer, but you can fit the same number of tasks on a given VM/Host/etc. then the impact is only on speed. Example: a task that took 5 seconds at 5% CPU utilization now takes 10 seconds at 5% CPU utilization.  Ideally, the capacity should be identical even though the task now takes twice as long.

If Spectre makes things use *more* resources, but the speed is the same, then the impact is only on capacity.  Example: a task that took 5 seconds at 5% CPU utilization now takes 10% CPU utilization.  In this scenario, the performance should be identical but your capacity is now halved.

The worst case scenario is if the impact is on both, speed and capacity.  In this case, neither are recoverable except you might be able to make up some speed with newer/faster hardware.

I’ve tested to see the impacts of Spectre in my world.  This world consists of Windows 2008 R2 with XenApp 6.5 on hardware that is 6 years old.  I was also able to procure some newer hardware to measure the impact there as well.

Test Setup

Testing was accomplished by taking 2 identically configured ESXi hosts, applying the VMWare ESXi patch with the microcode for Spectre mitigation to one of the hosts, and enabling it in the operating system.  I added identical Citrix VM’s to both hosts and enabled user logins to start generating load.

 

Performance needs to be measured at two levels: at the Windows/VM level, and at the hypervisor/host level.  This is because the hypervisor may pick up additional work required for the mitigation that the operating system does not see, and also because Windows 2008 R2 does not accurately measure CPU performance.

Windows/VM Level – Speed

I used ControlUp to measure and capture performance information.  ControlUp is able to capture various metrics including average logon duration.  This single metric includes various system interactions: using the network by querying Active Directory, pulling files from network shares, disk I/O to cache group policies, CPU time to process which policies are applicable, and executables being launched in a sequence.  I believe that measuring logons is a good proxy for understanding the performance impact.  So let’s see some numbers:

 

The top 3 results are Spectre enabled machines, the bottom 3 are without the patch.  The results are not good.  We are seeing a 200% speed impact in this metric.

With ControlUp we can drill down further into the impact:

Without Spectre Patch

 

With Spectre Patch

 

The component that took the largest hit is Group Policy.  Again, ControlUp can drill down into this component.

Without Spectre

 

With Spectre

All group policy preference components take a 200% hit.  The Group Policy Preferences functions operate by pulling down an XML file from the SYSVOL store, reading the XML file, then applying whatever resultant set of policies it finds applicable.  In order to trace down further and find more differences, I logged into each type of machine, one with Spectre and one without, and started a Process Monitor trace.  Group Policy is applied via the Group Policy service, which is a separate instance of svchost.exe.  The process can be found via Task Manager:

Setting ProcMon to filter only on that PID we can begin to evaluate the performance.  I relogged in with procmon capturing the logon.

Spectre Patched system on left, no patch on right

Using ProcessMonitor, we can look at the various “Summaries” to see which particular component may be most affected:


We see that 8.45 seconds is spent on the registry, 0.40 seconds on file actions, 1.04 seconds on the ProcessGroupPolicyExRegistry instruction.

The big ticket item is the time spent with the registry.

So how does it compare to a non-spectre system?

 

We see that 1.97 seconds is spent on the registry, 0.33 seconds on file actions, 0.24 seconds on the ProcessGroupPolicyExRegistry instruction.

Here’s a table showing the results:

So it definitely appears we need to look at the registry actions.  One of the cool things about Procmon is you can set a filter on your trace and open up the summaries and it will show you only the objects in the filter.  I set a filter for RegSetValue to see what the impact is for setting values in the registry:

RegSetValue – without spectre applied

 

RegSetValue – with spectre applied

1,079 RegSetValue events and a 4x performance degradation.  Just to test if it is specific to write events I changed the procmon filter to filter on “category” “Read”

 

Registry Reads – Spectre applied

 

Registry Reads – Spectre not applied

We see roughly the same ratio of performance degradation, perhaps a little more so.  As a further test I created a PowerShell script that will just measure creating 1000 registry values and test it on each system:
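The script isn’t shown above, but the measurement it performs is straightforward to reproduce; a sketch along the same lines (the key name is a throwaway placeholder):

```powershell
# Time how long it takes to create 1000 registry values under a throwaway HKCU key.
$path = 'HKCU:\Software\SpectreRegBench'
New-Item -Path $path -Force | Out-Null
$elapsed = Measure-Command {
    for ($i = 0; $i -lt 1000; $i++) {
        New-ItemProperty -Path $path -Name "Value$i" -Value $i -PropertyType DWord -Force | Out-Null
    }
}
"{0:N0} ms to create 1000 registry values" -f $elapsed.TotalMilliseconds
Remove-Item -Path $path -Recurse
```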

Spectre Applied

 

Spectre Not Applied

 

A 2.22x reduction in performance.  But this is writing to the HKCU…  which is a much smaller file.  What happens if I force a change on the much larger HKLM?

Spectre Applied

 

Spectre Not Applied

 

Wow.  The size of the registry hive makes a big difference in performance.  We go from a 2.22x to a 3.42x performance degradation.  So at a granular level, Spectre appears to have a large impact on registry operations, and the larger the hive, the worse the impact.  With this information it makes a lot of sense why Spectre may impact Citrix/RDS more: registry operations occur with high frequency in this world, and logons highlight it even more, as group policy and the registry are very intertwined.

This actually brings to mind another metric I can measure.  We have a very large App-V package that has an 80MB registry hive which is applied to the SOFTWARE hive when the package is loaded.  The difference in the amount of time (in seconds) to load this package is:

“583.7499291” (not spectre system)
“2398.4593479” (spectre system)

This goes from 9.7 minutes to 39.9 minutes.  Another 4x drop in performance, and this would be predominantly registry related.  So that is another data point showing registry operations are hit very hard.
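That timing is simple to reproduce with the App-V client cmdlets (the package path below is a placeholder):

```powershell
# Time how long the App-V client takes to add and publish a large package.
Import-Module AppvClient
(Measure-Command {
    Add-AppvClientPackage -Path '\\appvshare\Packages\BigPackage.appv' |
        Publish-AppvClientPackage -Global
}).TotalSeconds
```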

Windows/VM Level – Capacity

Does Spectre affect the capacity of our Citrix servers?

I recorded the CPU utilization of several VM’s that mirror each other on hosts that mirror each other with a singular difference.  One set had the Spectre mitigation enabled.  I then took their CPU utilization results:
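(If you don’t have a monitoring product handy, the same kind of per-VM data can be captured with plain perfmon counters; a sketch, not necessarily how these particular numbers were gathered:)

```powershell
# Sketch: sample total CPU utilization once a minute for 24 hours and write it out to CSV
Get-Counter -Counter '\Processor(_Total)\% Processor Time' -SampleInterval 60 -MaxSamples 1440 |
    ForEach-Object {
        New-Object PSObject -Property @{
            Time = $_.Timestamp
            CPU  = [math]::Round($_.CounterSamples[0].CookedValue, 2)
        }
    } | Export-Csv -Path .\cpu-utilization.csv -NoTypeInformation
```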

Red = VM with Spectre, Blue = VM without Spectre

By just glancing at the data we can see that the Spectre VMs had higher peaks, and they appear higher more consistently.  Since “spiky” data is difficult to read, I smoothed it out using a moving average:
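(The smoothing is nothing fancy; a sketch of a simple trailing moving average over the samples:)

```powershell
# Sketch: simple trailing moving average over an array of CPU samples
function Get-MovingAverage {
    param([double[]]$Values, [int]$Window = 12)
    for ($i = 0; $i -lt $Values.Count; $i++) {
        $start = [math]::Max(0, $i - $Window + 1)
        ($Values[$start..$i] | Measure-Object -Average).Average
    }
}
```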

Red = VM with Spectre, Blue = VM without Spectre

We can get a better feel for the separation in CPU utilization between Spectre enabled and disabled.  The Spectre-enabled VMs are clearly showing higher utilization.

Lastly, I took all of the results for each hour and produced a graph in an additive model:

This graph gives a feel for the impact during peak hours and helps smooth out the data a bit further.  Across each of these graphs, I believe what I’m seeing is a performance hit, as measured from within the VM, of 25%-35%.

Host Level – Capacity

Measuring from the host level can give us a much more accurate picture of the actual resources consumed.  Windows 2008 R2’s CPU counters aren’t particularly accurate, so lots of little slices of work can add up without being fully reported from within the guest.

My apologies for swapping colors.  Raw data:

Blue = Spectre Applied, Red = No Spectre

Very clearly we can see the hosts with Spectre applied consume more CPU resources, even coming close to consuming 100% of the CPU resources on the hosts.  Smoothing out the data using moving averages reveals the gap in performance with more clarity.

 

Charting the “Max CPU” per hour gives another visualization of the performance hit.

 

Summary

Windows 2008 R2, for Citrix/RDS workloads, will be impacted quite heavily.  The impact that I’ve been able to measure appears to be focused on registry-related activities.  Applications that store their settings/values/preferences in registry hives, whether SOFTWARE, SYSTEM, or HKCU, will feel a performance impact.  Logon actions on RDS servers are particularly impacted because group policies are largely registry-related items, so logon times will increase as it takes longer to process reads and writes.  CPU utilization is higher at both the Windows VM level and the hypervisor level, up to 40%.  The impact on application speed and other functions is notable, although more difficult to measure.  I was able to measure a ~400% degradation in performance for CPU processing of Group Policy Preferences, but perception is a real thing, so going from 100ms to 400ms may not be noticed.  However, on applications that measure response time, we found a performance impact of 165%: what took 1000ms now takes 1650ms.

At the time of this writing, I was only able to quantify the performance impact between two of the different hosts: the Intel Xeon E5-2660 v4 and the Intel Xeon E5-2680.

The Intel Xeon E5-2660 v4 has a clock frequency roughly 26% lower than the older 2680.  In order to overcome this handicap, the newer processor would need to have improved its per-clock performance by more than the 26% frequency loss.  CPUBenchMark gives the two processors single-thread CPU scores of 1616 for the Intel Xeon E5-2660 v4 and 1657 for the Intel Xeon E5-2680.  That puts them close, but even four years on, the 2680 remains marginally faster.  This played out in our testing: the higher-frequency processor fared better.  Performance degradation for the two processors came out as follows:

 

CPU                               Performance Hit
Intel Xeon E5-2680 2.70GHz        155%
Intel Xeon E5-2660 v4 2.00GHz     170%

 

This suggests that processor frequency matters more than generational improvements when it comes to mitigating the performance hit.

Keep in mind, these findings are my own.  It’s what I’ve experienced in my environment with the products and operating systems we use.  Newer operating systems are supposed to perform better, but I don’t have the ability to test that currently, so I’m sharing these numbers as an absolute worst-case type of scenario that you might come across.  Ensure you test the impact to understand how your environment will be affected!

Read More

Group Policy Preferences Registry Extension vs Group Policy Registry Extension

2018-04-11
/ /
in Blog
/

In various discussions I’ve read about the drawbacks of Group Policy Preferences, but is it really that bad?

 

…Or is it how you are using it?

 

There are two methods of applying registry keys/values with Group Policy.  The Group Policy Registry Extension is the “traditional” form of applying policies, also known as ADM or ADMX policies.  When creating GPOs with this method, a binary “.pol” file is created.  When policy application occurs, this file is read and applied to your registry.  As a binary file it is kept small and fast; reading and applying the settings should be nearly instant.

The second method of applying registry keys is with Group Policy Preferences (GPP).  This was a “new” method introduced in Windows Server 2008 following Microsoft’s purchase of PolicyMaker.  Group Policy Preferences are much, much more flexible than the traditional form.  There are different ways of applying registry values, including the CRUD model (Create, Replace, Update, Delete) and filtering by way of “Item Level Targeting“, either on an individual value or on a collection.

I’ve seen an organization heavily leverage GPP to great success.  I started to wonder, though: what are the performance impacts of using GPP over the traditional method?  This post will explore the differences in the CRUD model and how it compares to the traditional method.

I intend to look at the following scenarios:

  1. Creating a registry value
  2. Updating a previous registry value
  3. Removing a registry value

However, GPP has a fourth method, “Replace”, and I’ll explore what it does in addition to these three.

Creating a Registry Value

In this scenario, the registry will be clean and a new value will be created.  I’m going to refer to the Group Policy Registry Extension (AKA Administrative Templates, ADM/ADMX) as the “traditional” method and use the abbreviation GPP for the Group Policy Preferences Registry Extension.

Traditional:

After reading the Registry.pol from the SYSVOL, the application of the registry key takes just 3 operations: RegCreateKey, RegSetValue, and RegCloseKey.

Each one of these operations took around 1-1.1ms, with the caveat that Process Monitor (procmon) consumes some resources capturing this information, slowing it down slightly.

 

GPP:

We can see a new operation “RegQueryValue”.  As described by William Stanek, “The Create action creates a preference if it doesn’t already exist. For example, you can use the Create action to create and set the value of a user environment variable called CurrentOrg on computers where it does not yet exist. If the variable already exists, the value of the variable will not be changed.”

The RegQueryValue is executing the check to see if a variable already exists.  So what does GPP look like if the value is already present?

3 operations, with the process exiting successfully once it finds the value is already present.

The end result is 3 operations for our traditional method and 4 operations for the Group Policy Preferences method when creating a registry entry.
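In rough PowerShell terms, the Create action behaves something like this (a sketch of the semantics, not the actual GPP engine; the value name is just an example):

```powershell
# Rough equivalent of the GPP "Create" action: only write the value if it doesn't already exist
$key  = 'HKCU:\Software\TrententTestPreferences'   # the test key used in this post
$name = 'TestValue'                                # example value name

if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }               # RegCreateKey
$existing = Get-ItemProperty -Path $key -Name $name -ErrorAction SilentlyContinue  # the RegQueryValue check
if ($null -eq $existing) {
    New-ItemProperty -Path $key -Name $name -Value 1 -PropertyType DWord | Out-Null  # RegSetValue
}
# If the value already exists, nothing is written - matching the early exit seen in procmon
```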

Updating a registry value

In this scenario, the registry will contain a value, and the policy will be updated with a new value.  For the traditional method this will involve changing the Microsoft “User Profiles” policy.  I set the “HomeDir” location to “TrententTest”, applied the value, then updated it to “TrententTye”.  This ensures a new, changed key is applied.  For GPP I’m going to change the value on the policy from 0x1 to 0x0 and use the “Update” operation.

Traditional:

Traditional maintains a very simple 3-operation action; updating a value has the same effect as if the value was never present to begin with.

GPP:

With the “Update” action, GPP now executes just 3 operations, same as the traditional.

The end result is 3 operations for our traditional method and 3 operations for the Group Policy Preferences method when updating a registry entry.

Removing a registry value

In this scenario, I am going to remove a registry value.  Using the traditional method this means modifying my group policy to “Not Configured”, and for GPP this means setting “Delete” for our operation.

Traditional:

Again, Traditional performs its work in just 3 operations.

 

GPP:

GPP also performs this work in just 3 operations.

 

GPP – The Replace Method

Group Policy Preferences has another operation to explore: “Replace”.

This operation …”creates preferences that don’t yet exist, or deletes and then creates preferences that already exist.”

This sounds like it performs a few operations.  Let’s see what it looks like:

Replace executes 6 operations: RegOpenKey, RegDeleteValue, RegCloseKey, RegCreateKey, RegSetValue, and RegCloseKey.  I’m not entirely sure why you’d want a DeleteValue before a SetValue, but that’s what this selection does.
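(In the same spirit as the Create sketch above, Replace in rough PowerShell terms is a delete followed by a create:)

```powershell
# Rough equivalent of the GPP "Replace" action: remove the value if present, then recreate it
$key  = 'HKCU:\Software\TrententTestPreferences'   # the test key used in this post
$name = 'TestValue'                                # example value name

Remove-ItemProperty -Path $key -Name $name -ErrorAction SilentlyContinue                 # RegDeleteValue
New-ItemProperty -Path $key -Name $name -Value 1 -PropertyType DWord -Force | Out-Null   # RegCreateKey/RegSetValue
```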

 

Revisiting GPP: “Creating a Registry Value”

During the process of creating this post, I wondered if the 3-operation “Update” would work better for creating a key.  The GPP “Create” selection has 4 operations, but the “Update” selection only has 3 operations.  I deleted my “TrententTestPreferences” key and refreshed group policy:

 

3 operations!  So Group Policy Preferences has the potential to operate at the same speed as the traditional group policy IF YOU STICK TO USING “UPDATE”.  At the very least, these operations should take the same amount of time.  Of course, implementation might be a different story.

The final tally:

Action       Traditional      GPP
Create       3 operations     4 operations (3 if you use Update)
Update       3 operations     3 operations
Delete       3 operations     3 operations
Replace      n/a              6 operations

Stay tuned for part 2 — The Performance Comparison

Read More

Group Policy – Monolithic vs Functional Design and Performance Evaluation

2018-04-09
/ /
in Blog
/

Group Policy design is a hotly discussed topic, with lots of different ideas and discussions.  However, there are not a whole lot of actual metrics.  ADM and ADMX templates apply registry keys in an ‘enforced’ manner; that is, if you or the machine has access to read the policies, the registry keys within are applied.  If you stuck purely to ADM/ADMX policies but wanted to do dynamic filtering or application of the keys/values based on a set of criteria, you’d probably design multiple policies and nested organizational units (OUs).  From there, you could filter certain policies based on the machine or user location in the OU structure, or by filtering on the policies themselves, denying access to certain groups or doing explicit allows for certain groups.  This design style, back in the day, was called “functional” design.

 

However, the alternative style, “monolithic” design, simplifies the group policy object (GPO) design into far fewer GPOs.

 

Setup

My test setup is very simple: an organizational unit (OU) with inheritance blocked to control the application of the GPOs.  I created 100 individual GPOs, each with a single registry value, and 1 GPO with 100 values.  I chose to do a simple registry addition as it should be the best-performing option for group policy.  I created a custom ADMX file for this purpose:

Monolithic simulation:

 

Functional simulation:
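As an aside, if you want to reproduce this setup without building the policies by hand, the bulk creation can be scripted with the GroupPolicy module (a sketch with placeholder GPO names, registry path, and OU; my own policies came from the custom ADMX rather than this approach):

```powershell
# Sketch: create the "functional" test set - 100 GPOs, each carrying a single registry value
Import-Module GroupPolicy
$ou = 'OU=GPOTest,DC=lab,DC=local'    # placeholder OU distinguished name

1..100 | ForEach-Object {
    $gpo = New-GPO -Name "Functional-Test-$_"
    Set-GPRegistryValue -Name $gpo.DisplayName -Key 'HKCU\Software\GPOTest' `
        -ValueName "Value$_" -Type DWord -Value $_ | Out-Null
    New-GPLink -Name $gpo.DisplayName -Target $ou | Out-Null
}

# And the "monolithic" equivalent: one GPO holding all 100 values
$mono = New-GPO -Name 'Monolithic-Test'
1..100 | ForEach-Object {
    Set-GPRegistryValue -Name $mono.DisplayName -Key 'HKCU\Software\GPOTest' `
        -ValueName "Value$_" -Type DWord -Value $_ | Out-Null
}
New-GPLink -Name $mono.DisplayName -Target $ou | Out-Null
```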

 

Testing

In testing these two designs I elected to focus on the one factor that would have the most impact: latency.  I set up my client machine in the OU, put a WAN emulator that can manipulate latency in the path, and measured the performance of the functional and monolithic designs at varying latencies.  I looked for the following event IDs: 4257, 5257, 4016, and 5016.  The x257 events correspond to when group policy downloads the group policy objects off the SYSVOL file share.  The x016 events determine how long it took the policy to be processed.
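Pulling those timings out of the Group Policy operational log is simple enough with Get-WinEvent (a sketch against the Microsoft-Windows-GroupPolicy/Operational log; for precise pairing of start/end events you would correlate the activity IDs, which I have skipped here):

```powershell
# Sketch: grab the most recent 4016/5016 and 4257/5257 events and measure the 4016 -> 5016 gap
$ids    = 4016, 5016, 4257, 5257
$events = Get-WinEvent -LogName 'Microsoft-Windows-GroupPolicy/Operational' -MaxEvents 200 |
    Where-Object { $ids -contains $_.Id } | Sort-Object TimeCreated

$start = ($events | Where-Object { $_.Id -eq 4016 } | Select-Object -Last 1).TimeCreated
$end   = ($events | Where-Object { $_.Id -eq 5016 } | Select-Object -Last 1).TimeCreated
'{0:N0} ms from event 4016 to 5016' -f ($end - $start).TotalMilliseconds
```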

 

The results:

 

Raw Data:

Functional GPO – applying 100 registry values

Event ID 4016 to 5016 (policy processing time)
Latency (ms)    Time (ms)
0               271
10              4089
25              8078
50              15315
75              22904
100             29820

Event ID 4257 to 5257 – Starting to download policies
Latency (ms)    Time (s)
0               0
10              3
25              6
50              12
75              17
100             22

 

Monolithic GPO – applying 100 registry values

Event ID 4016 to 5016 (policy processing time)
Latency (ms)    Time (ms)
0               117
10              156
25              198
50              284
75              336
100             435

Event ID 4257 to 5257 – Starting to download policies
Latency (ms)    Time (s)
0               0
10              0
25              1
50              1
75              1
100             1

 

Analysis:

There is a huge edge to the monolithic design, both in terms of how long it takes to process a single policy vs multiple policies and in its resiliency to the effects of latency.  Even ‘light’ latency can have a profound impact on performance.  Going from 0ms to 10ms increased the time to process the functional design by 15 times!  The monolithic design, on the other hand, was barely impacted.  Even with a latency of 100ms, it only added ~300ms to the time to process the policy.  I would consider this imperceptible in the real world, whereas the functional design going from ~271ms to ~4000ms would be an extremely noticeable impact, let alone the roughly 30 seconds it takes at 100ms of latency!

Another factor is how much additional time is required to download the policies.  This is time in addition to the processing time.  This probably shouldn’t be a huge surprise; it appears that group policies are downloaded and processed sequentially, in order.  I’m sure this is necessary to maintain some semblance of predictability: if you have conflicting policy settings, the one last in the list (whether sorted alphabetically or otherwise) can be relied on to be the winner.

Adding latency, even just a little, has a noticeable impact.  And the more policies there are, the more traffic, and the greater the impact of latency.  Again, a loss for the functional design and a win for the more monolithic design.

Conclusion:

Group Policy Objects can have a large impact on user experience.  The goal should be to minimize them to as few as possible.  As with everything, there are exceptions to the rule, but for Group Policy it’s important to try to maintain it.  Even just a little latency between the domain controller and the client can have a massive impact on group policy performance.  This can affect everything from the length of time it takes a machine to boot to delaying a user logging into a system.

 

Read More