XenApp

User Profile Manager – Unavoidable Delays

2018-06-30

I’ve been exploring logon time optimization and noticed the “User Profile Service” phase always showed up, taking 1-3 seconds.  I wondered why, and so began my investigation.

The first thing I needed to do was separate the “User Profile Service” into its own process.  It’s originally configured to share a process with other services, which makes procmon’ing it difficult.

Making this change is easy:
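For reference, splitting a service out of its shared svchost can be done with sc.exe (a sketch; ProfSvc is the service name behind the User Profile Service, and the space after “type=” is required syntax):

    sc.exe config ProfSvc type= own

The service needs to be restarted (or the machine rebooted) for the change to take effect.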

Now that the User Profile Service is running in its own process, we can use Process Monitor to target that PID.

I logged onto the RDS server with a different account and started my procmon trace.  I then logged into the server:

One of the beautiful things about a video like this is we can start to go through frame-by-frame if needed to observe the exact events that are occurring.  Process Monitor also gives us a good overview of what’s happening with the “Process Activity” view:

9,445 file events, 299,668 registry events.  The registry, by far, has the most events occurring against it.  And so we investigate:

  1. On new logins the registry hive is copied from Default User to your profile directory, the hive is mounted, and then security permissions are set.

    Setting the initial permissions of the user hive began at 2:14:46.3208182 and finished at 2:14:46.4414112, spanning a total of 121 milliseconds.  Pretty quick, but to minimize logon duration it’s worth examining each key in the Default User hive and ensuring you do not have any unnecessary keys.  Each of these keys will have its permissions evaluated and modified.
  2. The Profile Notification system now kicks off.

    The User Profile Service now goes through each “ProfileNotification” and, if it’s applicable, executes whatever action the module is responsible for.  In my screenshot we can see the User Profile Service alerts the “WSE”.  Each key actually contains the friendly name, giving you a hint about its role:

    It also appears we can measure the duration of each module by the “RegOpenKey” and “RegCloseKey” events tied to that module.

    In my procmon log, the WSE took 512ms, the next module “WinBio” took 1ms, etc.  The big time munchers for my system were:
    WSE: 512ms
    SyncCenter: 260ms
    SHACCT: 14ms
    SettingProfileHandler: 4ms
    GPSvc: 59ms
    GamesUX: 60ms
    DefaultAssociationsProfileHandler: 4450ms (!)
  3. In the previous screenshot we can see the ProfileNotification has two events kicked off that it runs through its list of modules: Create and Load.  Load takes 153ms in total, so Create is what is triggering our event.
  4. DefaultAssociationsProfileHandler consumes the majority of the User Profile Service time.  What the heck is it doing?  It appears the Default Association Profile Handler is responsible for creating the associations between several different components and your ability to customize them.  It associates (that I can see):
    ApplicationToasts (eg, popup notifications)
    RegisteredApplications
    File Extensions
    DefaultPrograms
    UrlAssociations
    The GPO “Set Default Associations via XML file” is processed and the above is re-run with the XML file values.
  5. Do we need these associations?

    Honestly…   Maybe.

    However, does this need to be *blocking* the login process?  Probably not.  This could be an option run asynchronously, with you, as the admin, gambling that any required associations will be set before the user gets the desktop/app…  Or, if you have applications that are entirely single purpose and simply read and write to a database somewhere, then this is superfluous.

  6. Can we disable it?

    Yes…

    But I’m on the fence about whether this is a good idea.  To disable it, I’ve found deleting the “DefaultAssociationsProfileHandler” key does work; associations are skipped and we log on 1-4 seconds faster.  However, launching a file directly, or a shortcut with a URL handler, will prompt you to choose your default program (as should be expected).

I’m exploring this idea: deleting the key entirely and using SetUserFTA to set associations.  A sketch of what that might look like is below.
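As a rough sketch of that experiment (the ProfileNotification path is the real location of these modules, but removing keys from it is unsupported; export a backup first and test carefully):

    # List the profile notification modules the User Profile Service walks through
    $base = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileNotification'
    Get-ChildItem $base | ForEach-Object { $_.PSChildName }

    # Back up, then remove, the DefaultAssociationsProfileHandler module (assumes C:\Temp exists)
    reg.exe export "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileNotification\DefaultAssociationsProfileHandler" C:\Temp\DefaultAssociationsProfileHandler.reg
    Remove-Item "$base\DefaultAssociationsProfileHandler" -Recurse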

We have ~400 App-V applications that write/overwrite approximately 800 different registered applications and file extensions into our registry hive (we publish globally — this puts them there).  This is why I started this investigation: some of our servers with lots of App-V applications were reporting longer User Profile Service times, and tying it all together, this one module in the User Profile Service appears to be the culprit.  And with Spectre increasing the duration of registry operations by 400%, this became noticeable very quickly in our testing.

Lastly, time is still being consumed on RDS and server platforms by infuriating garbage services like GamesUX (“Games Explorer”).  It tweaks a nerve a little when I see time being consumed by wasteful processes.


Meltdown + Spectre – Performance Analysis

2018-04-30

Meltdown and Spectre (variant 2) are two vulnerabilities that came out at the same time, but they are vastly different.  Patches for both were released extremely quickly for Microsoft OS’s, but because of a variety of issues with Spectre, only Meltdown was truly available to be mitigated.  Spectre (variant 2) mitigation had a problematic release, causing so many issues for whoever installed the fix that it had to be recalled and the release delayed for weeks.  However, around March/April 2018, the release of the Spectre patch was finalized and the microcode released.

Threat on Performance

Spectre (variant 2), of the two, threatened to degrade performance fairly drastically.  Initial benchmarks mentioned that storage was hit particularly hard, and Microsoft commented that server OS’s could be as well.  Even worse, older operating systems do not support the CPU features (PCID) that could reduce the performance impact.  Older OS’s also suffer more due to designs (at the time) that involved running more code in kernel mode (font rendering was singled out as an example of one of these design decisions) than newer OS’s.

As with most things on my blog I am particularly interested in the impact against Citrix/Remote Desktop Services type of workloads.  I wanted to test ON/OFF workloads of the mitigation impacts.

Setup

My setup consists of two ESXi (version 6.0) hosts with identical VM’s on each, hosting identical applications.  I was able to set up 4 of these pairs of hosts.  Each pair of hosts has identical processors.  The one big notable change is that one host of each pair has the Spectre and Meltdown patch applied to the ESXi hypervisor.

The operating system of all the VM’s is Windows Server 2008 R2.  Applications are published from Citrix XenApp 6.5.

 

 

This is simply a snapshot of a single point in time to show the metrics of these systems.

Performance Considerations

Performance within a Citrix XenApp farm can be described in two ways: capacity and speed.

Speed

Generally, one would run a “best case” test of the speed aspect of your application’s performance.

A simplified view of this is “how fast can the app do X task?”

This task can be anything.  I’ve seen it measured by an automated script flipping through tabs of an application, where each tab pulled data from a database – rendered it – then moved on to the next tab.  The total time to execute these tasks amounted to a number that they used to baseline the performance of this application.

I’ve seen it measured as simply opening an excel document with macros and lots of formulas that pull data and perform calculations and measuring that duration.

The point of each exercise is to generate a baseline that both the app team and the Citrix team can agree to.  I’ve almost never had the baseline equal “real world” workloads; typically the test is an exaggeration of the actual workflow of users (e.g., the test exaggerates CPU utilization).  Sometimes this is communicated and understood, other times not, but hopefully it gives you a starting point.

In general, and for Citrix workloads specifically, running the baseline test on a desktop usually produces a reasonable number of, “well… if we don’t put it on Citrix this is what performance will be like so this is our minimum expectation.”  Sometimes this is helpful.

Once you’ve established the speed baseline you can now look at capacity.

Capacity

After establishing some measurable level of performance within the application(s) you should now be able to test capacity.

If possible, start loading up users or test users running the benchmark.  Eventually, you’ll hit a point where the server fails — because it ran out of resources, performance degraded so much it errors, and so on.  If you do it right, you should be able to find the curve that intersects descending performance with your “capacity”.

At this point, cost may come into consideration.

Can you afford ANY performance degradation?

If not, then the curve is fairly easy: at user X we start to see performance degrade, so X-1 is our capacity.

If yes, at what point does performance degrade so much that adding users stops making sense?  Using the “without Citrix this is how it performs on the desktop” can be helpful to establish a minimum level of performance that the Citrix solution cannot cross.

Lastly, if you have network bound applications, and you have an appropriately designed Citrix solution where the app servers sit immediately beside the network resources on super-high bandwidth, ultra-low latency links, you may never experience performance degradation (lucky you!).  However, you may hit resource constraints in these scenarios.  E.g., although performance of the application is dependent on the network, the application itself uses 1GB of RAM per instance — you’ll be limited pretty quickly by the amount of RAM you can have in your VM’s.  These cases are generally preferred because the easy answer to increase capacity is *more hardware*, but sometimes you can squeeze in some more users with software like AppSense or WEM.

Spectre on Performance

So what is the impact Spectre has on performance — speed and/or capacity?

If Spectre simply makes a task take longer, but you can fit the same number of tasks on a given VM/Host/etc. then the impact is only on speed. Example: a task that took 5 seconds at 5% CPU utilization now takes 10 seconds at 5% CPU utilization.  Ideally, the capacity should be identical even though the task now takes twice as long.

If Spectre makes things use *more* resources, but the speed is the same, then the impact is only on capacity.  Example: a task that took 5 seconds at 5% CPU utilization now takes 10% CPU utilization.  In this scenario, the performance should be identical but your capacity is now halved.

The worst case scenario is if the impact is on both speed and capacity.  In this case, neither is recoverable, except that you might be able to make up some speed with newer/faster hardware.

I’ve tested to see the impacts of Spectre in my world.  This world consists of Windows 2008 R2 with XenApp 6.5 on hardware that is 6 years old.  I was also able to procure some newer hardware to measure the impact there as well.

Test Setup

Testing was accomplished by taking 2 identically configured ESXi hosts, applying the VMware ESXi patch with the microcode for Spectre mitigation to one of the hosts, and enabling it in the operating system.  I added identical Citrix VM’s to both hosts and enabled user logins to start generating load.

 

Performance needs to be measured at two levels: the Windows/VM level and the hypervisor/host level.  This is because the hypervisor may pick up additional work required for the mitigation that the operating system does not see, and also because Windows 2008 R2 does not accurately measure CPU performance.

Windows/VM Level – Speed

I used ControlUp to measure and capture performance information.  ControlUp is able to capture various metrics, including average logon duration.  This singular metric includes various system interactions: network use querying Active Directory and pulling files from shares, disk use caching group policies, CPU use processing which policies are applicable, and executables being launched in sequence.  I believe that measuring logons is a good proxy for understanding the performance impact.  So let’s see some numbers:

 

The top 3 results are Spectre enabled machines, the bottom 3 are without the patch.  The results are not good.  We are seeing a 200% speed impact in this metric.

With ControlUp we can drill down further into the impact:

Without Spectre Patch

 

With Spectre Patch

 

The component that took the largest hit is Group Policy.  Again, ControlUp can drill down into this component.

Without Spectre

 

With Spectre

All group policy preference components take a 200% hit.  The Group Policy Preferences functions operate by pulling down an XML file from the SYSVOL store, reading the XML file, then applying whatever the resultant set of policies finds applicable.  In order to trace down further and find more differences, I logged into each type of machine, one with Spectre and one without, and started a Process Monitor trace.  Group Policy is applied via the Group Policy service, which is a separate instance of svchost.exe.  The process can be found via Task Manager:

Setting ProcMon to filter only on that PID, we can begin to evaluate the performance.  I logged in again with procmon capturing the logon.

Spectre Patched system on left, no patch on right

Using ProcessMonitor, we can look at the various “Summaries” to see which particular component may be most affected:


We see that 8.45 seconds is spent on the registry, 0.40 seconds on file actions, 1.04 seconds on the ProcessGroupPolicyExRegistry instruction.

The big ticket item is the time spent with the registry.

So how does it compare to a non-spectre system?

 

We see that 1.97 seconds is spent on the registry, 0.33 seconds on file actions, 0.24 seconds on the ProcessGroupPolicyExRegistry instruction.

Here’s a table showing the results:

So it definitely appears we need to look at the registry actions.  One of the cool things about Procmon is you can set a filter on your trace and open up the summaries and it will show you only the objects in the filter.  I set a filter for RegSetValue to see what the impact is for setting values in the registry:

RegSetValue – without spectre applied

 

RegSetValue – with spectre applied

1,079 RegSetValue events and a 4x performance degradation.  Just to test whether this is specific to write events, I changed the procmon filter to filter on “Category” “Read”:

 

Registry Reads – Spectre applied

 

Registry Reads – Spectre not applied

We see roughly the same ratio of performance degradation, perhaps a little more so.  As a further test I created a PowerShell script that measures creating 1,000 registry values, and ran it on each system:
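The script itself was captured as a screenshot; a minimal sketch of the same idea (the key path and value names are placeholders of my own) looks like this:

    # Time the creation of 1,000 registry values under HKCU (placeholder key path)
    New-Item -Path 'HKCU:\Software\SpectreTest' -Force | Out-Null
    Measure-Command {
        1..1000 | ForEach-Object {
            New-ItemProperty -Path 'HKCU:\Software\SpectreTest' -Name "Value$_" -Value $_ -PropertyType DWord -Force | Out-Null
        }
    }

Pointing the path at an HKLM: location instead (run elevated) repeats the test against the much larger SOFTWARE hive.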

Spectre Applied

 

Spectre Not Applied

 

A 2.22x reduction in performance.  But this is writing to HKCU… which is a much smaller file.  What happens if I force a change on the much larger HKLM?

Spectre Applied

 

Spectre Not Applied

 

Wow.  The size of the registry hive makes a big difference in performance.  We go from a 2.22x to a 3.42x performance degradation.  So at a granular level, Spectre appears to have a large impact on registry operations, and the larger the hive the worse the impact.  With this information it makes a lot of sense why Spectre may impact Citrix/RDS more: registry operations occur with a high frequency in this world, and logons highlight it even more, as group policy and the registry are very intertwined.

This actually brings to mind another metric I can measure.  We have a very large App-V package that has an 80MB registry hive that is applied to the SOFTWARE hive when the package is loaded.  The difference in the amount of time (in seconds) loading this package is:

“583.7499291” (not spectre system)
“2398.4593479” (spectre system)

This goes from 9.7 minutes to 39.9 minutes.  Another 4x drop in performance, and this would be predominantly registry related.  So another data point showing that registry operations are hit very hard.

Windows/VM Level – Capacity

Does Spectre affect the capacity of our Citrix servers?

I recorded the CPU utilization of several VM’s that mirror each other on hosts that mirror each other with a singular difference.  One set had the Spectre mitigation enabled.  I then took their CPU utilization results:

Red = VM with Spectre, Blue = VM without Spectre

By just glancing at the data we can see that the Spectre VM’s had higher peaks and they appear higher more consistently.  Since “spiky” data is difficult to read, I smoothed out the data using a moving average:

Red = VM with Spectre, Blue = VM without Spectre

We can get a better feel for the separation in CPU utilization between Spectre enabled/disabled.  We are seeing clearly higher utilization.

Lastly, I took all of the results for each hour and produced a graph in an additive model:

This graph gives a feel for the impact during peak hours, and helps smooth out the data a bit further.  I believe what I’m seeing with each of these graphs is a performance hit measured by the VM at 25%-35%.

Host Level – Capacity

Measuring from the host level can give us a much more accurate picture of actual resources consumed.  Windows 2008 R2’s CPU accounting isn’t very accurate, and if there are lots of little slices of work they can add up without being reported.

My apologies for swapping colors.  Raw data:

Blue = Spectre Applied, Red = No Spectre

Very clearly we can see the hosts with Spectre applied consume more CPU resources, even coming close to consuming 100% of the CPU resources on the hosts.  Smoothing out the data using moving averages reveals the gap in performance with more clarity.

 

Showing the “Max CPU” hit per hour gives another visualization of the performance hit.

 

Summary

Windows 2008 R2, for Citrix/RDS workloads, will be impacted quite heavily.  The impact that I’ve been able to measure appears to be focused on registry-related activities.  Applications that store their settings/values/preferences in registry hives, whether the SOFTWARE, SYSTEM, or HKCU hive, will feel a performance impact.  Logon actions on RDS servers are particularly impacted because group policies are largely registry-related items, thus logon times will increase as it takes longer to process reads and writes.  CPU utilization is higher at both the Windows VM level and the hypervisor level, up to 40%.  The impact of speed on the applications and other functions is notable, although more difficult to measure.  I was able to measure a ~400% degradation in performance for CPU processing for Group Policy Preferences, but perception is a real thing, so going from 100ms to 400ms may not be noticed.  However, on applications that measure response time, we found a performance impact of 165%.  What took 1000ms now takes 1650ms.

At the time of this writing, I was only able to quantify the performance impact between two of the different hosts: the Intel Xeon E5-2660 v4 and the Intel Xeon E5-2680.

The Intel Xeon E5-2660 v4 has a clock frequency 26% lower than the older 2680.  In order to overcome this handicap, the processor must have improved at a per-clock rate higher than the 26% frequency loss.  CPUBenchMark scored the two processors with single thread CPU scores of 1616 for the Intel Xeon E5-2660 v4 and 1657 for the Intel Xeon E5-2680.  This put them close, but after 4 years the 2680 was still marginally faster.  This played out in our testing: the higher frequency processor is faster.  Performance degradation for the two processors came out as such:

 

Processor | CPU Performance Hit
Intel Xeon E5-2680 2.70GHz | 155%
Intel Xeon E5-2660 v4 2.00GHz | 170%

 

This tells us that processor frequency is the more important factor in mitigating the performance hit.

Keep in mind, these findings are my own.  It’s what I’ve experienced in my environment with the products and operating systems we use.  Newer operating systems are supposed to perform better, but I don’t have the ability to test that currently, so I’m sharing these numbers as an absolute worst case type of scenario that you might come across.  Ensure you test the impact to understand how your environment will be affected!


Meltdown – Performance Impact Evaluation (Citrix XenApp 6.5)

2018-01-15

Meltdown came out and it’s a vulnerability whose fix may have a performance impact.  Microsoft has stipulated that the impact will be more severe if you:

a) Are running an older OS
b) Are using an older processor
c) Run applications that perform lots of context switches

Unfortunately, the environment we are operating hits all of these nails on the head.  We are using, I believe, the oldest OS that Microsoft is patching this for.  We are using older processors from 2011-2013, which do not have the PCID optimization (as reported by the SpeculationControl test script), which means performance is impacted even more.

I’m in a large environment where we have the ability to shuffle VM’s around hosts and put VM’s on specific hosts.  This will allow us to measure the impact of Meltdown in its entirety.  Our clusters are dedicated to Citrix XenApp 6.5 servers.

Looking at our cluster and all of the Citrix XenApp VM’s, we have some VM’s that are ‘application siloed’ — that is, they only host a single application — and some VM’s that are ‘generic’.

In order to determine the impact, I looked at our cluster, summed up the total of each type of VM, and then divided by the number of hosts.  We have 3 different geographical areas that have different VM types and user loads.  I am going to examine each of these workload types across the different geographical areas and see what the outcomes are.

Since Meltdown impacts applications and workloads that have lots of context switches, I used perfmon on each server to record its context switches.
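For what it’s worth, the same counter can be captured from PowerShell instead of the perfmon GUI; a sketch (the sample interval and output path are arbitrary choices of mine):

    # Sample the system-wide context switch rate every 15 seconds for an hour
    Get-Counter -Counter '\System\Context Switches/sec' -SampleInterval 15 -MaxSamples 240 |
        ForEach-Object { $_.CounterSamples } |
        Export-Csv C:\Temp\ContextSwitches.csv -NoTypeInformation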

The metrics I am interested in are the context switch values, as they’ve been identified as the element that highlights the impact.  My workloads look like this:

Based on this chart, our largest impact should be at Location B, followed by Location A, and last Location C.

However, the processors for each location are as follows:

Location A: Intel Xeon 2680
Location B: Intel Xeon 2650 v2
Location C: Intel Xeon 2680

The processors may play a role, as newer generation processors are supposed to fare better.

In order to test Meltdown in a side-by-side comparison of systems with and without the mitigation, I took two identical hosts and populated them with an identical number and type of servers.  On one host we patched all the VM’s with the mitigation, and on the other host we left the VM’s without the patches.

Using the wonderful ControlUp console, we can compare the results in their real-time dashboard.  Unfortunately, the dashboard only gives us a “real time” view.  ControlUp offers a product called “Insights” that can show the historical data, but our organization has not subscribed to it, so I’ve had to track the performance by exporting the ControlUp views on an interval and then manually sorting and presenting the data.  Using the Insights view would be much, much faster.

ControlUp has 3 different views I was hoping to explore.  The first is the hosts view; these are performance metrics pulled directly from the VMware host.  The second is the computers view, and the last is the sessions view.  The computers and sessions views are metrics pulled directly from the Windows server itself.  However, I am unable to accurately judge performance from the Windows server metrics because of how it measures CPU performance.

Another wonderful thing about ControlUp is we can logically group our VM’s into folders, and from there ControlUp can sum the values and present them in an easily digestible presentation.  I created a logical structure like so and populated my VM’s:

 

And then within ControlUp we can “focus” on each “Location” folder and if we select the “Folder” view it presents the sums of the logical view.

HOSTS

In the hosts view we can very quickly see impact, ranging from 5%-26%.  However, this is a realtime snapshot, so I tracked the average view and examined only the “business hours”, as our load is VERY focused on the 8AM-4PM window.  After these hours we see a significant drop in our load.  If the servers are not being stressed, the performance seems to be a lot more even, or the difference is not noticeable (in a cumulative sense).

FOLDERS

Some interesting results.  We are consistently seeing longer login times and application launch times.  2/3 of the environments have lower user counts on the unpatched servers, with the Citrix load balancing making that determination.  The one environment that had more users on the mitigated servers is actually our least loaded in terms of servers per host and users per server, so it’s possible that more users would reveal a gap, but as of now it shows that one of our environments can support an equal number of users.

Examining this view and the historical data presented, I encountered oddities — namely, the CPU utilization seemed to be fairly even more often than not, but the hosts view showed greater separation between the machines with mitigation and without.  I started to explore why, and believe I may have come across this issue previously.

2008R2-era servers have less accurate reporting of CPU utilization.

I believe this is also the API that ControlUp is using with these servers to report on usage.  When I was examining a single server with Process Explorer I noticed a *minimum* CPU utilization of 6%, but Task Manager and ControlUp would report 0% utilization at various points.  The issue is one of accuracy, addition, and rounding.  The more users on a server, with more processes each consuming ever so slightly more CPU, the greater the inaccuracy.  Example:

Left, Task Manager. Right, Process Explorer

We have servers with hundreds of users utilizing a workflow like this where they are using just a fraction of a percent of CPU resources.  Task Manager and the like will not catch these values and will round *down*.  If you have 100 users using a process that consumes 0.4% CPU then our inaccuracy is on the 40% scale!  So relying on the VM metrics of ControlUp or Windows itself is not helpful.  Unfortunately, this destroys my ability to capture information within the VM, requiring us to rely solely on the information within VMware.  To be clear, I do NOT believe Windows 2012 R2 and greater OS’s have this discrepancy (although I have not tested), so this issue manifests itself pretty viciously in the XenApp 6.5-era servers.  Essentially, if Meltdown is increasing CPU time on processes by a fraction of a percent, then Windows will report as if everything is OK and you will probably not actually notice or think there is an issue!

In order to try and determine if this impact is detectable, I took two servers with the same base image, one with the mitigation installed and the other without.  I used Process Explorer and “saved” the process list over the course of a few hours.  I ensured the servers had a similar number of users using a specific application that only presented a specific workload, so everything was as similar as possible.  In addition, I looked at processes that aren’t configurable (or, since the servers have the same base image, are configured identically).  Here were my results:

Just eyeballing it, it appears that the mitigation has had an impact at the fractional level.  Taking the averages of the winlogon.exe and iexplore.exe processes into account:

 

These numbers may seem small, but once you start considering the number of users, the amount wasted grows dramatically.  For 100 users, winlogon.exe goes from consuming a total of 1.6% to 7.1% of the CPU, resulting in an additional load of 5.5%.  The iexplore.exe numbers are even more egregious, as it spawns 2 processes per user, and these averages are per process.  For 100 users, 200 iexplore.exe processes will be spawned.  The iexplore.exe CPU utilization goes from 15.6% to 38.8%, for an additional load of 23.2%.  Adding the mitigation patch can impact our load pretty dramatically, even though it may be under-reported, thus impacting users on a far greater scale by adding more users to servers that don’t have the resources Windows is reporting they have.  For an application like IE, this may just mean greater slowness — depending on the workload/workflow — but if you have an application more sensitive to these performance scenarios, your users may experience slowness even though the servers themselves look OK from most (all?) reporting tools.

Continuing with the HOSTS view, I exported all the data ControlUp collects on a minute interval, added the data to Excel, and created pivot tables comparing the hosts that host servers with the mitigation patches against the ones without.  This is what I saw for Saturday-Sunday; these days are lightly loaded.

This is Location B; the host with unpatched VM’s is in orange and the host with patched VM’s is in blue.  The numbers are pretty much identical when CPU utilization on the host is around or below 10%, but once it starts to get loaded the separation becomes apparent.

Since these datapoints were every minute, I used a moving average of 20 data points (3 per hour) to present the data in a cleaner way:

Looking at the data for this Monday morning, we see the following:

Location B

 

Some interesting events: at 2:00AM the VM’s reboot.  We reboot odd and even servers each day, and in organizing this test I put all the odd VM’s on the blue host and the even VM’s on the orange host.  So the blue line going up at 2:00AM is the odd (patched) VM’s rebooting.  The reboot cycle is staggered to take place over a 90 minute interval (the last VM’s should reboot around 3:30AM).  After the reboot, the servers come up and do some “pre-user” startup work like loading App-V packages, App-V registry prestaging, etc.  I track the App-V registry pre-staging duration during bootup, and here are my results:

Registry pre-staging in App-V is a light-read, heavy-write exercise.  Registry reading and writing are slow on 2008R2, and our time to execute this task went from 610 seconds to 693 seconds, an overall duration increase of 14%.

Looking at Location A and C

Location A

Location C (under construction)

We can see in Location A the CPU load is pretty similar until the 20% mark, then separation starts to ramp up fairly drastically.  For Location C, unfortunately, we are undergoing maintenance on the ‘patched’ VM’s, so I’m showing this data for transparency but it’s only relevant up to the 14th.  I’ll update this in the next few days when the ‘patched’ VM’s come back online.

Now, I’m going to look at how “Windows” is reporting CPU performance vs the Hosts CPU utilization.

Location A

 

Location B

 

The information this conveys is: do NOT TRUST the Windows CPU utilization meter (at least on 2008 R2).  The CPU utilization at the VM level does not appear to reflect the load on the hosts.  While the VM’s with the patch and without the patch both report nearly identical levels of CPU utilization, at the host level the spread is much more dramatic.

 

Lastly, I am able to pull some other metrics that ControlUp tracks, namely logon duration and application launch duration.  For each of the locations I got a report of the difference between the two environments:

Location A: Average Application Load Time

Location B: Average Application Load Time

 

Location A: Logon Duration

 

Location B: Logon Duration

 

 

 

In each of the metrics recorded, the experience worsens for our user base, from applications taking longer to launch to logon times increasing.

What does this all mean?

In the end, Meltdown has a significant impact on our Citrix XenApp 6.5 environment.  The perfect storm of older CPU’s, an older OS, and applications with workflows that are impacted by the patch means our environment is grossly impacted.  Location A has a maximum hit (as of today) of 21%, and Location B a spread of 12%.  I had originally predicted that Location B would have the largest impact; however, the newer v2 processors may be playing a role, and the v2 processors may be more efficient than the older 2680.

In the end, the performance hit is not insignificant and reduces our capacity significantly once these patches are deployed.  I plan on adding new articles once I have more data on Meltdown, and then further again once we start adding the mitigations against Spectre.

CPU Utilization on the hosts. Orange is a host with VM’s without the Meltdown patches, blue is with the patches.

 


Citrix XenDesktop/XenApp 7.15 – The local host cache in action

2017-11-29

The Citrix Local Host Cache feature, introduced in XenDesktop/XenApp 7.12, has some nuances that may be better demonstrated in real time than typed out in text.  I will do both in this article: a ‘step by step’ of what happens when you have a network or site database outage, as well as a realtime video highlighting the feature in action.  There are many other blogs and articles that do a great job going into the step by step details of the feature, but I find seeing it in action to be very informative.

To view a video of this process, scroll to the very end, or click here.

To start, I’ve created a PowerShell script that simulates a user querying the broker for a list of applications.

Columns are the time of the response, the payload size received (in bytes), and the total time to respond in milliseconds.
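The script itself was shown as a screenshot; the harness is roughly this shape (a sketch; $url is a placeholder for the store’s enumeration endpoint, and the real script handles whatever request details the store requires):

    # Poll the broker once a second and print: time, payload bytes, response ms
    $url = 'http://storefront.example.local/Citrix/Store/...'   # placeholder
    while ($true) {
        $ms = (Measure-Command { $r = Invoke-WebRequest -Uri $url -UseBasicParsing }).TotalMilliseconds
        '{0}  {1}  {2:N0}' -f (Get-Date -Format 'HH:mm:ss'), $r.RawContentLength, $ms
        Start-Sleep -Seconds 1
    }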

As we’re querying the broker, the broker is reaching out to the database and then responding to the user with the information requested.

 

Periodically, the Citrix Config Synchronizer Service will check to ensure the local host cache database is in sync with the site database. This is an event that occurs every 2 minutes during normal operation.

To show the network connection failing, I am going to set up a continuous ping to the database server.

To simulate a network failure, I’m going to use the tool clumsy to drop all packets to and from the database server.

Clicking start in clumsy immediately stops the simulated user from getting their list of applications.

 

And the pings now time out in their requests.

The broker has a 20-second timeout, after which it will respond to requests with what it thinks is the current status.  The first timed-out request receives a response of “working”, and thereafter a response of “pending failed” will be returned.

Around 24 seconds in, the broker has noticed the database has failed and has logged its first event, 1201: “The connection between the Citrix Broker Service and the database has been lost”.

One minute thirty-three seconds into the failure, other Citrix services are now reporting they cannot contact the database.

Just shy of 2 minutes in, the broker service has exceeded its timeout for contacting the database and is in the process of switching to the local host cache.  It stops the “primary broker”.

And then the Citrix High Availability Service becomes active, brokering user requests.

In my simulation, the amount of time it took the user to receive a response from the LHC is a little faster than from the site database.  The LHC response time is 80-90 milliseconds, whereas the response time for a request that involves the site database is 90-100.  This allows us to visually see the two different modes of operation in action.

Top, site database response times – middle is the outage – bottom is LHC response times

How long does it take to “fall back” to the database when connectivity is restored?

I “Stopped” clumsy to restore our network connection and started a timer.

 

We can see the ping responses from the database immediately to verify our connection is back.

 

Almost immediately, all services have noticed that they have connectivity again, including the broker service.

However, we do not fall back immediately.

At one minute thirty-three seconds, the broker switched back to the primary broker, and all services were restored.

To watch a video of this all in action, please view here:

 

 


Citrix Storefront – Adventures in customization – Define a custom resolution for a specific application

2017-11-14

Currently, Storefront does not grant the ability to define applications with specific resolutions.  In order to configure the resolution, Citrix recommends you modify the default.ica file.  This is terrible!  If you had specific applications that required specific resolutions, what are you to do?  Direct users to a variety of stores depending on the resolution required?!

Fortunately, again, we can extend StoreFront to make it so we can configure custom resolutions for different applications on the same store.  The solution is a Storefront extension I’ve already written.

The steps to set this up:

  1. Download the Storefront_CustomizationLaunch.dll.
  2. Copy the file to C:\inetpub\wwwroot\Citrix\Store\bin
  3. Edit the web.config in the Store directory and enable the extension
  4. We need to enable Header pass-through for DesiredHRES, DesiredVRES, and TWIMode in the “C:\inetpub\wwwroot\Citrix\StoreWeb\web.config” file:
  5. Lastly, add the following to the custom.js file in your StoreWeb/custom folder:
  6. And enjoy the results!  🙂


Citrix Storefront – Adventures in customization – Dynamically configure workspace control based on group membership

2017-05-07

I’ve been exploring the customization capabilities of Citrix Storefront and have some exciting ideas on simplifying our deployment.  What I’d really like is to reduce our store count to as few stores as possible.  In our Web Interface we have multiple stores based on the non-configurable settings.  They are:

  • Workspace Control Enabled with Explicit Logon
  • Workspace Control Disabled with Explicit Logon
  • Workspace Control Enabled with Domain Passthrough authentication
  • Workspace Control Disabled with Domain Passthrough authentication
  • Anonymous site

We can’t mix and match authenticated sites and anonymous sites (right?… ?) but Citrix does offer the ability to configure Authentication methods AND Workspace Control options via their ‘Receiver Extension API’s’.

These are the API’s in question:

There isn’t really a whole lot of documentation on them and how to use them.  Richard Hayton has created the Citrix Customization Cookbook, which details examples of some of the API’s.  He has several blog articles on the Citrix website with varying degrees of applicability.  Unfortunately, he hasn’t blogged on this topic in over a year, and it feels like the situation has changed a bit, with the Citrix Store API’s available as well (note: these are different!).

My target is to make it so these options can be set dynamically based on the user’s group membership.  If you’re a member of the group ‘workspaceControlEnabled’ you get all the settings set to true; if you’re a member of ‘workspaceControlDisabled’ you get all the settings set to false.

Seems like a pretty straightforward goal?

So I thought I’d start with something pretty simple.  I have a store with Workspace Control Enabled, with show ‘Connect and Disconnect’ buttons selected:

If I log into the site:

I see everything (as I should).

So let’s start with a simple customization.  Let’s try using the API to disable these options.  I created a totally blank script.js file and added the following lines:

Now what does our menu look like?

Awesome!  Workspace Control was disabled by script!

So I said I want to disable Workspace Control if you are a member of a specific group.  Richard Hayton actually wrote a pretty good article on creating a service to facilitate grabbing your group membership with Storefront.  Unfortunately, the download link is dead.  So I wrote another PowerShell HTTP listener that takes an input, queries AD for that user and their membership, and returns a positive value if workspace control should be enabled or a negative value if it should be disabled.  To get it to query, though, it needs a value.  The best value for this, I thought, would be the username.
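The listener I ended up with is along these lines (a condensed sketch; the port, group name, and query parameter are from my setup, and error handling is omitted):

    # Tiny HTTP listener: /?user=<sAMAccountName> returns 'enabled' or 'disabled'
    $listener = New-Object System.Net.HttpListener
    $listener.Prefixes.Add('http://localhost:8888/')
    $listener.Start()
    while ($listener.IsListening) {
        $context = $listener.GetContext()
        $user = $context.Request.QueryString['user']
        $adUser = ([ADSISearcher]"(&(objectCategory=user)(sAMAccountName=$user))").FindOne()
        $result = if ($adUser.Properties['memberof'] -match 'CN=workspaceControlEnabled') { 'enabled' } else { 'disabled' }
        $bytes = [Text.Encoding]::UTF8.GetBytes($result)
        $context.Response.OutputStream.Write($bytes, 0, $bytes.Length)
        $context.Response.Close()
    }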

Citrix provides a function to get the username and I can then pass it to my webservice, test for group membership and return whether Workspace Control (WSC) should be enabled or disabled.

I wrote my first bit of code and it crashed immediately.  I simply entered it straight into my script.js.

In order to trace the error, you simply append “#-tr” to the end of your store URL:

and allow pop-ups.  A new tab will open allowing you to follow the ‘flow’ of Storefront as it executes its commands.  Mine crashed at:

“get username data:”

And it makes sense why it crashed there.  I haven’t even logged in yet, so it has no idea who the user is or how to get the username; it looks like Storefront just returns the login page.  We need to call our functions after logging in.

Citrix provides the following event-based functions we can hook into:

Notifications of progress

preInitialize(callback)
postInitialize(callback)
postConfigurationLoaded()
postAppListLoaded()

Note that during these calls, the UI may be hidden in native Receivers, so it is not safe to show UI
For APIs passing a callback, you MUST call the callback function, though you may delay calling it until code of your own has run.

beforeLogon(callback)

Web browsers only. Called prior to displaying any logon dialogs. You may call ‘showMessage’ here, or add your own UI.

beforeDisplayHomeScreen(callback)

All clients, called prior to displaying the main UI. This is the ideal place to add custom startup UI.
Note that for native clients, the user may not have logged in at this stage, as some clients allow offline access to the UI.

afterDisplayHomeScreen()

All clients, called once the UI is loaded and displayed. The ideal place to call APIs to adjust the initial UI, for example to start in a different tab.

So the question becomes: when does each of these get called?  We are only interested in the ones after you log in.  To determine this, I hooked into each one with a simple trace command and then refreshed my browser to the login screen.

This is where I stopped:

And trace tab results:

These 4 stages are called before the user logs on so they are of no use to me:

preInitialize
postInitialize
postConfigurationLoaded
beforeLogon

The order of the other 3 after logon:

beforeDisplayHomeScreen
postAppListLoaded
afterDisplayHomeScreen

Before adding my code to the post-logon event functions, I added it back plain-jane and re-ran with a trace:

What I found is these extensions appear to have a fixed entry point.  The workspace control extensions cannot be called after the “beforeDisplayHomeScreen” stage.  If you do not call the workspace control extensions before the callback in the ‘beforeDisplayHomeScreen’ function, you will be unable to control the setting.  The trace log in my screenshot for these extensions will always occur at this point in time regardless of whether you actually set it in ‘preInitialize, postInitialize, postConfigurationLoaded, or beforeLogon’.  And if you attempt to set it in either of the two later functions, it will not log anything and your code has no effect.  So the only point in time where I can take the username and set these values is in the event function beforeDisplayHomeScreen.

<Digress>

During the course of testing this feature I had thought about adding a button that would allow you to toggle this feature enabled or disabled on your own whim.  But it appears once you call the extension, it’s one and done.  I also discovered that you must set the workspace control feature early in the process; if I set it in postAppListLoaded or afterDisplayHomeScreen, nothing happened.  To be fair, I do not know how to reinitialize the menu; maybe that would allow it to kick in dynamically…?  I guess that’s for further exploration on another day…

</Digress>

Ok, so we’ve found the one and only functional place we can execute our code.  So I added it.

 

The result?  Nothing.  Nothing happened.

Well, that’s not entirely true.

I’ve highlighted in yellow/orange my “getUsername” function.  We can see on line 35 we get into the function, and on line 67 it is successfully finding and returning my name.  But the problem is that it’s getting that information after the point in time that we can set the WSC features (highlighted in blue — lines 60-62).

I found that using Ajax for this command and attempting to use async was causing my failure.  I understand it’s bad practice to issue synchronous requests, especially in JavaScript, as they will lock the UI while executing, but thus far it’s the only way I know to ensure things complete in the proper order.  I am really not a web developer, so I don’t know the proper technique here to send a couple of Ajax requests that only block at the specific point in time that WSC kicks in…  Or find a way to redraw the menu?  But for the purposes of getting this working, this is the solution I’ve chosen to go with.  I’m wide open to better suggestions.  The real big extension that would be an issue is ‘webReconnectAtStartup’.  This feature will reconnect any existing sessions you have, and the way Citrix currently implements it, they want it run as soon as the UI is displayed.  This makes some sense, as that’s the whole point: you don’t want to wait around after logging in some indeterminate number of seconds for your session to reconnect…  But this issue can be alleviated.  Citrix actually offers a way to implement this feature yourself via the Store API, so we could implement our own custom version of this function that would get all your sessions and reconnect them…

Which could just leave building the menu as something that could be moved back to async if I can figure out how to rebuild it or build it dynamically…

Anyway, that may be for another day.  For today, the following works for my purpose.

This is my custom/script.js file that I finished from this blog post:

 

Here is the LDAP_HttpListener.psm1

Lastly, the scheduled task to call the listener:

 

 

 

 


Citrix Storefront – Pass URI parameters to an application

2017-05-01

In my previous post, I was exploring taking URI parameters and passing them to an application.

The main issue we are facing is that Storefront launches the ICA file via an iframe src.  When launching the ICA via this method, the iframe does a simple ‘GET’ without passing any HEADER parameters — which is the only (documented) way to pass data to Storefront.

What can I do? I think what I need to do is create my own *custom* launchica command.  Because this will be against an unauthenticated store, we should be able to remove the authentication portions AND any unique identifiers (e.g., CSRF data).  Really, we just need the two options — the application to launch and the parameter to pass into it.  I am NOT a web developer and I do not know what would be the best solution to this problem, but here is something I came up with.

My first thought is this needs to be a URL that can be queried and that must return a specific content-type.  I know PowerShell has lots of control over specifying things like this, and I have some familiarity with PowerShell, so I’ve chosen it as my tool of choice to solve this problem.

In order to start, I need to create or find something that will get data from a URL into PowerShell.  Fortunately, a brilliant person by the name of Steve Lee solved this first problem for me.

What he created is a PowerShell module that creates an HTTP listener that waits for a request.  We can take this listener and modify it so it listens for our two variables (CTX_Application and NFuse_AppCommandLine) and then returns an ICA file.  Since this is an unauthenticated URL, I had to remove the authentication feature of the script, and I added a function to query the real Storefront services to generate the ICA file.

So what I’m envisioning is replacing the “LaunchIca” command with my custom one.

 

This is my modification of Steve’s script:
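The full script is embedded above as an image, but the heart of the change is the response portion: fetch the ICA from the real store, inject the command line, and return it with the ICA content type.  A sketch (variable names are my own; $icaText holds the ICA file fetched from Storefront, and $context is the HttpListener context):

    # Inject the requested command line, then return the file as an ICA type
    $icaText = $icaText -replace '(?m)^(LongCommandLine=.*)$', ('$1 "' + $NFuseAppCommandLine + '"')
    $bytes = [Text.Encoding]::UTF8.GetBytes($icaText)
    $context.Response.ContentType = 'application/x-ica'
    $context.Response.OutputStream.Write($bytes, 0, $bytes.Length)
    $context.Response.Close()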

And the command to start the HTTP listener:

Eventually, this will need to be converted to a scheduled task or a service.  When running the listener manually, it looks like this:

 

I originally planned to use the ‘WebAPI’ and create a custom StoreFront, but I really, really want to use the new Storefront UI.  In addition, I do NOT want to have to copy a file around to each Storefront server to enable this feature.  So I started to wonder if it would be possible to modify Storefront via the extensible customization API’s it provides.  This involves adding JavaScript to the “C:\inetpub\wwwroot\Citrix\StoreWeb\custom\script.js” file and modifying the “C:\inetpub\wwwroot\Citrix\StoreWeb\custom\style.css” file.  To start, my goal is to mimic our existing functionality and UI to an extent that makes sense.

The Web Interface 5.4 version of this web launcher looked like this:

When you browse to the URL in Web Interface 5.4 the application is automatically launched.  If it doesn’t launch, “click to connect” will launch it for you manually.  This is the functionality and feature set I want.

Storefront, without any modifications, looks like this with an authenticated store:

So, I need to make a few modifications.

  1. I need to hide all applications that are NOT my target application
  2. I need to add the additional messaging “If your application does not appear within a few seconds, click to connect.” with the underlined as a URL to our launcher.
  3. I want to minimize the interface by hiding the toolbar.  Since only one application will be displayed we do not need to see “All | Categories   Search All Apps”
  4. I want to hide the ‘All Apps’ text
  5. I want to hide “Details”, we’re going to keep this UI minimal.

The beauty of Storefront, in its current incarnation, is that most of this is CSS modifications.  I made the following modifications to the CSS to get my UI minimized:

This resulted in a UI that looked like this:

So now I want to remove all apps except my targeted application that should come in a query string.

I was curious whether ‘script.js’ would see the URI parameters passed to Storefront.  I modified my ‘script.js’ with the following:
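The test was just a logging line, something like this (plain browser JavaScript, nothing Citrix-specific):

    // Log whatever query string the page was opened with
    console.log('URI query: ' + window.location.search);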

Going to my URL and checking the ‘Console’ in Chrome revealed:

Yes, indeed, we are getting the URI parameters!

Great!  So can we filter our application list to only display the app in the URI?

Citrix offers a bunch of ‘extensions’.  Can one of them work for our purpose?  This one sounds interesting:

Can we do a simple check that if the application does not equal “CTX_Application” to exclude it?

The function looks like this:
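The screenshot showed my version; roughly, it’s a sketch like this (assuming the extension point is the receiver extension API’s app-exclusion callback, and getQueryStringValue is a helper of my own):

    // Hide every app except the one named in the CTX_Application URI parameter
    CTXS.Extensions.excludeApp = function (app) {
        var target = getQueryStringValue('CTX_Application'); // my own helper
        return app.name !== target; // returning true excludes the app
    };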

Did it work?

Yes!  Perfectly!  Can we append a message to the application?  Looking at Citrix’s extensions this one looks promising:

Ok.  That warning sucks.  My whole post is based on StoreFront 3.9 so I cannot guarantee these modifications will work in future versions or previous versions.  Your Mileage May Vary.

So what elements could we manipulate to add our text?

Could we add another “p class=storeapp-name” (this is just text) for our messaging?  The onAppHTMLGeneration function says it is returned when the HTML is generated for an app, so what does this look like?

I added the following to script.js:

And this was the result in the Chrome Console:

So this is returning a DOM HTMLElement.  HTMLElements have numerous methods to add/create/append/modify data.  Perfect.  Doing some research (I’m not a web developer) I found that you can modify the content of an element with a command in this style:
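In its simplest form, appending to the element’s innerHTML does the trick (a sketch reusing the same class Storefront already uses for app names):

    // Append our message below the app name, reusing Storefront's own class
    element.innerHTML += '<p class="storeapp-name">If your application does not appear within a few seconds, click to connect.</p>';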

This results in the following:

We have text!

Great.

My preference is to have the text match the application name’s format.  I also wanted to test if I could add a link to the text.  So I modified my line:

The result?

Oh man.  This is looking good.  My ‘click to connect’ link isn’t working at this point (it just goes to Google), but at least I know I can add a URL and have it work!  Now I just need to generate a URL and set it to replace ‘click to connect’.

When I made my HTTPListener I purposefully made it with the following:

The reason I had it share the URL of the Citrix Store is that the launchurl generated by Storefront is:

The full path for the URL is actually:

So the request actually starts at the store name.  And if I want this to work with a URL re-write service like NetScaler, I suspect I need to keep to relative paths.  So to reach my custom ica_launcher I can just put this in my script.js file:

and then I can replace my ‘click to connect’ link with:

The result?

The url on the “click to connect” goes to my launcher!  And it works!  Excellent!

Now I have one last thing I need to get working.  If I click the ‘Notepad 2016 – PLB’ icon I get the regular Storefront ica file, so I don’t get my LongCommandLine added into it.  Can I change where it’s trying to launch from?

Citrix appears to offer one extension that may allow this:

Huh.  Well.  That’s not much documentation at all.

Fortunately, a Citrix blog post came to rescue with some more information:

Hook APIs That Allow Delays / Cancellations

doLaunch
doSubscribe
doRemove
doInstall

On each of these the customization might show a dialog, perform some checks (etc) but ultimately should call ‘action’ if (and only if) they want the operation to proceed.

This extension is taking an object (app).  What properties does this object have?  I did a simple console.log and examined the object:

Well, look at that.  There is a property called ‘launchurl’.  Can we modify this property and have it point to our custom launcher?

I modified my function as such:
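Roughly like this (a sketch; ‘ica_launcher’ is the relative path to my custom listener, getQueryStringValue is my own helper, and calling action() lets the launch proceed, per the excerpt above):

    // Point the launch at the custom ica_launcher, then let the launch proceed
    CTXS.Extensions.doLaunch = function (app, action) {
        app.launchurl = 'ica_launcher?CTX_Application=' + app.name +
            '&NFuse_AppCommandLine=' + getQueryStringValue('NFuse_AppCommandLine');
        action();
    };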

The result?

A modified launchurl!!!!

Excellent!

And launching it does return the ica file from my custom ica_launcher!

Lastly, I want to autolaunch my program.  It turns out this is pretty simple.  Just add the following to the script.js file:

Beautiful.  My full script.js file looks like so:

And that’s it.  We are able to accomplish this with a PowerShell script and two customization files.  I think this has a better chance of ‘working’ across Storefront versions than the SDK attempt I did earlier, or than creating my own custom Storefront front end.


Citrix StoreFront – Experiences with Storefront Customization SDK and Web API

2017-04-24

Our organization has been exploring upgrading our Citrix environment from 6.5 to 7.X.  The biggest road blocks we’ve been experiencing?  Various nuanced features in Citrix XenApp 6.5 don’t work or are not supported in 7.X.

This brings me to this post and an example of the difficulties we are facing, my exploration of solutions to this problem, and our potential solution.

In Citrix Web Interface 5.1+ you can create a URI with a set of application launch parameters, and those parameters will be passed to the application.  This is detailed in this Citrix article (CTX123998).  For us, these launches occur with a Site (WI 5.4 terminology) or Store (Storefront terminology) that allows anonymous (or unauthenticated) users.

By following that guide and modifying your Web Interface you can change one of your sites so that it accepts launch parameters.  You simply enter a specific URI in your browser, and the application would launch with said parameters.  An example:

http://bottheory.local/Citrix/XenApp/site/launcher.aspx?CTX_Application=Citrix.MPS.App.XenApp.Notepad&LaunchId=1263894298505&NFuse_AppCommandLine=C:\Windows\WindowsUpdate.log

This allows you to send links around to other people and they can click to automatically launch an application with the parameters specified.  If you are really unlucky, your org might document this as an official way to launch a hosted application, and it actually gets coded into certain applications.  So now you may have tens of local apps utilizing this as an acceptable method to launch a hosted application.  For an organization, this may have been an acceptable way to launch certain hosted applications since around 2008, so this “feature”, unfortunately, has built up quite a bit of inertia.

We can’t let this feature break when we move to StoreFront.  We track the number of launches from these hosted applications and it’s in the hundreds/thousands per day.  This is a critical and well used feature.

So how does this work in Web Interface and what actually happens?

URI substitution and launch process

The modifications you apply add a new ‘query’ parameter to the URI that Web Interface picks up.  This parameter is “NFuse_AppCommandLine”, and its value (“C:\Windows\WindowsUpdate.log” in my example) is passed into the ICA file.

When the ICA file is launched, the parameter is passed to a special token “%**” that is set on the command line of the published application.  This token “%**” gets replaced by the parameter specified in the ICA file, which generates the final launch string.  That string is executed and the program launches.
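Concretely, with the example above (quoting illustrative):

Published command line:  notepad.exe "%**"
Parameter from the ICA:  C:\Windows\WindowsUpdate.log
Final launch string:     notepad.exe "C:\Windows\WindowsUpdate.log"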

Can we do this with Storefront?  Well, first things first, is this even possible with XenApp/XenDesktop 7.X?  In order to test this I created an application and specified the special token.

I then generated an ICA file, manually modified the LongCommandLine to add my parameter, and launched it:
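The hand-edit amounts to changing one line in the generated ICA file, something like this (a sketch; the original value depends on how the app is published):

Before:  LongCommandLine=notepad.exe
After:   LongCommandLine=notepad.exe C:\Windows\WindowsUpdate.log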


Did it work?

Yes, it worked.

Success!  So XenApp/XenDesktop 7.X will substitute the token with the parameter in the ICA file’s LongCommandLine.  Excellent.  Now we just need Storefront to take the parameter from the URI and add it to the LongCommandLine.  Storefront does not do this out of the box.

However, Citrix offers a couple of possible solutions to this problem (that I explored).

StoreFront WebAPI
StoreFront Store Customization SDK

What are these and how do they work?

StoreFront Web API

This API is billed as:

“Write a new Web UI or integrate StoreFront into your own Web portal”

We are going to need to either modify Storefront or write our own front end in order to take in parameters on the URI.  I decided to write our own.  Using the ApiExample.html in the WebAPI, I added the following:
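A sketch of that addition (the helper and variable names are mine; plain string parsing so it works in IE as well):

// Walk the query string and return the decoded value for 'name', or null.
function getQueryValue(name) {
    var pairs = window.location.search.substring(1).split('&');
    for (var i = 0; i < pairs.length; i++) {
        var pair = pairs[i].split('=');
        if (decodeURIComponent(pair[0]) === name) {
            return decodeURIComponent(pair[1] || '');
        }
    }
    return null;
}

var ctxApplication = getQueryValue('CTX_Application');
var nfuseAppCommandLine = getQueryValue('NFuse_AppCommandLine');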

This looks into the URI and allows you to get the value of either query string.  In my example I am grabbing the values for “CTX_Application” and “NFuse_AppCommandLine”.

I then removed a bunch of the authentication portions from the ApiExample.html (I’ll be using an unauthenticated store).  In order to automatically select the specified application I added some JavaScript to check the resource list and get the launch URL:
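In sketch form (the resource property names are assumptions, based on the ‘launchurl’ property seen earlier):

// Find the requested application in the enumerated resource list and
// return its launch URL, or null if it isn't published to this store.
function findLaunchUrl(resources, appId) {
    for (var i = 0; i < resources.length; i++) {
        if (resources[i].id === appId) {
            return resources[i].launchurl;
        }
    }
    return null;
}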

There is a function within the ApiExample.html file that pulls the ICA file.  Could we take this and modify it with the LongCommandLine addition before returning it?  We have the NFuse_AppCommandLine now.

It turns out, you can.  You can capture the ICA file as a variable and then, using JavaScript’s ‘.replace’ method, modify the line so that it contains your string.  But how do you pass the ICA file to the system to launch?
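The substitution itself is a one-liner (with icaText holding the captured ICA file as a string):

// Append the passed-in parameter to the existing LongCommandLine value.
var modifiedIca = icaText.replace(
    /LongCommandLine=(.*)/,
    'LongCommandLine=$1 ' + nfuseAppCommandLine
);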

This is how Citrix launches the ICA file in the ApiExample.html:

It generates a URL then sets it as the src in the iframe.  The actual result looks like this:

The ICA file is returned in this line:

When we capture the ICA file as a variable, the only way that I’ve found to reference it is via a blob.  What does the src path look like when we do that?
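In sketch form (the iframe id is illustrative), the blob approach looks like this, and the resulting src takes the shape blob:http://&lt;server&gt;/&lt;guid&gt;:

// Wrap the ICA text in a blob and point the launch iframe at an object
// URL referencing it.
var blob = new Blob([modifiedIca], { type: 'application/x-ica' });
document.getElementById('launchIframe').src = URL.createObjectURL(blob);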

Ok, this looks great!  I can create an ICA file, then modify it, all through WebAPI, and return the ICA file to the browser for execution.  Does it work?

Yes and no. 🙁

It works in Chrome and Firefox, but IE doesn’t auto-launch; it prompts to ‘save’ a file.  Why?  IE doesn’t support opening ‘non-standard’ blobs.  MS offers a method called “msSaveOrOpenBlob” which you can use instead, and this method prompts for opening the blob.  This will work for opening the ICA file, but now the end user requires an extra step.  So this won’t work.  It needs to be automatic, like it’s supposed to be, for a good experience.
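For reference, the IE path looks like this (the filename is arbitrary):

// IE 10+ only: prompts the user to open or save the blob.
if (window.navigator.msSaveOrOpenBlob) {
    window.navigator.msSaveOrOpenBlob(blob, 'launch.ica');
}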

So WebAPI appears to offer part of the solution.  We can capture the NFuse_AppCommandLine, but we still need to get it into the LongCommandLine.

At this point I decided to look at the StoreFront Store Customization SDK.  It states it has this ability:

Post-Launch ICA file—use this to modify the generated ICA file. For example, use this to change ICA virtual channel parameters and prevent users from accessing their clipboard.

That sounds perfect!

StoreFront Store Customization SDK

The StoreFront store customization SDK bills one of its features as:

The Store Customization SDK allows you to apply custom logic to the process of displaying resources to users and to adjust launch parameters. For example, you can use the SDK to control which apps and desktops are displayed to users, to change ICA virtual channel parameters, or to modify access conditions through XenApp and XenDesktop policy selection.

I underlined the part that is important to me.  That’s what I want, exactly!  I want to adjust launch parameters.

To start, one of the tasks I need to accomplish is taking my NFuse_AppCommandLine and getting it passed to the SDK.  The only way I’ve found to make this happen is to enable ‘forwardedHeaders‘ in your web.config file.
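The relevant fragment of the store’s web.config looks roughly like this (the element placement and the header name here are assumptions; check the SDK documentation for the exact syntax):

<communication>
  <forwardedHeaders>
    <!-- hypothetical header name; use whatever header your client sends -->
    <header name="X-NFuse-AppCommandLine" />
  </forwardedHeaders>
</communication>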

With this in place, you need to set a header on your POST/GET request to Storefront to get this parameter passed to the SDK.  Here is how I set up my SDK:

1. Install the “Microsoft Visual Studio Community Edition” on a test StoreFront server.
2. Download the StoreCustomizationSDK and open the ‘Customization_Launch’ project file.
3. Right-click ‘Customization_Launch’ and select ‘Properties’.
4. Modify the Output path under Build to the “%site%\bin” location.  This is NOT %site%Web, just %site%.
5. Open LaunchResultModifier.cs and set a breakpoint somewhere in the file.
6. Select Build > Build ‘Customization_Launch’ and ensure the build was successful.  If you get an error message, just ignore it; we’re compiling a library and not an executable, and that’s why you get this message.
7. Attach the debugger to ‘Citrix Delivery Services Resources’: select ‘Attach to Process…’, then pick the w3wp.exe process whose user name is ‘Citrix Delivery Services Resources’.  You may need to select ‘Show processes from all users’.  Click ‘Attach’.
8. Browse to your website and click to launch an icon.  Your debugger should pause at the breakpoint, and you can inspect the values.

At this point I wasn’t interested in trying to re-write the ApiExample.html to get this testing underway, so I instead used PowerShell to submit my POSTs and GETs.  Remember, I’m using an unauthenticated store, so I could cut down on the requests sent to StoreFront to get my apps.  I found a script from Ryan Butler and made some modifications to it, removing the parameters since I’m doing my testing via hardcoding 🙂

Executing the PowerShell script and examining a custom variable at the breakpoint shows us that our header was successfully passed in:

Awesome!  So can we do something with this?  Citrix has provided a function that does a find/replace of the value you want to modify.  To enable it, we need to add it to the ICAFile.cs:

Browse to ICAFile.cs


At this point we can use the helper methods within the IcaFile.cs.  To do so, just add “using Examples.Helpers;” to the top of the file:

I cleaned the HDXRouting code out of the file, grabbed the nFuseAppCMDLine header, and then returned the result:

The result?

Success!  We’ve used the WebAPI and the StoreFront Customization SDK to supply a header, modify the ICA file, and return it with the value needed!  The returned ICA file works perfectly!  Ok, so I’m thinking this looks pretty good, right?!  We just need to get the iframe to supply a custom header when the src gets updated.

Except I don’t think it’s possible to attach a custom header when you update a src in an iframe.  So then I started thinking maybe I can use a cookie!  If I could pass a cookie into the Storefront Customization SDK I could use that instead of the header.  Unfortunately, I do not see any way to query a custom cookie or pass a custom cookie into the SDK.  Citrix appears to have (rightfully!) locked down Storefront so that this doesn’t seem possible.  No custom cookies are passed into the HttpContext (that I can see).

So all this work appears to be for naught (outside of being an exercise in how one could set up and customize Storefront).


But what about the Storefront feature “add shortcuts to websites“?

Although doing that allows you to launch an app via a URL, it doesn’t offer any method to pass values into the URL.  So it’s a non-starter as well.

Next will be a workaround/solution that appears to work…


Citrix XenApp Enumeration Performance – IMA vs FMA – oddities

2017-04-13
in Blog

During the course of load testing FMA to see how it compares with IMA in terms of performance, I encountered some oddities with FMA.  It definitely is more efficient at enumerating XenApp applications than IMA.  But…  the differences in my testing may have been overstated.

During the load testing I captured some performance counters.  One of them, “Citrix XML Service – enumerate resources – Concurrent Transactions”, seemed to line up almost perfectly with the load WCAT was applying to the broker.  BUT…  it capped out, at around 200-220 concurrent transactions.


The slope of the increase in concurrent transactions should have continued, but it stops.  I had set a maximum of 1000 concurrent connections via WCAT, but the counter stops and then does this weird ‘stepping’.  The point where this limit is reached also appears to be where the “performance” of FMA exceeds IMA significantly.

(ORANGE is FMA, BLUE is IMA).

What I’m considering is that FMA takes all requests up to that limit and then ‘queues’ them.  I have no proof of that outside of the perf counter hitting that ceiling.  And when the concurrent connections hit ~800 and the response time soared, was the broker actually working?

No.  It was not working.  The responses I got when the connections hit 800+ looked like this:

ErrorID ‘unspecified’.  Evidently there is a timer somewhere: if the broker cannot respond to requests within a reasonable amount of time (20 seconds?), it returns this error code.

So is the amount of processing the broker can handle a hardcoded limit?  I scanned the startup of the service with Procmon to see what registry keys/values the BrokerService was looking for.

I found a document from Citrix that details *some* of these registry keys.

There is a registry value that determines the maximum number of concurrent requests the FMA broker will process:

I added that key and restarted the service.  What was the impact?

16vCPU

Very positive.  The 800 concurrent connection limit was easily bypassed.  Why is this limited to 500 by default?  Changing this value to 1000 gave a massive improvement in how many concurrent connections the broker could handle: it pushed to 1350 concurrent connections before it broke.  If I had to push for a single tweak to FMA, this one might be it.  Mind you, the performance counter was still capping itself around 220 concurrent transactions, so I’m not sure if that’s a limit of the counter or something else, but it is VERY accurate until that point.  If I start my WCAT test and go to 50, the perf counter goes to 50.  If I choose 80, it goes to 80.  If I choose 173, it goes to exactly 173.  So it is odd.

During the course of my investigation I found a registry key showing Citrix has decided that any request to the broker taking over 20 seconds will fail.  This key is:

I suspect this is my “unspecified” error.  20 seconds is a really long time for a broker to respond to a request, though; if you were a user this would suck.  However, is it better to increase this value?  It’s on my todo list to increase it in my testing and see what happens, but checks like the Storefront or Netscaler XML requests that verify the brokers are responding are also capped at 20 seconds by this value.  So it’s probably important to have them match, to avoid any further timeouts you may encounter.  For instance, it would not make sense to set a 60 second timeout on a Netscaler XML broker monitor if the broker is going to time out at 20 seconds anyway.

Next will be testing this with Local Host Cache to see if this causes any impact.


Citrix XenApp Enumeration Performance – IMA vs FMA load testing

2017-04-12
in Blog

In the previous post we found IMA outperformed FMA in terms of an individual request for resources.  However, the 30ms difference would be imperceptible to an end user.  This post will focus on maximum load for an individual broker.  In the past we discovered that the IMA service is heavily dependent on the number of CPUs: the more CPUs allocated to the machine hosting the IMA service, the more requests it can handle (supposedly to a max of 16, the maximum number of threads for IMA).

For my testing I have configured a XenApp 6.5 machine with 2 sockets of 1 vCPU each, for a total of 2 cores.  I have an identical machine configured for XenApp 7.13.

Doing my query directly against the 7.13 broker, I’ve discovered my requests appear to be tied to the performance counter “Citrix XML Service – Transactions/sec – enumerate resources”.  I can visibly see how many requests are hitting the broker through my testing, although the numbers don’t necessarily line up with what I’m sending.

For XenApp 6.5 the performance counter that appears to tie directly to my requests is the “Citrix MetaFrame Presentation Server – Filtered Application Enumerations/sec”.

Using WCAT to apply a load to XenApp 6.5 and 7.13 and measuring the per-second results, I should be able to see which is more efficient under a larger load.

My WCAT options for the FMA server will be:

and for IMA:


My “scenario” file (XML-Direct.ubr) looks like so:

The new FMA service has an additional counter called “Citrix XML Service – enumerate resources – Avg. Transaction Time” which actually reports back how long it took to execute the enumeration.

Just like my previous testing, I’m going to test with 2, 4, 8 and now 16 CPUs.  Both machines are VMs on the same cluster with the same host specs (Intel Xeon E5-2680 @ 2.70GHz).  One difference between the two VMs is that the XenDesktop 7.13 machine runs Server 2016 Standard.

2vCPU

IMA is in BLUE and FMA is in ORANGE.

Interesting results.  It appears the XenDesktop FMA broker can handle higher loads with much greater efficiency than IMA.  And it does not take much for FMA to flex some muscle and show that it can handle the same load faster than IMA.  FMA kept under our 5000ms target up to 180 concurrent connections, whereas IMA breaks that around 140 connections.  This initial data shows that FMA could handle approximately 23% more load.  Fairly impressive.

4vCPU

Again, FMA continues its dominance in enumeration performance.  The gap between them grows slightly, to 26% at 400 concurrent connections.  At our target of under 5000ms, IMA breaks at 250 connections; FMA held until around 380 connections.

8vCPU

Again, FMA handles the additional load with the additional CPUs without breaking a sweat.  It seemed unstoppable, keeping under 5000ms until about 655 connections.  IMA, on the other hand, exceeded a 5000ms response time at 400 connections.  But then something strange happened that I assumed to be my fault: the FMA broker service suddenly spiked at 800 concurrent connections (this graph only shows up to 780) and its response time zoomed to 30-40 seconds.  Assuming the fault was mine, I continued on, trying 16 CPUs.

16vCPU

But this graph shows it pretty readily.  As soon as you cross 800 concurrent connections, FMA pukes.  I didn’t scale the graph to show that spike, but it goes up to 40 seconds.  So there appears to be a pretty hard limit of <800 concurrent connections (600 would probably be a pretty safe buffer…).  If you exceed that limit, your performance is going to tank, HARD.  IMA, however, pushes on.  With 16vCPU, IMA didn’t break the 5000ms target until 570 concurrent connections.  FMA appeared to be handling it just fine until it exploded.

For this testing the FMA broker is set in ‘connection leasing’ mode.  Perhaps this is related to that?  My next test will be to set the broker to Local Host Cache mode, retest, and then simulate a DB failure and test how quickly the LHC can respond.

Summary:

The FMA broker, when acting in a purely XenApp fashion, works pretty well.  It handles loads faster than IMA, but apparently this is only true up to an extreme rate.  You should not have more than 800 concurrent connections per broker.  Period.  You should probably keep a target maximum of 600 to be safe, and assign at least 4, but preferably 8, CPUs to your broker servers if you can.  This appears to be the sweet spot.

The highest rate our organization sees is 600 concurrent connections, but we spread this load across 9 IMA brokers and divide it geographically.  The highest concurrent load we’ve measured on any single broker is ~150 connections during a peak period.  We target a response time of less than 5 seconds for any broker enumeration, and it appears we could handle this quite easily with FMA.  However, this does not take into account XenDesktop traffic, which is ‘heavier’ than XenApp traffic for FMA.  When I get to a point where I can test XenDesktop load, I will do so.  Until then, I am impressed with FMA’s XenApp enumeration performance (until breakage, anyways).

Next up…  Oddities encountered in testing
