XenApp

Citrix StoreFront – Experiences with Storefront Customization SDK and Web API

2017-04-24
/ /
in Blog
/

Our organization has been exploring upgrading our Citrix environment from 6.5 to 7.X.  The biggest road blocks we’ve been experiencing?  Various nuanced features in Citrix XenApp 6.5 don’t work or are not supported in 7.X.

This brings me to this post and an example of the difficulties we are facing, my exploration of solutions to this problem, and our potential solution.

In Citrix Web Interface 5.1+ you can create a URI with a set of application launch parameters and those parameters would be passed to the application.  This is detailed in this Citrix article here (CTX123998).  For us, these launches occur with a Site (WI5.4 terminology) or Store (Storefront terminology) that allows anonymous (or unauthenticated) users.

By following that guide and modifying your Web Interface you can change one of your sites so that it accepts launch parameters.  You simply enter a specific URI in your browser, and the application would launch with said parameters.  An example:

http://bottheory.local/Citrix/XenApp/site/launcher.aspx?CTX_Application=Citrix.MPS.App.XenApp.Notepad&LaunchId=1263894298505&NFuse_AppCommandLine=C:\Windows\WindowsUpdate.log

This allows you to send links around to other people and they can click to automatically launch an application with the parameters specified.  If you are really unlucky, your org might document this as an official way to launch a hosted application and this actually gets coded into certain applications.  So now you may have tens of local apps utilizing this as an acceptable method to launch a hosted application.  For an organization, this may be an acceptable way to launch certain hosted applications since around ~2008 so this “feature”, unfortunately, has built up quite a bit of inertia.

We can’t let this feature break when we move to StoreFront.  We track the number of launches from these hosted applications and it’s in the hundreds/thousands per day.  This is a critical and well used feature.

So how does this work in Web Interface and what actually happens?

URI substitution and launch process

The modifications that you apply add a new ‘query’ to the URI that is picked up.  This ‘query’ is “NFuse_AppCommandLine” and the value it is equal to (“C:\Windows\WindowsUpdate.log” in my example) is passed into the ICA file.

The ICA file, when launched, will pass the parameter to a special string “%**” the is set on the command line of the published application.  This token “%**” gets replaced by the parameter specified in the ICA which then generates the launch string.  This string is executed and the result is the program launches.

Can we do this with Storefront?  Well, first things first, is this even possible with XenApp/XenDesktop 7.X?  In order to test this I created an application and specified the special token.

I then generated an ICA file and manually modified the LongCommandLine to add my parameter.  I then launched it:

 

Did it work?

Yes, it worked.

Success!  So XenApp/XenDesktop 7.X will substitute the tokens with the LongCommandLine in the ICA file.  Excellent.  So now we just need Storefront to take the URI and add it to the LongCommandLine.  Storefront does not do this out of the box.

However, Citrix offers a couple of possible solutions to this problem (that I explored).

StoreFront WebAPI
StoreFront Store Customization SDK

What are these and how do they work?

StoreFront Web API

This API is billed as:

“Write a new Web UI or integrate StoreFront into your own Web portal”

We are going to need to either modify Storefront or write our own in order to take and parameters in the URI.  I decided to write our own.  Using the ApiExample.html in the WebAPI I added the following:

This looks into the URI and allows you to get the value of either query string.  In my example I am grabbing the values for “CTX_Application” and “NFuse_AppCommandLine”.

I then removed a bunch of the authentication portions from the ApiExample.html (I’ll be using an unauthenticated store).  In order to automatically select the specified application I added some javascript to check the resource list and get the launch URL:

There is a function within the ApiExample.html file that pulls the ICA file.  Could we take this and modify it before returning it with the LongCommandLine addition?  We have the NFuse_AppCommandLine now.

It turns out, you can. You can capture the ICA file as a variable and then using javascripts ‘.replace’ method you can modify the line so that it contains your string.  But how do you pass the ICA file to the system to launch?

This is how Citrix launches the ICA file in the ApiExample.html:

It generates a URL then sets it as the src in the iframe.  The actual result looks like this:

The ICA file is returned in this line:

When we capture the ICA file as a variable the only way that I’ve found you can reference it is via a blob.  What does the src path look like when do that?

Ok, this looks great!  I can create an ICA file than modify it all through WebAPI and return the ICA file to the browser for execution.  Does it work?

Yes and no. 🙁

It works in Chrome and Firefox, but IE doesn’t auto-launch.  It prompts to ‘save’ a file.  Why?  IE doesn’t support opening ‘non-standard’ blobs.  MS offers a method called “msSaveOrOpenBlob” which you can use instead, and this method then prompts for opening the blob.  This will work for opening the ICA file but now the end user requires an extra step.  So this won’t work.  It needs to be automatic like its supposed to be for a good experience.

So WebAPI appears to offer part of the solution.  We can capture the nFuse_AppCommandLine but we need to get it to LongCommandLine.

At this point I decided to look at the StoreFront Store Customization SDK.  It states it has this ability:

Post-Launch ICA file—use this to modify the generated ICA file. For example, use this to change ICA virtual channel parameters and prevent users from accessing their clipboard.

That sounds perfect!

StoreFront Store Customization SDK

The StoreFront store customization SDK bills one of its features as:

The Store Customization SDK allows you to apply custom logic to the process of displaying resources to users and to adjust launch parameters. For example, you can use the SDK to control which apps and desktops are displayed to users, to change ICA virtual channel parameters, or to modify access conditions through XenApp and XenDesktop policy selection.

I underlined the part that is important to me.  That’s what I want, exactly!  I want to adjust launch parameters.

To start, one of the tasks that I need is to take my nFuse_AppCommandLine and get it passed to the SDK.  The only way I’ve found to make this happen is to enable ‘forwardedHeaders‘ in your web.config file.

With this, you need to set your POST/GET Header on your request to Storefront to get this parameter passed to the SDK.  Here is how I setup my SDK:

Install the “Microsoft Visual Studio Community Edition” on a test StoreFront server.

 

Download the StoreCustomizationSDK and open the ‘Customization_Launch’ project file:

 

Right-click ‘Customization_Launch’ and select ‘Properties’.

 

Modify the Outpath in Build to the “%site%\bin” location.  This is NOT the %site%Web, but just the %site%.

Open the LaunchResultModifer.cs

 

Set a breakpoint somewhere in the file.

Build > Build ‘Customization_Launch’

Ensure the build was successful:

If you get an error message:

Just ignore it.  We’re compiling a library and not an executable and that’s why you get this message.

Now we connect to the debugger to the ‘Citrix Delivery Services Resources’.  Select ‘Attach to Process…’

Select the w3wp.exe process whose user name is ‘Citrix Delivery Services Resources’.  You may need to select ‘Show processes from all users’ and click ‘ Attach.

Click ‘Attach’.

Browse to your website and click to launch an icon:

Your debugger should pause at the breakpoint:

And you can inspect the values.

At this point I wasn’t interested in trying to re-write the ApiExample.html to get this testing underway, I instead used PowerShell to submit my POST’s and GET’s.  Remember, I’m using an unauthenticated store so I could cut down on the requests sent to StoreFront to get my apps.  I found a script from Ryan Butler and made some modifications to it.  I modified it to remove the parameters since I’m doing testing via hardcoding 🙂

Executing the powershell script and examining a custom variable with breakpoint shows us that our header was successfully passed into the file:

Awesome!  So can we do something with this?  Citrix has provided a function that does a find/replace of the value you want to modify.  To enable it, we need to add it the ICAFile.cs:

Browse to ICAFile.cs

 

At this point we can use the helper methods within the IcaFile.cs.  To do so, just add “using Examples.Helpers;” to the top of the file:

I cleaned out the HDXRouting out of the file, grabbed the nFuseAppCMDLine header, and then returned the result:

The result?

Success!  We’ve used the WebAPI and the StoreFront Customization SDK to supply a header, modify the ICA file, and return it with the value needed!  The ICA file returned works perfectly!  Ok, so I’m thinking this looks pretty good right?!  We just need to get the iframe to supply a custom header when the src gets updated.

Except I don’t think it’s possible to tag a custom header when you update a src in an iframe.   So then I started thinking maybe I can use a cookie!  If I can pass a cookie into the Storefront Customization SDK I could use that instead of the header.  Unfortunately, I do not see anyway to query a custom cookie or pass a custom cookie into the SDK.  Citrix appears to have (rightfully!) locked down Storefront that this doesn’t seem possible.  No custom cookies are passed into the httpcontext (that I can see).

So all this work appears to be for naught (outside of an exercise on how one could setup and customize Storefront).

 

But what about the Storefront feature “add shortcuts to websites“?

Although doing that will allow to launch an app via a URL, it doesn’t offer any method to pass values into the URL.  So it’s a non-starter as well.

Next will be a workaround/solution that appears to work…

Read More

Citrix XenApp Enumeration Performance – IMA vs FMA – oddities

2017-04-13
/ /
in Blog
/

During the course of load testing FMA to see how it compares with IMA in terms of performance I encountered some oddities with FMA.  It definitely is more efficient at enumerating XenApp applications compared to IMA.  But…  The differences in my testing may have been overstated.

During the load testing I captured some performance counters.  One of them, “Citrix XML Service – enumerate resources – Concurrent Transactions” seemed to line up almost perfectly with the load wcat was applying to the broker.  BUT…  It capped out.  It seemed to cap at 200-220 concurrent connections.

 

The slope of the increase in the concurrent transactions should have continued but it stops.   I had set a maximum of 1000 concurrent connections via WCAT but it stops then does this weird ‘stepping’.  When this limit appears to be reached, that appears to be when the “performance” of FMA succeeds IMA significantly.

(ORANGE is FMA, BLUE is IMA).

What I’m considering is that FMA is taking all requests up to that limit and then ‘queuing’ them.  I have no proof of that outside of the Perf Counter just hitting that ceiling.  And when the concurrent connections hit ~800 and the response time soared, was the broker actually working?

No.  It was not working.  The responses I got when the connections hit 800+ looked like this:

ErrorID ‘unspecified’.  Undoubtedly, it appears there is a timer somewhere that if the broker cannot respond to requests within a reasonable amount of time (20 seconds?) that it returns this error code.

So is the amount of processing the broker can handle a hardcoded limit?  I scanned the startup of the service with Procmon to see what registry keys/values the BrokerService was looking for.

I found a document from Citrix that details *some* of these registry keys.

There is a registry value that determines the maximum number of concurrent requests the FMA broker will process:

I added that key and restarted the service.  What was the impact?

16vCPU

Very positive.  The 800 concurrent limit was easily bypassed. Why is this limited to 500 by default?  Changing this value to 1000 had a massive improvement in terms of how many concurrent connections it could handle.  It pushed itself to 1350 concurrent connections before it broke.  If I had to push for a single tweak to FMA, this one might be it.  Mind you, the Performance Counter was still capping itself around 220 concurrent connections, so I’m not sure if that’s a limit of the counter or something else but it is VERY accurate until that point.  If I start my WCAT test and go to 50 the perf counter goes to 50.  If I choose 80, it goes to 80.  If I choose 173 it goes to exactly 173.  So it is odd.

During the course of my investigation I found a registry key that shows Citrix has decided any request over 20 seconds to the broker will fail.  This key is:

I suspect this is my “unspecified” error.  20 seconds is a really long time for a broker to respond to a request though, if you were a user this would suck.  However, is it better to increase this value?  I have it on my todo list to try and increase it for my testing to see what happens, but values like the Storefront or Netscaler XML requests to see if the brokers are responding are handicapped to 20 seconds with this value.  So it’s probably important to have them match to reduce any further timeout you may encounter.  For instance, it would not make sense to set a 60 second timeout in a Netscaler XML broker test if the broker is going to timeout at 20 seconds anyways.

Next will be testing this with Local Host Cache to see if this causes any impact.

Read More

Citrix XenApp Enumeration Performance – IMA vs FMA load testing

2017-04-12
/ /
in Blog
/

In the previous post we found IMA outperformed FMA in terms of an individual request for resources.  However, the 30ms difference would be imperceptible for a end user.  This post will focus on maximum load for an individual broker.  In the past we discovered that the IMA service is heavily dependent on the number of CPU’s.  The more CPU’s allocated to the machine hosting the IMA service, the more requests could be handled (supposedly to a max of 16 — the maxmium number for threads for IMA).

What I have configured for my testing is a XenApp 6.5 machine with 2 sockets and 1 vCPU’s for a total of 2 cores.  I have an identical machine configured for XenApp 7.13.

Doing my query directly to the 7.13 broker, I’ve discovered my request appears to be tied to the performance counter “Citrix XML Service – Transactions/sec – enumerate resources”.  I can visibly see how many requests are hitting the broker through my testing, although the numbers vs what I’m sending don’t necessarily line up.

For XenApp 6.5 the performance counter that appears to tie directly to my requests is the “Citrix MetaFrame Presentation Server – Filtered Application Enumerations/sec”.

Using WCAT to apply a load to XenApp 6.5 and 7.13 and measuring the per second results I should be able to see which is more efficient over a larger load.

My WCAT options for the FMA server will be:

and for IMA:

 

My “scenario” file (XML-Direct.ubr) looks like so:

The new FMA service has an additional counter called “Citrix XML Service – enumerate resources – Avg. Transaction Time” which actually reports back how long it took to execute the enumeration.

Just like my previous testing I’m going to test with 2, 4, 8 and now with 16 CPU.  Both machines are VM’s on the same cluster with the same specs for their host (Intel Xeon E5-2680 @ 2.70GHz).  A difference between the two VM’s is the XenDesktop 7.13 is a Server 2016 Standard platform.

2vCPU

IMA is in BLUE and FMA is in ORANGE.

Interesting results.  It appears XenDesktop FMA broker can handle higher loads with much greater efficiency compared to IMA.  And it does not appear to take much for FMA to flex some muscle and show that it can handle the same load faster than IMA.  FMA kept under our 5000ms target up to 180 concurrent connections where as IMA breaks that around 140 connections.  This initial data shows that FMA could handle approximately 23% more load.  Fairly impressive.

4vCPU

Again, FMA continues its dominance in enumeration performance.  The gap between them grows slightly to 26% at 400 concurrent connections.  At our target of under 5000ms, IMA breaks at 250 connections, FMA actually held until around 380 connections.

8vCPU

Again, FMA handles the additional load with the additional CPU’s without breaking a sweat.  It seemed to be unstoppable, keeping under 5000ms until about 655 connections.  IMA, on the other hand, exceeded 5000ms response time at 400 connections.  But I had something strange happen that I assumed to be my fault.  The FMA broker service suddenly spiked at 800 concurrent connections (this graph only shows up to 780) and its response time zoomed to 30-40 seconds.  I assumed this was a fault of mine and continued on, trying 16 CPU’s.

16vCPU

But this graph shows it pretty readily.  As soon as you cross 800 concurrent connections FMA pukes.  I didn’t scale the graph to show that spike, but it goes up to 40 seconds.  So there appears to be a pretty hard limit of <800 concurrent connections (600 would probably be a pretty safe buffer…).  If you exceed that limit your performance is going to tank, HARD.  IMA, however, pushes on.  With 16vCPU IMA didn’t break the 5000ms target until 570 concurrent connections.  FMA appeared to be handling it just fine until it exploded.

For this testing the FMA broker is set in ‘connection leasing’ mode.  Perhaps this is related to that?  My next test will be to set the broker to Local Host Cache mode, retest, and then simulate a DB failure and test the LHC and see how quickly it can respond.

Summary:

The FMA broker, when acting in a purely XenApp fashion works pretty well.  It handles loads faster than IMA, but apparently this is only true to an extreme rate.  You should not have more than 800 concurrent connections per broker.  <period>.   You should probably keep a target maximum of 600 to be safe.  And assign at least 4, but preferably 8 CPU’s if you can to your broker servers.  This appears to be the sweet spot.

The highest rate our organization sees is 600 concurrent connections but we spread this lead across 9 IMA brokers, and divide that geographically.  The highest consecutive load we’ve measured with this on any single broker is ~150 concurrent connections during a peak load period.  We target a response time of less than 5 seconds for any broker enumeration and it does appear we could handle this quite easily with FMA.  However, this does not take into account XenDesktop traffic which is ‘heavier’ than XenApp for the FMA.  When we get to a point that I can test XenDesktop load I will do so.  Until then, I am impressed with FMA’s XenApp enumeration performance (until breakage anyways).

Next up…  Oddities encountered in testing

Read More

Citrix XenApp Enumeration Performance – IMA vs FMA

2017-04-11
/ /
in Blog
/

We’re exploring upgrading our Citrix XenApp 6.5 environment to 7.X (currently 7.13) and we have some architecture decisions that are driven by the performance of the infrastructure components of XenApp.  In 6.5 these components are the “Citrix Independent Management Architecture” and in 7.13 this is the “Citrix Broker Service”.  The performance I’ll be measuring is how long it takes to enumerate applications for a user.  In XenApp 6.5 this is the most intensive task put on the broker.  I’ve taken our existing XenApp 6.5 TEST environment and “migrated” it to 7.X.  The details of the environment are 189 enabled applications with various security groups applied to each application.  The user I will be testing with has access to 55 of them.  What the broker/IMA service has to do when it receives the XML request is evaluate each application to see if the user has permissions and return the results.  The ‘request’ is slightly different to the broker vs the IMA.  This is what the FMA requests will look like:

 

And the IMA requests:

In our environment, we have measured a ‘peak’ load of 600 concurrent connections per second to our XenApp 6.5 IMA service.  We split this load over 7 servers and the load is load-balanced via Netscaler VIP’s.  This lessens the peak load to 85 concurrent connections per server per second.  What’s a “connection”?  A connection is a request to the IMA service and a response from it.  This would be considered a single connection in my definition:

 

This is a single request (in RED) and response (in BLUE).  No further follow up is required by the client.

I’m going to profile a single response and request to better understand the individual performance of each product.

This is what my network traffic will look like (on the 7.X broker service):

The total time between when the FMA broker receives a single request to beginning the response is:

Initial receipt of traffic at 37.567664-37.567827.
Response starts at 37.633986

Response ends at 37.634432.

Total time for FMA request for list of applications and the response for that list:

For IMA the total time between when the IMA service receives a single request to beginning response is:

Initial receipt of traffic at 38.359944-38.360198.
Response starts at 38.440197

Response ends at 38.450032.

Total time for IMA request for list of applications and the response for that list:

Why the size difference (18KB vs 24KB)?

Looking at the data returned from the FMA via the IMA shows there is a new field passed by the FMA broker as apart of ‘AppData’

These two lines add 61 bytes per application.  A standard application response is (with title) ~331 bytes per IMA application and ~400 bytes per FMA application.

However, these single request are exactly that.  Single.

In order to get a better feel I ran the requests continuously in a loop, sending a request the FMA than the IMA, delay 1 second, and resend.  This should get me a more accurate feel for the performance differences.  I ran this over a period of 10 minutes.  My results were:

 

IMA is faster by approx 30ms per request.

Next up is load testing IMA vs FMA.

Read More

Citrix Netscaler 11.1 Unified Gateway and a non-working Citrix HTML5 Receiver

2016-12-06
/ /
in Blog
/

We setup a Citrix Unified Gateway for a proof of concept and were having an issue getting the HTML5 Receiver to connect for external connections.  It was presenting an error message: “Citrix Receiver cannot connect to the server”.  We followed this documentation.  It states, for our use case:

What would probably help is having a proxy that can parse out all websocket traffic and convert to ICA/CGP traffic without any need of changes to XA/XD. Netscaler Gateway does exactly this… …NetScaler Gateway also doesn’t need any special configuration for listening over websockets…

Connections via NetScaler Gateway:

…When a gateway is involved, the connections are similar in all Receivers. Here, Gateway acts as WebSocket proxy and in-turn opens ICA/CGP/SSL native socket connections to backend XenApp and XenDesktop. …

…So using a NetScaler Gateway would help here for ease of deployment. Connection to gateway is SSL/TLS and gateway to XenApp/XenDesktop is ICA/CGP….

And additional documentation here.

WebSocket connections are also disabled by default on NetScaler Gateway. For remote users accessing their desktops and applications through NetScaler Gateway, you must create an HTTP profile with WebSocket connections enabled and either bind this to the NetScaler Gateway virtual server or apply the profile globally. For more information about creating HTTP profiles, see HTTP Configurations.

Ok.  So we did the following:

  1. We enabled WebSocket connections on Netscaler via the HTTP Profiles
  2. We configured Storefront with HTML5 Receiver and configured it for talking to the Netscaler.

And then we tried launching our application:

We started our investigation.  The first thing we did was test to see if HTML5 Receiver works at all.  We configured and enabled websockets on our XenApp servers and then logged into the Storefront server directly, and internally.  We were able to launch applications without issue.

The second thing we did was enable logging for HTML5 receiver:

To view Citrix Receiver for HTML5 logs

To assist with troubleshooting issues, you can view Citrix Receiver for HTML5 logs generated during a session.

  1. Log on to the Citrix Receiver for Web site.
  2. In another browser tab or window, navigate to siteurl/Clients/HTML5Client/src/ViewLog.html, where siteurlis the URL of the Citrix Receiver for Web site, typically http://server.domain/Citrix/StoreWeb.
  3. On the logging page, click Start Logging.
  4. On the Citrix Receiver for Web site, access a desktop or application using Citrix Receiver for HTML5.

    The log file generated for the Citrix Receiver for HTML5 session is shown on the logging page. You can also download the log file for further analysis.

This was the log file it generated:

The “Close with code=1006” seemed to imply it was a “websocket” issue from google searches.

 

The last few events prior to the error are “websocket” doing…  something.

I proceeded to spin up a home lab with XenApp and a Netscaler configured for HTML5 Receiver and tried connecting.  It worked flawlessly via the Netscaler.  I enabled logging and took another look:

So there is a lot of differences but we focus on the point of failure in our enterprise netscaler we see it seems to retry or try different indexes (3 in total, 0, 1 and 2).

So there is a lot of evidence that websockets seem to be culprit.  We have tried removing Netscaler from the connection picture by connecting directly to Storefront and HTML5 receiver works.  We have configured both Netscaler and Storefront (with what we think) is a correct configuration.  And still we are getting a failure.

I opened up a call to Citrix.

It was a fairly frustrating experience.  I had tech’s ask me to go to “Program Files\Citrix\Reciever” and get the receiver version (hint, hint, this does not exist with HTML5).  I captured packets of the failure “in motion” and they told me, “it’s not connecting to your XenApp server”.  — Yup.  That’s the Problem.

It seems that HTML5 is either so new (it’s not now), so simple (it’s not really), or tech’s are just poorly trained.  I reiterated to them “why does it make 3 websocket connections on the ‘bad’ netscaler? Why does the ‘good’ netscaler appear to connect the first time without issue?”  I felt the tech’s ignore and beat around the bush regarding websockets and more focus put on the “Storefront console”.  Storefront itself was NOT logging ANYTHING to the event logs.  Apparently this is weird for a storefront failure.  I suspected Storefront was operating correctly and I was getting frustrated we weren’t focusing on what I suspected was the problem (websockets).  So I put the case on hold so I could focus on doing the troubleshooting myself instead of going around in circles on setting HTML5 to “always use” or “use only when native reciever is not detected”.

Reviewing the documentation for the umpteenth time this “troubleshooting connections” tidbit came out:

Troubleshooting Connections:

In cases where you are not able to connect some of the following points might help in finding out the problem. They also can be used while opening support case or seeking help in forums:

1) Logging: Basic connection related logs are logged by Receiver for HTML5 and Receiver for Chrome.

2) Browser console logs: Browsers would show errors related to certificates or network related failures here.

  • Receiver for HTML5: Open developer tools for HDX session browser tab. Tip: In Windows, use F12 shortcut in address bar of session page.

  • Receiver for Chrome: Go to chrome://inspect, click on Apps on left side. Click on inspect for Citrix Receiver pages (Main.html, SessionWindow.html etc)

The browser may show a log?  I wish I would have thought of that earlier.  And I wish Citrix would have put that in the actual “Receiver for HTML5” documentation as opposed to buried in a blog article.

So I opened the Console in Chrome, launched my application and reviewed the results:

We finally have some human readable information.

Websocket connections are failing during the handshake “Unexpected response code: 302”

What heck does 302 mean?  I installed Fiddler and did another launch withe Fiddler tracing:

 

I highlighted the area where it tells us it’s attempting to connect with websockets.  We can see in the response packet we are getting redirected, that’s what ‘302’ means.  I then found a website that lets you test your server to ensure websockets are working.  I tried it on our ‘bad’ netscaler:

 

Hitting ‘Connect’ left nothing in the log.  However, when I tried it with my ‘good’ netscaler…

 

It works!  So we can test websockets without having to launch and close the application over and over…

 

So we started to investigate the Netscaler.  We found numerous policies that did URL or content redirection that would be taking place with the packet formulated like so.  We then compared our Netscaler to the one in my homelab and did find one, subtle difference:

The one on the left is showing a rule for HTTP.REQ.URL.CONTAINS_ANY(“aaa_path”) where the one on the right just shows “is_vpn_url”.  Investigating further it was stated that our team was trying to get AAA authentication working and this was an option set during a troubleshooting stage.  Apparently, it was forgotten or overlooked when the issue was resolved (it was not applicable and can be removed).  So we set it back to having the “is_vpn_url” and retried…

It worked!  I tried the ‘websockets.org’ test and it connected now!  Looking in the Chrome console showed:

 

Success!  It doesn’t pause on the websocket connection and the console logging shows some interesting information.  Fiddler, with the working connection, now displayed the following:

Look!  A handshark response!

 

So, to review what we learned:

  1. Connections via Netscaler to HTML5 reciever do NOT require  (but is possible) a SSL connection on each target XenApp device
  2. Connection via Netscaler work over standard port (2598/1494) and do not require any special configuration on your XenApp server.
  3. You can use ‘http://www.websocket.org/echo.html’ to test your Netscaler to ensure websockets are open and working.
  4. Fiddler can tell you verbose information on your websocket connection and their contents.
  5. The web browser’s Javascript console is perfect to look at verbose messages in HTML5.

 

And with that, we are working and happy, Good Night!

Read More

Examining Logon Durations with Control Up – Profile Load Time

2016-09-15
/ /
in Blog
/

Touching on my last point with an invalid AD Home Directory attribute, I decided to examine it in more detail as what is causing the slowness on logon.

60s

52 seconds for Profile Load time.  This is caused by this:

bad_attribute

The Home Folder server ‘WSAPVSEQ07’ I created that share and set the home directory to it.  I then simulated a ‘server migration’ by shutting this box down.  This means that the server responds to pings because it’s still present in DNS.

ping

But why does it take so long?

If I do a packet capture on the server I’m trying to launch my application from, and set it to trace the ip.addr of the ‘powered off’ server, here’s what we see:

slow-logon-packet-capture

The logon process attempts to query the server at the ‘28.9’ second mark and stops at the ‘73.9’ second mark.  A total of 45 seconds.  This is all lumped into the ‘User Profile Load Time’.  So what’s happening here?

We can see the initial attempt to connection occurs at 28.9 seconds, then 31.9 seconds, then finally 37.9 seconds.  The time span is 3s between the first and second try then 6s between the second and third try.  This is explained here.

In Windows 7 and Windows Server 2008 R2, the TCP maximum SYN retransmission value is set to 2, and is not configurable. Because of the 3-second limit of the initial time-out value, the TCP three-way handshake is limited to a 21-second timeframe (3 seconds + 2*3 seconds + 4*3 seconds = 21 seconds).

Is this what we are seeing?  It’s close.  We are seeing the first two items for sure, but then in instead of a ‘3rd’ attempt, it starts over but at the same formula.

= 45 seconds.

According to the KB article, we are seeing the Max SYN retransmissions (2) for each syn sent.  This article contains a hotfix we can install to change the value of the Max SYN retransmissions, but it’s a minimum of 2 which it’s set to anyways.  However, there is an additional hotfix to modify the 3 second time period.

The minimum I’ve found is we can reduce the 3 second time period to 100ms.

values

This reduces the logon time to:

faster_logon

19 seconds.

What does this packet capture look like?  Like this:
lower_syn_values

Even with the ‘Initial RTO’ set to 100ms, Windows has a MinRTO value of 300ms:

minrto

After the initial ‘attempt’ there is a 10-12 second delay.

Setting the MinRTO to the minimum 20ms

minrto_20

Reduces our logon time further now because our SYN packets are now about 200ms apart:

minrto_200ms

We are now 16 seconds, 13 seconds spent on the profile upon which 12 seconds was timing out this connection.

16s-logon

 

Would you implement this?  I would strongly recommend against it.  SYN’s were designed so blips in the network can be overcome.  Unfortunately, I know of no way to get around the ‘Home Directory responds to DNS but not to ping’ timeout.  The following group policies have no effect:

gpo1

gpo2

 

And I suspect the reason it has no effect is because the server still responds to DNS so the SYN sequence takes place and blocks regardless of this GPO settings.  Undoubtedly, it’s because this comes into play:
gpo_explain

Since our user has a home directory the preference is to ‘always wait for the network to be initialized before logging the user on’.  There is no override.

Is there a way to detect a dead home directory server during a logon?  Outside of long logon’s I don’t see any event logging or anything that would point us to determine if this is the issue without having to do a packet capture in a isolated environment.

 

 

Read More

ControlUp – Dissecting Logon Times a step further (invalid Home Directory impact)

2016-09-07
/ /
in Blog
/

Continuing on from my previous post, we were still having certain users with logons in the dozens of seconds to minutes.  I wanted to find out why and see if there is anything further that could be done.

60second_profile

 

After identifying a user with a long logon with ControlUp I ran the ‘Analyze Logon Duration’ script:

51-1second_profile

 

Jeez, 59.4 seconds to logon with 51.2 seconds of that spent on the User Profile portion.  What is going on?  I turned to process monitor to capture the logon process:

screen-shot-2016-09-07-at-8-24-18-pm

Well, there appears to be a 1 minute gap between the cmd.exe command from when WinLogon.exe starts it.  The stage it ‘freezes’ at is “Please wait for the user profile service”.

 

Since there is no data recorded by Process Monitor I tested by deleting the users profile.  It made no difference, still 60 seconds.  But, since I now know it’s not the user profile it must be something else.  Experience has taught me to blame the user object and look for network paths.  50 seconds or so just *feels* like a network timeout.  So I examined the users AD object:

screen-shot-2016-09-07-at-8-46-19-pm

 

Well well well, we have a path.  Is it valid?

screen-shot-2016-09-07-at-8-49-42-pm

 

 

It is not valid.  So is my suspicion correct?  I removed the Home Directory path and relaunched:

without_homedir_logon_time

Well that’s much, much better!

So now I want ControlUp to identify this potential issue.  Unfortunately, I don’t really see any events I can key in on that says ‘Attempting to map home drive’.  But what we can do is pull that AD attribute and test to see if it’s valid and let us know if it’s not.  This is the output I now generate:

new_script

 

I revised the messaging slightly as I’ve found the ‘Group Policy’ phase can be affected if GPP Drive Maps reference the home directory attribute as well.

 

So I took my previous script and updated it further.  This time with a check for valid home directories.  I also added some window sizing information to give greater width for the response as ‘Interim Delay’ was getting truncated when there were long printer names.  Here is the further updated script:

Read More

ControlUp – Dissecting Logon Times a step further (Printer loading)

2016-08-31
/ /
in Blog
/

We have applications that require printers be loaded before the application is started.  This is usually because the application will check for a specific printer or default printer and if one is not set (because it hasn’t mapped into the session) then it’ll throw up a dialog or not start entirely.

So we have this value ‘unchecked’ for some applications:

Screen Shot 2016-08-31 at 12.12.08 AM

But how does this impact our logon times?

Well… Our organization just underwent a print server migration/upgrade where some print servers were decommissioned and don’t exist.  But some users still have them or references to them on their end points.  We do have policies that only map your default printer, but some users are on a policy to map ‘all’ printers they have on their system.

What’s the impact?

Screen Shot 2016-08-31 at 12.15.10 AM

Waiting for printers before starting the application…

 

Screen Shot 2016-08-31 at 12.26.50 AM

Without waiting for printers

16 Seconds?  How is that so?

Well, it turns out waiting for printers and the subsystem components to support them add a fair amount of time, and then worse is network printers that don’t go anywhere anymore.  I’ve seen these logons wait for connection before timing out, all the while the user sits there and waits.  The script that comes with ControlUp for analyzing logons is good, but I wanted to know more on why some systems had long logon times and the only clue was Pre-Shell (userinit) taking up all the time.  So I dug into the print logs and found a way to measure their impact.

Screen Shot 2016-08-31 at 12.32.05 AM

With my modified script we can clearly see waiting for the printers takes ~15.4s with a few printers over a few seconds and the rest at 0.5 seconds or so.  One thing about this process is that mapping printers is synchronous.  So when or if 1 stalls, the whole process gets stuck.  All my printers were local except for the ‘Generic / Text Only’ which was a network printer where I powered off the server.  It hung the longest at 5.9 seconds, but I’ve seen ‘non-existant’ network mapped printers hang for 150 seconds or so…

To facilitate finding the printers we need to pass the clientName to the server and the Print Service Logs need to be enabled.

You can enable the print service logs on server 2008R2 by executing the following:

The ControlUp arguments need to look like this now:

Screen Shot 2016-08-31 at 12.40.36 AM

Here is my updated script:

I hope to dig into other startup components and further drill down into what our user launch process looks like.  We wait, and we see 🙂

Read More

ControlUp – List AppV5 recent events on various servers

2016-08-26
/ /
in Blog
/

David Falkus just posted a blog post on using Powershell to combine multiple AppV5 logs into a single view and orders them chronologically so you can see the events as they occurred.

Since this was a PowerShell script we can use ControlUp to import it, tweak it to accept some server variables and then get the output back to us.  Here is a video of this in action:

Here is the recipe for it:

1

2

3

4

 

And the script:

 

Read More

Using ControlUp to launch a Citrix application published on a server

2016-08-23
/ /
in Blog
/

Occasionally, we have Citrix servers that ‘die’ in a peculiar way.  What happens may vary when they die but the usual symptoms are something like:

  1. The server is still somewhat responsive.  It responds to pings, RPC requests (tasklist /s %servername%)
  2. The server is not responsive.  You cannot RDP to it, console CTRL-ALT-DEL fails, etc.

This is frustrating because the services appear to be operating so the Citrix server will say, “hey, I’m working!  I can take sessions!”  And usually these servers won’t have any sessions because logons actually fail so their “XenApp Server Load” is low, so its priority for sessions to be directed to it is higher!  So how do we detect these servers with these issues?  Unfortunately, I haven’t seen any events in the Event Viewer or anything that stands out to search and find these troublesome servers.  Using ControlUp, sometimes it’s obvious because that troublesome server will have a much lower session count than other servers or something else is at fault and triggers the ‘Stress Level’ to go critical.  But these flags don’t usually exist if the problem has just occurred, they usually are more visible after time has passed.

Our helpdesk asked if there was a way they could test servers to help pinpoint a troublesome one.  I came up with a “Script-based Action” that targets a specific Citrix server and lists all published applications on that server.  You then select the application and it generates a ICA file and tries to launch it.  You need to have permission to the application on Citrix and Powershell remoting enabled on the XenApp servers/ZDC’s .  So if your a Citrix admin and PS Remoting is enabled this script will work out of the box.

However, I tried to make the script dynamic so you could query the XenApp servers from a standalone server without installing Citrix Powershell SDK locally.  To do this I use PowerShell remoting so you need to have PowerShell remoting enabled on your Citrix servers in your environment.  Secondly, if you have ‘lower’ privilege users you need to grant them the ability to connect to the servers via PowerShell remoting (by default only Administrators have access).  To grant them access you need to do the following:

powershell-perms

And in the ‘Set-PSSessionConfiguration’ command you need to enable the ‘Invoke’ permissions on your support group:
permissions
As well, you need to grant view properties on your Citrix farm since the group needs to query application properties, and workergroups (if you publish your applications to workergroups):

IMA_Perms

Now that we have our permissions configured we can create our ControlUp Script-Based action:

SBA_1 SBA_2 SBA_3 SBA_4

So what does this look like?

And the script:

 

Read More