Citrix XenApp Enumeration Performance – IMA vs FMA load testing


2017-04-12

In the previous post we found IMA outperformed FMA in terms of an individual request for resources.  However, the 30ms difference would be imperceptible to an end user.  This post will focus on the maximum load an individual broker can handle.  In the past we discovered that the IMA service is heavily dependent on the number of CPUs: the more CPUs allocated to the machine hosting the IMA service, the more requests could be handled (supposedly to a max of 16, the maximum number of threads for IMA).

For my testing I have configured a XenApp 6.5 machine with 2 sockets and 1 vCPU each, for a total of 2 cores.  I have an identical machine configured for XenApp 7.13.

Querying the 7.13 broker directly, I’ve discovered my requests appear to be tied to the performance counter “Citrix XML Service – Transactions/sec – enumerate resources”.  I can visibly see how many requests are hitting the broker through my testing, although the numbers don’t line up exactly with what I’m sending.

For XenApp 6.5 the performance counter that appears to tie directly to my requests is the “Citrix MetaFrame Presentation Server – Filtered Application Enumerations/sec”.
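
If you’d rather log these counters than eyeball perfmon, Get-Counter can sample them once a second during a run.  The counter paths below are my best guess from the display names, so confirm them first with Get-Counter -ListSet *Citrix*; the output file names are just illustrative:

    # On the 7.13 broker (counter path assumed from the display name;
    # verify with: Get-Counter -ListSet *Citrix*)
    Get-Counter -Counter '\Citrix XML Service\transactions/sec - enumerate resources' `
        -SampleInterval 1 -MaxSamples 300 |
        ForEach-Object { $_.CounterSamples[0].CookedValue } |
        Out-File .\fma-enum-per-sec.txt

    # On the 6.5 server (same caveat on the counter path)
    Get-Counter -Counter '\Citrix MetaFrame Presentation Server\Filtered Application Enumerations/sec' `
        -SampleInterval 1 -MaxSamples 300 |
        ForEach-Object { $_.CounterSamples[0].CookedValue } |
        Out-File .\ima-enum-per-sec.txt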

By using WCAT to apply a load to XenApp 6.5 and 7.13 and measuring the per-second results, I should be able to see which is more efficient under a larger load.

My WCAT options for the FMA server will be:
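
Roughly like the following settings file; the hostname is a stand-in for my 7.13 broker, and the virtualclients count is stepped up between runs to produce each data point:

    settings
    {
        clientfile     = "XML-Direct.ubr";
        server         = "xd713-broker";
        clients        = 1;
        virtualclients = 180;
    }

Each run is then kicked off with something like wcat.wsf -terminate -run -clients localhost -t XML-Direct.ubr -f settings-fma.ubr (the settings file name here is mine, for illustration).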

and for IMA:
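
Identical apart from the target server (again, the hostname is a stand-in):

    settings
    {
        clientfile     = "XML-Direct.ubr";
        server         = "xa65-broker";
        clients        = 1;
        virtualclients = 180;
    }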


My “scenario” file (XML-Direct.ubr) looks like so:
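
In rough form it is a single POST of an NFuse enumeration request against the XML service.  The sketch below makes some assumptions: the warmup/duration values are arbitrary, /scripts/wpnbr.dll is the conventional XML service endpoint, and the postdata is abbreviated.  Capture the real RequestAppData XML from Web Interface or StoreFront traffic (Fiddler works) and paste it in place of the “…”:

    scenario
    {
        warmup   = 30;
        duration = 300;
        cooldown = 10;

        default
        {
            version    = HTTP11;
            statuscode = 200;
            close      = ka;
        }

        transaction
        {
            id     = "EnumerateResources";
            weight = 1;

            request
            {
                verb = POST;
                url  = "/scripts/wpnbr.dll";
                setheader
                {
                    name  = "Content-Type";
                    value = "text/xml";
                }
                postdata = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE NFuseProtocol SYSTEM \"NFuse.dtd\"><NFuseProtocol version=\"5.4\"><RequestAppData>...</RequestAppData></NFuseProtocol>";
            }
        }
    }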

The new FMA service has an additional counter, “Citrix XML Service – enumerate resources – Avg. Transaction Time”, which actually reports back how long it took to execute the enumeration.
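
That one can be watched live while WCAT is running; again, the counter path is assumed from the display name:

    # Counter path assumed from the display name; confirm with Get-Counter -ListSet *Citrix*
    Get-Counter -Counter '\Citrix XML Service\enumerate resources - avg. transaction time' -Continuous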

Just like my previous testing, I’m going to test with 2, 4, 8 and now 16 CPUs.  Both machines are VMs on the same cluster with the same host specs (Intel Xeon E5-2680 @ 2.70GHz).  One difference between the two VMs is that the XenDesktop 7.13 machine runs on a Server 2016 Standard platform.

2vCPU

IMA is in BLUE and FMA is in ORANGE.

Interesting results.  It appears the XenDesktop FMA broker can handle higher loads with much greater efficiency than IMA.  And it does not take much for FMA to flex some muscle and show that it can handle the same load faster than IMA.  FMA stayed under our 5000ms target up to 180 concurrent connections, whereas IMA breaks that around 140 connections.  This initial data shows that FMA could handle approximately 23% more load.  Fairly impressive.

4vCPU

Again, FMA continues its dominance in enumeration performance.  The gap between them grows slightly, to 26% at 400 concurrent connections.  At our target of under 5000ms, IMA breaks at 250 connections, while FMA held until around 380 connections.

8vCPU

Again, FMA handles the additional load with the additional CPUs without breaking a sweat.  It seemed unstoppable, keeping under 5000ms until about 655 connections.  IMA, on the other hand, exceeded a 5000ms response time at 400 connections.  But I had something strange happen that I assumed to be my fault: the FMA broker service suddenly spiked at 800 concurrent connections (this graph only shows up to 780) and its response time zoomed to 30-40 seconds.  I assumed this was a fault of mine and continued on, trying 16 CPUs.

16vCPU

But this graph shows it pretty readily.  As soon as you cross 800 concurrent connections, FMA pukes.  I didn’t scale the graph to show that spike, but it goes up to 40 seconds.  So there appears to be a pretty hard limit of <800 concurrent connections (600 would probably be a pretty safe buffer…).  If you exceed that limit, your performance is going to tank, HARD.  IMA, however, pushes on.  With 16 vCPUs, IMA didn’t break the 5000ms target until 570 concurrent connections.  FMA appeared to be handling it just fine until it exploded.

For this testing the FMA broker is set in ‘connection leasing’ mode.  Perhaps this is related to that?  My next test will be to set the broker to Local Host Cache mode, retest, and then simulate a DB failure and test the LHC and see how quickly it can respond.

Summary:

The FMA broker, when acting in a purely XenApp fashion, works pretty well.  It handles loads faster than IMA, but apparently this is only true up to an extreme rate.  You should not have more than 800 concurrent connections per broker.  Period.  You should probably keep a target maximum of 600 to be safe.  And assign at least 4, but preferably 8, CPUs to your broker servers if you can.  This appears to be the sweet spot.

The highest rate our organization sees is 600 concurrent connections, but we spread this load across 9 IMA brokers and divide it geographically.  The highest concurrent load we’ve measured on any single broker is ~150 concurrent connections during a peak load period.  We target a response time of less than 5 seconds for any broker enumeration, and it does appear we could handle this quite easily with FMA.  However, this does not take into account XenDesktop traffic, which is ‘heavier’ than XenApp for the FMA.  When I get to a point where I can test XenDesktop load, I will do so.  Until then, I am impressed with FMA’s XenApp enumeration performance (until breakage anyways).

Next up…  Oddities encountered in testing
