The Citrix Local Host Cache feature, introduced in XenDesktop/XenApp 7.12, has some nuances that may be better demonstrated in real time than typed out in text. In this article I will do both: a step-by-step walkthrough of what happens during a network or site database outage, and a real-time video highlighting the feature in action. Many other blogs and articles do a great job going into the step-by-step details of the feature, but I find seeing it in action to be very informative.
To view a video of this process, scroll to the very end, or click here.
To start, I’ve created a PowerShell script that simulates a user querying the broker for a list of applications.
The output columns are the time of the response, the payload size received (in bytes), and the total time to respond in milliseconds.
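The script itself is not reproduced here, but the idea can be sketched in a few lines. The following is a minimal Python sketch, not the original PowerShell: `probe` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for whatever request your environment uses to ask the broker for the application list.

```python
import time
from datetime import datetime
from typing import Callable, List, Tuple


def probe(fetch: Callable[[], bytes], count: int,
          interval_s: float = 1.0) -> List[Tuple[str, int, float]]:
    """Call fetch() repeatedly and record (time, payload bytes, elapsed ms)."""
    rows = []
    for i in range(count):
        start = time.perf_counter()
        try:
            size = len(fetch())
        except Exception:
            size = -1  # the request failed or timed out
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        stamp = datetime.now().strftime("%H:%M:%S")
        rows.append((stamp, size, elapsed_ms))
        print(f"{stamp}  {size:>8}  {elapsed_ms:8.1f}")
        if i < count - 1:
            time.sleep(interval_s)
    return rows


def fake_fetch() -> bytes:
    # Hypothetical stand-in: replace with a real request to your broker.
    return b"<applications/>"


if __name__ == "__main__":
    probe(fake_fetch, count=3, interval_s=0)
```

A payload size of -1 marks a request that raised an error, which makes the moment of failure easy to spot in the output.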
As we’re querying the broker, the broker is reaching out to the database and then responding to the user with the information requested.
Periodically, the Citrix Config Synchronizer Service will check to ensure the local host cache database is in sync with the site database. This is an event that occurs every 2 minutes during normal operation.
To show the network connection failing, I am going to set up a continuous ping to the database server.
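If ICMP is blocked in your environment, a TCP connection check can serve a similar purpose. The sketch below is an assumption-laden alternative to ping, not what I used in the demo: the function name `is_reachable` is hypothetical, and it simply tries to open a TCP connection to a host and port.

```python
import socket


def is_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For a SQL Server database you might call something like `is_reachable("sql01.example.local", 1433)` in a loop, where the host name is a placeholder and 1433 is the default SQL Server port. When packets are being dropped, the call returns False after the timeout expires.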
To simulate a network failure, I’m going to use the tool clumsy to drop all packets to and from the database server.
Clicking start in clumsy immediately stops the simulated user from getting their list of applications.
And the pings now time out.
The broker has a 20-second timeout, after which it responds to requests with what it thinks is the current status. The first timed-out request receives a response of “working”; thereafter, a response of “pending failed” is returned.
Around 24 seconds in, the broker has noticed that the database connection has failed and has logged its first event, 1201: “The connection between the Citrix Broker Service and the database has been lost”.
Now, one minute and thirty-three seconds into the failure, other Citrix services are reporting that they cannot contact the database.
Just shy of 2 minutes in, the broker service has exceeded its timeout for contacting the database and is in the process of switching to the local host cache. It stops the “primary broker”.
The Citrix High Availability Service then becomes active and begins brokering user requests.
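From the simulated user's point of view, the outage is the stretch between the first failed request and the first request answered by the local host cache. If you log each request as a time offset and a success flag, that window can be computed afterwards. This is a minimal sketch; the name `outage_windows` and the log format are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def outage_windows(
    samples: List[Tuple[float, bool]]
) -> List[Tuple[float, Optional[float]]]:
    """Given (seconds, ok) samples in time order, return (start, end) of each
    run of failures. end is None if the log stops while still failing."""
    windows = []
    start = None
    for t, ok in samples:
        if not ok and start is None:
            start = t  # first failed request of a new outage
        elif ok and start is not None:
            windows.append((start, t))  # first good request closes it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

Subtracting the start from the end of a window gives how long users went without a response before the High Availability Service took over.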
In my simulation, responses from the LHC arrive a little faster than responses that involve the site database: 80-90 milliseconds for the LHC versus 90-100 milliseconds for the site database. This difference lets us see the two modes of operation in the script output.
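The same comparison can be made numerically rather than by eye. The sketch below takes two windows of response times, computes their medians, and uses the midpoint as a rough dividing line. The function names are hypothetical and the sample values are made up to fall inside the ranges above; they are not measurements from the demo. Treat the threshold as a heuristic for one lab, not a reliable way to detect LHC mode.

```python
from statistics import median
from typing import List


def midpoint_threshold(site_db_ms: List[float], lhc_ms: List[float]) -> float:
    """Midpoint between the median response times of the two modes."""
    return (median(site_db_ms) + median(lhc_ms)) / 2.0


def label(sample_ms: float, threshold_ms: float) -> str:
    """Guess which mode answered, based only on response time."""
    return "LHC" if sample_ms < threshold_ms else "site database"


if __name__ == "__main__":
    # Illustrative values only, chosen within the 90-100 ms and 80-90 ms ranges.
    site_db = [94.0, 97.5, 91.2, 99.0, 95.3]
    lhc = [83.1, 88.4, 81.0, 86.7, 89.2]
    threshold = midpoint_threshold(site_db, lhc)
    print(f"threshold: {threshold:.1f} ms")
    print(label(85.0, threshold), label(96.0, threshold))
```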
How long does it take to “fall back” to the database when connectivity is restored?
I “Stopped” clumsy to restore our network connection and started a timer.
The ping responses from the database server resume immediately, confirming that our connection is back.
Almost immediately, all services have noticed that they have connectivity again, including the broker service.
However, we do not fall back immediately.
At one minute and thirty-three seconds, the broker has switched back to the primary broker and all services have been restored.
To watch a video of this all in action, please view here: