
Meltdown + Spectre – Performance Impact Analysis (Take 2)

2018-09-14

I want to start this off by thanking ControlUp, LoginVSI, and Ryan Ververs-Bijkerk for their assistance with this post.

Building on my last evaluation of the performance impact of Meltdown and Spectre, I was graciously given a trial of the LoginVSI software product used to simulate user loads, and of ControlUp’s cloud-based analytics tool, ControlUp Insights.

This analysis takes into consideration the differences between operating systems and uses the latest Dell server hardware platform with an Intel Xeon Gold 6150 processor at its heart.  This processor has full PCID/INVPCID support, which gives the best performance (as of today) with the mitigations enabled.  However, only Server 2012R2 and Server 2016 can take advantage of these hardware features.

Test Setup

This test was set up on 4 hosts.  2 of the hosts ran VMs with all of the mitigations enabled, and 2 hosts ran VMs with all of the mitigation features disabled.  I tested live production workloads and simulated user loads from LoginVSI.  The live production workloads ran on XenApp 6.5 on 2008R2, and the simulated workloads ran on XenApp 7.15CU2 with 2008R2, 2012R2, and 2016.

Odd host numbers had the mitigations disabled; even host numbers had the mitigations enabled.
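
As an aside, the mitigation state of each host and VM can be confirmed with Microsoft’s SpeculationControl PowerShell module, which makes a quick sanity check before trusting any host labels:

Install-Module SpeculationControl -Scope CurrentUser    # requires PowerShellGet
Get-SpeculationControlSettings    # reports Spectre v2 (branch target injection) and Meltdown (kernel VA shadow) status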

I sorted my testing logically in ControlUp by folder.

Real World Production results

The ControlUp Insights cloud product produced graphs and results that were easy and quick to interpret.  These results are for XenApp 6.5, Server 2008R2.

Hosts View

CPU Utilization

The mitigation-disabled “Host 1” consistently used less CPU than the mitigation-enabled “Host 2”.  The biggest spread in CPU was ~20% on the Intel Xeon Gold 6150s between mitigation enabled and disabled.

IO Utilization

Another interesting result was that IO utilization increased by an average of 100 IOPS for the mitigation-enabled VMs.  This means the Meltdown/Spectre mitigations also tax the storage subsystem.  This averaged out to a consistent 12% hit in performance.

User Experience

The logon duration of the VMs increased 75%, from an average of 8 seconds on a mitigation-disabled VM to 14 seconds on a mitigation-enabled VM.  The biggest jumps in the sub-metrics were Logon Duration (Other), which went from 3s to 5s, and Group Policy time, which went from 6s to 8s.

Mitigation Disabled

Mitigation Enabled

For applications we have that measure “user interactivity”, the reduction in the user experience was roughly 18%.  This meant that an action on a mitigation-enabled VM took an average of 1180ms, vs 990ms on a mitigation-disabled VM, when measuring actions within the UI.

Honestly, I wish I had had ControlUp Insights when I did my original piece; it tracks many more metrics and presents them much more cleanly than I did.  And once the information was available, it was very quick to pull up and compare the various types of results.

Simulated Results

LoginVSI was gracious enough to grant me licenses to their software for this testing.  Their software simulates user actions, including pauses for coffee and chatting, between work like typing/sending emails, reading Word or PDF documents, or reviewing presentations.  The suite of software exercised by the simulated users tends to be major applications produced by major corporations with experience producing software.  It is not representative of the applications that could be impacted the most by Spectre/Meltdown (generally, applications that are heavy on registry events).  Regardless, it is interesting to test with these simulated users, as the workload they produce does fall within the spectrum of “real world”.  As with everything, your mileage will vary, and it is important to test and record your before and after impacts.  ControlUp Insights does an incredible job of this: you can easily compare different periods in time to measure the impacts, or just rely on the machine-learning suggestions of its virtual experts to properly size your environment.

Since our production workloads are Windows Server 2008R2 based, I took advantage of the LoginVSI license to test all three available server operating systems: 2008R2, 2012R2, and 2016.  Since newer operating systems are supposed to enable performance-enhancing hardware features that reduce the impact of these vulnerabilities, I was curious as to *how much*.  I have this information now.

I tested user loads of 500, 300, and 100 users across 2 hosts: one with the Spectre/Meltdown mitigations applied and one without.  Each host ran 15 VMs, with each VM having 30GB RAM and 6 vCPUs (90 vCPUs against 36 physical cores, for a CPU oversubscription of 2.5:1).  The host spec was a Dell PowerEdge M640 with Intel Xeon Gold 6150 processors and 512GB of memory.

2016 – Hosts View

With 500 LoginVSI users running this workload on Server 2016, the hosts’ CPU was pegged at 100% on both the Meltdown/Spectre-enabled and disabled hosts.  We can still see the gap in CPU utilization between the two, with the mitigation-enabled hosts running hotter.

With 300 LoginVSI users on Server 2016, the gap is narrower but still visible.

With 100 LoginVSI users on Server 2016, the gap is barely visible; it looks nearly even.

2012R2 – Hosts View

500 LoginVSI users in this workload on Server 2012R2.  There is a much larger gap between the mitigation-enabled and disabled hosts, and the non-mitigated Server 2012R2 host doesn’t cap out like Server 2016 does.

300 LoginVSI users in this workload on Server 2012R2.  The separation between enabled and disabled is still very prominent.

100 LoginVSI users in this workload on Server 2012R2.  Again, the separation is noticeable but appears narrower with lighter loads.

2008R2 – Hosts View

500 LoginVSI users in this workload on Server 2008R2.  There is noticeable additional CPU load on the Meltdown/Spectre host.  More interestingly, overall CPU utilization appears lower than on 2012R2 or 2016.

300 LoginVSI users in this workload on Server 2008R2.  The separation between enabled and disabled is still very prominent.

100 LoginVSI users in this workload on Server 2008R2.  I only captured one run and the low utilization makes the difference barely noticeable.

Some interesting results for sure.  I took the data and put it into a pivot table to highlight the CPU differences for each workload against each operating system.

This chart highlights the difference in CPU percentage between mitigation-enabled and disabled systems.  The raw data:

Again, interesting results.  2008R2 has the largest average CPU separation, hitting 14%, followed by 2012R2 at 11%, and then 2016 with a difference of 4%.

One of the things these results highlight is the “headroom” of the operating systems.  2008R2 consumes less CPU overall, so it has more room for separation between the three tiers.  On 2016, so much time was spent with the CPU pegged at 100% on both types of host that the difference reads as “0%”.  So although the smaller number on Server 2016 may lead you to believe it fares better, it actually doesn’t.

This shows it a little more clearly.  With mitigations *enabled*, Server 2008R2 can run 500 users at a lower average CPU load than Server 2016 can run 300 users with mitigations *disabled*.

From the get-go, Server 2016 appears to consume 2x more CPU than Server 2008R2 in all non-capped scenarios, with Server 2012R2 somewhere in between.

When we compare the operating systems against the different user counts, we see the impact the operating system choice has on resources.

Mitigation Disabled:

100 Users
300 Users
500 Users

Mitigation Enabled:

100 Users
300 Users
500 Users

Final Word

Microsoft stated that they expected to see less of an impact from the Spectre/Meltdown mitigations on newer operating systems, and this does turn out to be the case.  However, the additional resource cost of the newer operating systems is actually *more* than running 2008R2 or 2012R2 with mitigations enabled.  So if your environment is sized for running Server 2016, you probably have infrastructure that has already been spec’ed for the much heavier OS anyways.  If your infrastructure has been spec’ed for the older OS, then you will see a larger impact.  However, if you’ve spec’ed for the larger OS (say, for a migration activity) but are running your older OSes on that hardware, you will see an impact, but it will be less than when you go live with 2016.

Previously I stated that there are two different, important performance mechanisms to consider: capacity, and how fast work actually gets done.  All of these simulated measurements are about capacity.  I hope to examine how speed is impacted between the OSes, but that may have to wait for a future post.

Tabulating the LoginVSI simulated results without ControlUp Insights took me weeks of formatting.  I was able to use a trial of ControlUp Insights to look at the real-world impact of our existing applications and workloads.  Had my organization purchased Insights, I would have had this post up a long time ago, with even more data, looking at things like the storage subsystems.  Hopefully we acquire this product in the future.  If you want to save yourself and your organization the time, energy, and effort of getting precise, accurate data that can be compared across the scenarios you create: get ControlUp Insights.


I’m going to harp on this: YOUR WORKLOAD MATTERS MORE than these simulated results.  During this exercise, I was able to determine with ControlUp Insights that one of our applications is so light that we can host 1,000 users on a single host, where that same host struggled with 200 LoginVSI users.  So WORKLOAD MATTERS.  Just something to keep in mind when reviewing these results.  LoginVSI produces results that can serve as a proxy for what you can expect, if you can properly relate these results to your existing workload.  LoginVSI also offers the capability to produce custom workloads tailored to your specific environment or applications, so you can gauge impact with much more precision.

Read More

Citrix Provisioning Service – Network Service Starting/Stopping services remotely

2018-05-02

Citrix Provisioning Services has a feature within the “Provisioning Services Console” that allows you to stop/restart/start the streaming service on another server:

 

This feature worked with Server 2008R2 but with 2012R2 and greater it stopped working.  Citrix partially identified the issue here:

 

I was exploring starting and stopping the streaming service on other PVS servers from the Console, and I found this information was incorrect.  Adding the NetworkService does NOT enable the streaming service to be stopped/started/restarted from other machines.  The reason is that NetworkService is a LOCAL account on the machine itself.  When it reaches out to communicate with another system, it is translated into a proper SID, which matches the machine account.  Since that SID coming across the wire does not have access to the service, you get a failure.

In order to fix this properly, we can either add permissions for each PVS server’s machine account on each service, OR we can add all the machine accounts into a security group and grant that group permission to manipulate the service on each PVS server.

I created a PowerShell script to easily add a group, user, or machine account to the Streaming Service’s permissions.  It will also list all of the existing permissions:

An example adding a Group to the permissions to the service:

And now we can start the service remotely:

 

In order to get this working entirely I recommend the following steps:

  1. Create a group (e.g., “CTX.Servers.ProvisioningServiceServer”)
  2. Add all the PVS machine accounts into that group
  3. Reboot your PVS servers to pick up the new group membership token
  4. Run the PowerShell script on each machine to add the group permission to the streaming service
  5. Done!

And now the script:
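
A minimal sketch of the approach (assuming the streaming service’s short name is StreamService, which you should verify with Get-Service, and using the hypothetical group from step 1):

# Resolve the group to a SID:
$group = 'DOMAIN\CTX.Servers.ProvisioningServiceServer'
$sid   = (New-Object System.Security.Principal.NTAccount($group)).Translate([System.Security.Principal.SecurityIdentifier]).Value

# List the current permissions in SDDL form:
$sddl = ((sc.exe sdshow StreamService) -join '').Trim()
$sddl

# Build an ACE granting query status (LC), start (RP), and stop (WP), and place it
# in the DACL (D:) section, ahead of any SACL (S:) section:
$ace = "(A;;LCRPWP;;;$sid)"
if ($sddl -match 'S:') { $new = $sddl -replace 'S:', ($ace + 'S:') } else { $new = $sddl + $ace }
sc.exe sdset StreamService $new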

 

Read More

Troubleshooting Citrix Desktop Service – “The Citrix Desktop Service is starting.” and then nothing.

2015-09-21

I’m troubleshooting an issue with the Citrix Desktop Service in my home lab.  I have a Citrix XenApp/XenDesktop 7.6 installation, and I have set up a Server 2012 R2 box with the “/servervdi” switch.  Upon reboot, I see the Registration State as Unregistered.

Restarting the Citrix Desktop Service and checking the ‘Application’ Log for ‘Citrix Desktop Service’ yields only event ID 1028 – “The Citrix Desktop Service is starting.”
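
For reference, the same events can be pulled with PowerShell rather than clicking through Event Viewer:

Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'Citrix Desktop Service' } -MaxEvents 20 |
    Format-Table TimeCreated, Id, Message -AutoSize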

However, when I ‘Stop’ the Citrix Desktop Service the Application event log gives up a few more details:

Event ID 1003 – Citrix Desktop Service
The Citrix Desktop Service failed to initialize communication services required for interaction between this machine and delivery controllers. 

If the problem persists please perform a ‘repair’ install action or reinstall the Citrix Virtual Desktop Agent. Refer to Citrix Knowledge Base article CTX119736  for  further information. 

Error details: 
Failed to start WCF services. Exception ‘Object reference not set to an instance of an object.’ of type ‘System.NullReferenceException’

Unfortunately, the CTX article gives little to no detail on event 1003 and is more of a shotgun attempt at solving issues than a nice, precise, surgical solution.

From the EventID 1003 we can see the Citrix Desktop Service is trying to reference WCF services and it has failed.

Citrix offers a tool called XDPing to try and diagnose issues.  I ran it, and it returned the following:

Googling the WCF errors with HTTP/1.1 and Error 503 results in lots of information on reconfiguring your IIS.  I’m not convinced this is the issue so I soldiered on…

Running procmon against ‘BrokerAgent.exe’, I see a few curious entries that may be associated with the ‘System.NullReferenceException’ (aka, not found): some permission events stating that access is denied to certain registry keys, and some ‘Name Not Found’ results on certain CLSID items.

During this capture I can see a ‘BrokerAgent.exe.config’ file referenced (“C:\Program Files\Citrix\Virtual Desktop Agent\BrokerAgent.exe.config”).  Diving into it reveals some additional logging we can enable:

Verbose logging disabled by default

If we ‘enable’ the debug portions and create the C:\cdsLogs folder, we can get some more information on what is going wrong.

Logging Enabled

Stopping the ‘Citrix Desktop Service’, editing the .config file, creating the C:\cdsLogs folder, and starting the service yielded additional information.
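
Scripted, that sequence is short.  The service short name is assumed to be BrokerAgent here (check with Get-Service):

Stop-Service BrokerAgent
# ...edit BrokerAgent.exe.config to enable the debug trace listeners...
New-Item -ItemType Directory -Path 'C:\cdsLogs' -Force | Out-Null
Start-Service BrokerAgent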

We have a ‘COM exception’.  The nice thing about this log file is we can compare the time stamps to the procmon logs and determine what was happening when this failed.

It appears we are missing SCService64.exe from our Citrix installation.  What is ‘SCService64.exe’?  The registry tells me it’s the ‘IStackControl Type Library’, which matches up with the error ‘ConnectToStackControlCOMServer’.

So, it appears we need to install SCService64.exe.  I do not know why or how it went missing, but I suspect we can copy it over or extract it from the Citrix source files if needed.

To extract the SCService64.exe source file, we can create an administrative installation of the “TS” VDA:

msiexec /a “\\x79-server\software\Citrix\XenApp_and_XenDesktop7_6\x64\Virtual Desktop Components\TS\Ica\TS_x64.msi”

This makes a folder on the root of the C: drive called “Citrix” which contains all the extracted files:

I installed the Citrix VDA “WS” with /servervdi for some testing, but prior to that I had the regular “TS” VDA installed.  Perhaps some combination of my installation/uninstallation caused my issues.

Anyways, copying the SCService64.exe to the C:\Program Files (x86)\Citrix\System32 folder and restarting the ‘Citrix Desktop Service’ resulted in…

Registered, Hurray!

And what about our log file?

Previously it died around ‘Setting up ALL LaunchManager WCF service’; this time we see:

Hurray!  It continues and operates without issue.

Be sure to re-disable the logging in the BrokerAgent.exe.config file and restart the service, because I’ve found this BA_1.log can grow into a huge file.

Running XDPing this time results in [OK] across where the errors once were.

Read More

Move ALL Windows 7/2008/2012 log files to another drive

2014-04-10

 

Read More

Windows Server 2012 R2 cache drive size for parity drives

2013-08-19

It turns out the maximum write cache size for a parity drive is 100GB.  I have a 500GB SSD (~480GB real capacity), so the maximum write cache I can create for a single volume is 100GB.  I suspect I may be able to create multiple volumes and give each of them a 100GB write cache.  Until then, this is the biggest it seems you can make for a single volume, so MS has solved the issue of making a write cache too large.

Read More

Testing Windows Storage Spaces Performance on Windows 2012 R2

2013-08-08

Windows Storage Spaces parity performance on Windows Server 2012 is terrible.  Microsoft’s justification for it is that it’s not meant to be used for anything except “workloads that are almost exclusively read-based, highly sequential, and require resiliency, or workloads that write data in large sequential append blocks (such as bulk backups).”

I find this statement to be a bit amusing, because trying to back up anything @ 20MB/sec takes forever.  If you set up a Storage Spaces parity volume at 12TB (available space) and you have 10TB of data to copy to it just to get it going, it will take you 8,738 minutes, or 145 hours, or 6 straight days.  I have no idea who thought anything like that would be acceptable.  Maybe they want to adjust their use case to volumes under 1GB?

Anyways, 2012R2 may bring some feature enhancements, including new features for Storage Spaces: ‘tiered storage’ and write-back caching.  These allow you to use fast media like flash as a staging ground so writes complete faster, with the fast media then transferring that data to the slower storage at a more convenient time.  Does this fix the performance issues in 2012?  How does the new dual parity perform?

To test, I made two VMs: one generic 2012 and one 2012R2.  They have exactly the same volumes, 6x10GB volumes in total.  The volumes are broken down into 4x10GB volumes on a 4x4TB RAID-10 array, 1x10GB volume on a 256GB Samsung 840 Pro SSD, and 1x10GB volume on a RAMDisk (courtesy of DataRAM).  Performance for each set of volumes is:

4x4TB RAID-10 -> 220MB/s write, 300MB/s read
256GB Samsung 840 Pro SSD -> ~250MB/s write, 300MB/s read
DataRAM RAMDisk -> 4000MB/s write, 4000MB/s read

The Samsung SSD volume has a small sequential-write advantage, and it should have a significant seek advantage as well; since the volume is dedicated to the Samsung, it should be significantly faster, as you could probably divide by 6 to get the individual performance of the 4x10GB volumes sharing the single RAID array.  The DataRAM RAMDisk should crush both of them for read and write performance under all situations.  For my quick testing, I only tested sequential performance.

The first thing I did was create my storage pool with my 6 volumes that reside on the RAID-10.  I used this PowerShell script to create them:
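
In sketch form (friendly names are illustrative, not necessarily the originals):

# Pool every eligible disk, then carve out a virtual disk for testing.
$disks  = Get-PhysicalDisk -CanPool $true
$subsys = Get-StorageSubSystem -FriendlyName '*Storage Spaces*'
New-StoragePool -FriendlyName 'TestPool' -StorageSubSystemUniqueId $subsys.UniqueId -PhysicalDisks $disks
# Simple = stripe; the later parity tests swap in -ResiliencySettingName Parity
# (plus -PhysicalDiskRedundancy 2 for dual parity).
New-VirtualDisk -StoragePoolFriendlyName 'TestPool' -FriendlyName 'Test' -ResiliencySettingName Simple -ProvisioningType Fixed -UseMaximumSize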

My first test was a striped disk, to determine the maximum performance among my 6 volumes.  I mapped a drive to my DataRAM RAMDisk and copied a 1.5GB file from it using xcopy /j.

Performance to the stripe seemed good: about 1.2Gb/s (150MB/s).

I then deleted the volume and recreated it as a single parity drive.

Executing the same xcopy /j command, I averaged around 348Mb/s (43.5MB/s).

This is actually faster than what I remember getting previously (around 20MB/s), and this is through a VM.

I then deleted the volume and recreated it as a dual parity drive.  To get the dual parity drive to work I actually had to add a 7th disk; neither 5 nor 6 would work, as it told me I lacked sufficient space.

Executing the same xcopy /j command, I averaged around 209Mb/s (26.1MB/s).

I added my SSD volume to the VM and deleted the Storage Spaces volume.  I then added the SSD volume to the pool and recreated the volume, this time with “tiered” storage.

When I specified the SSD as the tiered storage, it removed my ability to create a parity volume, so I created a simple volume for this testing.
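
The 2012 R2 tiering cmdlets look roughly like this (a sketch; inside a VM the disks’ media types typically have to be tagged by hand first):

# Tag the pooled SSD so Storage Spaces knows its tier (media type is not auto-detected in a VM):
Set-PhysicalDisk -FriendlyName 'PhysicalDisk5' -MediaType SSD    # illustrative disk name
# Define the two tiers, then create the tiered (simple) space:
$ssdTier = New-StorageTier -StoragePoolFriendlyName 'TestPool' -FriendlyName 'SSDTier' -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName 'TestPool' -FriendlyName 'HDDTier' -MediaType HDD
New-VirtualDisk -StoragePoolFriendlyName 'TestPool' -FriendlyName 'Tiered' -StorageTiers $ssdTier, $hddTier -StorageTierSizes 8GB, 32GB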

Performance was good.  I achieved 2.0Gb/s (250MB/s) to the volume.

With the RAMDisk as the SSD tier, I achieved 3.2Gb/s (400MB/s).  My 1.5GB file may not be big enough to ramp up to the maximum speed, but it works.  Tiered storage makes a difference, though I didn’t try to “overfill” the tiered storage section.

I wanted to try the write-back cache with parity to see if that helps.  I found a page explaining that, at this time, it can only be enabled through PowerShell.
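
The relevant knob is New-VirtualDisk’s -WriteCacheSize parameter; a parity space with an explicit write-back cache looks something like:

New-VirtualDisk -StoragePoolFriendlyName 'TestPool' -FriendlyName 'Parity' -ResiliencySettingName Parity -UseMaximumSize -WriteCacheSize 1GB
# (The 100MB test later in this post just swaps in -WriteCacheSize 100MB.)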

I enabled the write cache with both my SSD and RAMDisk as part of the pool, and the performance I got copying the 1.5GB file was 1.8Gb/s (225MB/s).

And this is on a single parity drive!  Even though the copy completed quickly, I could see in Resource Monitor that the copy to the E: drive did not stop; after hitting the cache at ~200MB/s, it dropped down to ~30-45MB/s for several seconds afterwards.

You can see xcopy.exe is still going, but there is no more network activity.  The total is in bytes per second, and you can see it’s writing to the E: drive at about 34.13MB/s.

I imagine this is the ‘Microsoft Magic’ going on where the SSD/write cache is now purging out to the slower disks.

I then removed the RAMDisk SSD to see what impact it would have if writes just hit the stock SSD.

Leaving only the stock SSD, I hit about 800Mb/s (100MB/s).

This is very good!  I then reduced the write cache size to see what would happen if the copy exceeded the cache.  I recreated the volume with the WriteCacheSize set to 100MB.

As soon as the write cache filled up, it was actually a little slower than before: 209Mb/s (26.1MB/s).

100MB of cache is just not enough to help

Here I am now at the end.  It appears tiered storage only helps mirrored or striped volumes.  Since those are the fastest volume types anyway, the benefits aren’t as high as they could be.  With parity drives, though, the write cache setting has a profound impact on the initial performance of the system.  As long as whatever fills the cache has enough time to purge to disk in between, you’ll be OK.  By that I mean: without an SSD present and the write cache at default, a 1GB file will copy over at 25MB/s in 40 seconds.  With a 100MB SSD cache present, it will take 36 seconds, because once the cache is full it is bottlenecked by how fast it can empty itself.  Even worse, in my small-scale test, it hurt performance by about 50%.  A large enough cache probably won’t encounter this issue as long as there is sufficient time for it to clear.  It might be worthwhile to invest in a good UPS as well: if you have a 100GB cache that is near full and the power goes out, it will take about 68 minutes for the cache to finish dumping itself to disk.  At 1TB worth of cache, you could be looking at 11.37 hours.  I’m not sure how Server 2012R2 deals with a power outage on the write cache, but since it’s part of the pool I imagine on reboot it will just pick up where it left off…?

Anyways, with Storage Spaces I do have to give Microsoft kudos.  It appears they were able to come close to doubling the performance of single parity, to ~46MB/s.  Dual parity sits at about 26MB/s in my test environment.  With the write cache, everything is extremely fast until the cache becomes full; after that, it’s painful.  So it’s very important to size your cache appropriately.  I have a second system with 4x4TB drives in a mirrored storage pool configuration.  Once 2012 R2 comes out, I suspect I’ll update to it and change my mirror into a single parity with a 500GB SSD cache drive.  Once that happens, I’ll try to remember to retest these performance numbers and we’ll see what happens 🙂

Read More