Provisioning Services

Citrix Provisioning Service – Network Service Starting/Stopping services remotely

2018-05-02

Citrix Provisioning Services has a feature within the “Provisioning Services Console” that allows you to stop/restart/start the streaming service on another server:

 

This feature worked with Server 2008R2 but with 2012R2 and greater it stopped working.  Citrix partially identified the issue here:

 

I was exploring starting and stopping the streaming service on other PVS servers from the Console and found this information to be incorrect.  Adding the NetworkService does NOT enable the streaming service to be stopped/started/restarted from other machines.  The reason is that NETWORK SERVICE is a LOCAL account on the machine itself.  When it reaches out to communicate with another system, it is translated into a proper SID that matches the machine account.  Since that SID coming across the wire does not have access to the service, you get a failure.

To fix this properly, we can either add permissions for each PVS server's machine account on each service, or add all the machine accounts to a security group and grant that group permission to manipulate the service on each PVS server.

I created a PowerShell script to easily add a group, user, or machine account to the Streaming Service.  It will also list all of the permissions:
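The script itself isn't reproduced here, but as a rough illustration of the underlying idea (not the author's script), the following grants a group start/stop rights on the Stream Service by editing its security descriptor. The service name 'StreamService' and the group name are assumptions; verify both in your environment.

```powershell
# Illustrative sketch only; service and group names are assumptions.
$svc   = 'StreamService'                                   # Citrix PVS Stream Service (verify with sc.exe query)
$group = 'DOMAIN\CTX.Servers.ProvisioningServiceServer'    # group holding the PVS machine accounts

# Resolve the group to a SID for use in the SDDL string
$account = New-Object System.Security.Principal.NTAccount($group)
$sid     = $account.Translate([System.Security.Principal.SecurityIdentifier]).Value

# Current security descriptor of the service in SDDL form
$sddl = (sc.exe sdshow $svc | Where-Object { $_ -match 'D:' }).Trim()

# ACE granting query (CC, LC, SW, LO), start (RP), stop (WP) and user-defined control (CR)
$ace = "(A;;CCLCSWRPWPLOCR;;;$sid)"

# Append the ACE to the DACL, keeping any SACL (S:...) portion at the end
if ($sddl -match '^(?<dacl>.*?)(?<sacl>S:.*)$') { $newSddl = $Matches['dacl'] + $ace + $Matches['sacl'] }
else                                            { $newSddl = $sddl + $ace }

sc.exe sdset $svc $newSddl
```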

An example of adding a group to the permissions on the service:

And now we can start the service remotely:

 

In order to get this working entirely I recommend the following steps:

  1. Create a group (e.g., “CTX.Servers.ProvisioningServiceServer”)
  2. Add all the PVS machine accounts into that group
  3. Reboot your PVS servers so they pick up the new group membership in their tokens
  4. Run the PowerShell script on each machine to add the group permission to the streaming service
  5. Done!

And now the script:

 


Corrupt Registry Repair with Citrix Provisioning Services

2018-04-02

I encountered an interesting issue and worked through a solution with a corrupt registry.  The issue seemed innocuous enough: we upgraded PowerShell to 5.1.  Upon reboot I encountered a blue screen with code 0xF4:

I have encountered this issue before, but I haven’t recorded my troubleshooting steps until now.

Since this was a PVS target device, the easy method was deleting the version and trying the PowerShell 5.1 upgrade again, which resulted in the same BSOD.  So it was easily reproducible.  I tried it a few more times, because, why not?

I then deleted this version, booted into the system, and looked at Event Viewer for hints of what could be at fault.  Going through it, it was pretty obvious that the registry was corrupted:

Filtering for Event ID 5 shows all the boot attempts that ended in a BSOD:

 

I mean, it literally says the Registry was corrupted 🙂

Where can you find this corruption?  With PVS it's fairly simple, but I believe the same process applies to other systems, including physical ones.  The first step was to mount the vDisk to a VM.

 

Mount the registry hive you suspect of corruption.

Next is to scan the registry for corruption.  Thus far, I've only found corruption to be detectable if it's a key or value that cannot be read.  If the data in a value can be read but contains garbage, it's much harder to detect.  To avoid permissions being a problem, I open a PowerShell prompt as SYSTEM using PsExec.  If you don't elevate permissions, some keys may be restricted from the Admins group and will be detected as failures.

Once at this stage, it’s a one-liner to scan the registry:
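As an illustration (the hive name 'CorruptHive' and the output paths are just examples), the scan can be as simple as:

```powershell
# Run this from a SYSTEM-level PowerShell prompt (e.g. started with: psexec -s -i powershell.exe)
# so that legitimately restricted keys don't show up as false positives.
# Assumes the suspect hive from the mounted vDisk has been loaded as HKLM\CorruptHive.
Get-ChildItem -Path 'HKLM:\CorruptHive' -Recurse -ErrorAction SilentlyContinue -ErrorVariable RegErrors |
    Select-Object -ExpandProperty Name | Out-File C:\Temp\RegScan.txt

# Keys that could not be read (the likely corruption) are captured here:
$RegErrors | Out-File C:\Temp\RegScan-Errors.txt
```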

In my experience, corruption shows up as “Permission Denied”, “Access Denied”, “Path does not exist”, or some such:

 

At this point you can examine the text file to see the last path it explored:

 

From there you can open regedit (as SYSTEM) and examine the keys within that path.  Clicking through each one revealed the corrupted key:

Attempting to view the permissions on this key reveals the corruption also exists at the ACL level:

“The requested security information is either unavailable or can’t be displayed”.

Deleting the key may fail as well:

At this point you need to evaluate how to deal with the corruption.  If you cannot delete, rename, or otherwise replace the key, you may have an option like I did: rename a branch higher up in the tree, go to an existing system with the same keys (with PVS I can go to the previous version and export that tree), and reimport it.

Unload the hive and boot up the system, and you may have a fully working system!

I've used this trick here, and on a corrupt COMPONENTS hive in the past.  With the COMPONENTS hive I was lucky that I could replace the corrupted keys with ones from a branched vDisk; other machines didn't have the same keys in COMPONENTS.

 


Citrix Provisioning Server – PXE requests stop working

2017-08-10

We use a bootable ISO in our environment to boot our VMs to a specific set of PVS servers.  This ISO varies by region, ensuring that each target device that boots is directed to its closest PVS server.

However, we have one region that does not leverage this capability; it was designed to utilize the PXE services of the Citrix PVS servers.  Occasionally, we encountered VMs that would not boot and instead the console showed “PXE-E53: no boot filename received”.

 

When I logged onto the Citrix PVS servers, I checked their services.  Both services were reported as “Running”:

When I checked the event logs I did not see any errors in either the Application log or the System log.  Administrative events showed nothing out of the ordinary either.

In order to confirm that the PVS service was actually listening, I ran netstat.

This showed me all the open ports the server was listening on and the processes tied to those ports.  Since PXE is a UDP operation, I examined the UDP portion of the netstat output.
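Something along these lines lists the listening UDP ports together with the owning process IDs:

```powershell
netstat -ano -p UDP
# ...and the owning service for a given PID can be confirmed with:
tasklist /svc /fi "PID eq 1234"    # 1234 is a placeholder PID
```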

Port 69 is used by TFTP to transfer files, and port 67 is used by PXE.  However, I only saw port 69; port 67 was nowhere to be found.  I restarted the “Citrix PVS PXE Service”, reran netstat, confirmed that the PXE port was now listening, and matched the process IDs up to the proper services.

I restarted the failed target devices and they began to boot properly.

However, why did this fail in the first place?  I read on the Citrix forums that the Citrix services can become unbound if the network is not available when the services start.  To test this I rebooted one of the affected Citrix PVS servers.  Sure enough, it came back up with port 67 not being listened on but the service in a ‘Running’ state.  I wanted to see if I could capture the flow of communication between the network coming up and the service starting, so I used Procmon and enabled “Boot Logging”.

Lo and behold, the Procmon boot logging added enough of a delay at startup that the PXE service bound consistently.  With boot logging stopped, the PXE service would start but fail to bind to the port.

So now this leads to a bit of a quandary.  The delay seems to be in the milliseconds.  I've considered a couple of solutions for this issue.

  1. A startup script that checks that both ports are listening and restarts the appropriate service if one of them is missing (a rough sketch follows this list).
  2. Change the service startup type to “Automatic (Delayed Start)”.  This delays the service start by up to 2 minutes, which does mean the PVS server will NOT be able to service target device boot requests during that window.
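A rough sketch of what option 1 could look like (the TFTP display name is an assumption; verify both service names before relying on anything like this):

```powershell
# If a PVS UDP listener is missing, restart the owning service.
# The PXE display name comes from above; the TFTP display name is an assumption.
$checks = @(
    @{ Port = 67; DisplayName = 'Citrix PVS PXE Service'  }
    @{ Port = 69; DisplayName = 'Citrix PVS TFTP Service' }
)

foreach ($check in $checks) {
    $listening = Get-NetUDPEndpoint -LocalPort $check.Port -ErrorAction SilentlyContinue
    if (-not $listening) {
        Get-Service -DisplayName $check.DisplayName -ErrorAction SilentlyContinue | Restart-Service -Force
    }
}
```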

I think we're going to go with option 2.  The reason is that we can apply this setting change via Group Policy Preferences.  This ensures that if we do any removal/upgrade of the PVS software the setting will get reapplied, and we don't have to worry about losing a startup script during an OS upgrade or maintaining a script at all.

We've been affected by this a few times in the past and the fix has always been to restart the PVS server, but I managed to hit a window where the failure was happening consistently and could gather this information.  🙂

 


Citrix Provisioning Services Reverse Imaging – 2017 edition

2017-08-02

We've been trying to upgrade our Citrix PVS tools to version 7.13 and have had some issues.  I need to reverse image so I can remove the software without dealing with the in-place upgrade procedure, so I decided this would be a good time to update my reverse imaging process.  This time I will only be using native tools built into the OS.  With that, this is my new procedure for reverse imaging.  It's much faster and easier than my last process, but it does have a dependency: you must be running Windows 2012 or greater so that dism.exe is part of the OS.  This reverse imaging process is VMware specific, but the principles should apply to all hypervisors.  The overview of the process is as follows:
Create a local hard drive on a ‘build’ server that your vDisk will be based off of.
Attach that disk to the PVS server
Image vDisk to local disk
Boot local disk, manipulate as needed
Image local disk back

  1. Create a new hard disk on your ‘build’ VM that we will use as the staging for the vDisk you want to reimage
  2. Attach the newly created disk to your PVS server
  3. Set up the disks on the PVS server.  In my example, the G: drive is a mounted VHDX of the Citrix vDisk.  I'm using DISM to capture an image of this disk to a WIM, which I will then apply to the local disk created in step 1.  You will need to create a partition on the disk you made in step 1 and set it as active (if this is an MBR disk).  Unfortunately, DISM does not do disk-to-disk imaging, so this intermediary step is required.  In my video here, the E: drive is the local disk that belongs to my target ‘build’ server.  In addition, you must fix the BCD store, as it will point to the original partition, not your new target partition.
    The BCD commands (a hedged sketch appears after this list) repoint the boot entries; you will need to replace “partition=E:” with the drive letter of your LOCAL disk.

  4. Remove your hard disk from the PVS virtual machine without deleting it.  Ensure the disk is attached to your BUILD target device.  If you are booting up your BUILD target device from an ISO or PXE you may need to disable those features so that it boots from the hard disk.  I also disconnect the network from the virtual machine because we will need to reset the computer account with AD.
  5. To reset the computer account with AD, you need to log in to the system.  The easiest method, if you know the local administrator password, is to log in with that and “rejoin” the domain.  If not, disconnect the network and then log on with an account that has recently logged onto the server; cached credentials should get you through.  Once you are in, connect the network and rejoin the domain.  You can typically “rejoin” by changing the domain name to the NETBIOS name, or the reverse if the NETBIOS name is present.

    At this point you can do whatever work is required.

  6. Once you are done with whatever work you need to do, set the target device to boot from hard disk, but keep your vDisk assigned as an option.  The vDisk will then attach on boot as a secondary drive, which allows us to image back to it.  You need to ensure the vDisk is in PRIVATE mode OR has a Maintenance version, so that it is in a Read/Write mode.

    If you have your WRITE drive attached, you will need to ensure your local system disk is the LAST disk in the order for your VM.  The PVS boot loader, when set to boot from hard disk, seems to try and boot the LAST disk it detects.


    Checking the PVS Status Tray shows you’ve booted from a local hard drive.


    Something you will need to check and validate (if you find you cannot attach a write cache) is that the Citrix vDisk's hard disk has NOT taken the drive letter of your write cache.  The reason is that you have probably redirected the page file, event logs, or similar to that letter, and if the Citrix vDisk occupies that drive letter you will not be able to image to it because it will be in use.
  7. To image back open the Imaging Wizard:

    And go through the imaging process.

    I prefer to image to the vDisk volume (as seen in this process).

  8. Done!  Set the device to boot via the vDisk as opposed to Hard Disk, delete the local OS disk, boot up your vDisk and run any scripts to ‘finalize’ your image.
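As referenced in step 3, here is a rough sketch of the DISM capture/apply and the BCD fix-up.  These are approximations rather than the exact commands; the drive letters follow the example above and the BCD store path assumes a single-partition MBR layout:

```powershell
# Sketch of step 3 (approximate).
# G: = mounted VHDX of the Citrix vDisk, E: = new local disk (partition created and marked active).

# Capture the vDisk contents to an intermediate WIM, then apply it to the local disk
dism /Capture-Image /ImageFile:C:\Temp\vdisk.wim /CaptureDir:G:\ /Name:"vDisk"
dism /Apply-Image   /ImageFile:C:\Temp\vdisk.wim /ApplyDir:E:\ /Index:1

# Re-point the BCD store on the local disk at its own partition
# (store path assumes an MBR disk with the BCD at E:\Boot\BCD)
bcdedit /store E:\Boot\BCD /set "{default}" device   partition=E:
bcdedit /store E:\Boot\BCD /set "{default}" osdevice partition=E:
```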

Remove Ghost devices natively with Powershell

2017-06-28

We've been looking at using the Base Image Scripting Framework (BISF) as our new preparation and personalization platform for our PVS farm.  In our current world we have a bunch of features and tweaks that are not a part of BISF, so I've been writing some additions to achieve feature parity.

One of the features I wanted to take a good hard look at was removing ‘Ghost’ devices — or devices that aren’t present in your system.  Ghost devices look like this:

Our current world accomplishes this task using a script written by Simon Price and devcon.exe.  There is nothing wrong with this method per se, but BISF is native PowerShell and I want to stick to that without outside dependencies.  Can this requirement be achieved with nothing but PowerShell?  Fortunately, Google came to the rescue and pointed me here.  A script written by Alexander Boersch got me 80% of the way there (woo hoo!).  He wrote the method and ability to access setupapi.dll, which gives us the functions necessary to query and manipulate devices in Device Manager.  PowerShell has the ability to compile C# code natively, and his example was perfect for taking me where I needed to go.
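Alexander's code isn't reproduced here, but to illustrate the general technique of calling setupapi.dll from PowerShell via Add-Type, here is a minimal sketch that only counts non-present ('ghost') devices; the removal logic of the full script is deliberately left out:

```powershell
# Minimal sketch of the setupapi.dll-via-Add-Type approach: count ghost devices.
# (Removal requires additional calls such as SetupDiCallClassInstaller and is not shown here.)
$code = @"
using System;
using System.Runtime.InteropServices;

public static class SetupApi
{
    [StructLayout(LayoutKind.Sequential)]
    public struct SP_DEVINFO_DATA
    {
        public uint cbSize;
        public Guid ClassGuid;
        public uint DevInst;
        public IntPtr Reserved;
    }

    public const uint DIGCF_PRESENT    = 0x2;
    public const uint DIGCF_ALLCLASSES = 0x4;

    [DllImport("setupapi.dll", SetLastError = true)]
    public static extern IntPtr SetupDiGetClassDevs(IntPtr classGuid, string enumerator, IntPtr hwndParent, uint flags);

    [DllImport("setupapi.dll", SetLastError = true)]
    public static extern bool SetupDiEnumDeviceInfo(IntPtr deviceInfoSet, uint memberIndex, ref SP_DEVINFO_DATA deviceInfoData);

    [DllImport("setupapi.dll", SetLastError = true)]
    public static extern bool SetupDiDestroyDeviceInfoList(IntPtr deviceInfoSet);

    public static int CountDevices(uint flags)
    {
        IntPtr set = SetupDiGetClassDevs(IntPtr.Zero, null, IntPtr.Zero, flags);
        SP_DEVINFO_DATA data = new SP_DEVINFO_DATA();
        data.cbSize = (uint)Marshal.SizeOf(typeof(SP_DEVINFO_DATA));
        uint index = 0;
        while (SetupDiEnumDeviceInfo(set, index, ref data)) { index++; }
        SetupDiDestroyDeviceInfoList(set);
        return (int)index;
    }
}
"@
Add-Type -TypeDefinition $code

$all     = [SetupApi]::CountDevices([SetupApi]::DIGCF_ALLCLASSES)
$present = [SetupApi]::CountDevices([SetupApi]::DIGCF_ALLCLASSES -bor [SetupApi]::DIGCF_PRESENT)
Write-Output "Ghost (non-present) devices: $($all - $present)"
```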

How does it work?  What’s the output?

 


 

notice the “filter match” text

 


 

And a brief video of it in action:

 

Lastly, the script:

 

 


Citrix Provisioning Services Target Device Upgrade Woes

2017-06-27

We are running Citrix PVS 7.7 and are attempting to upgrade the vDisks to Citrix PVS 7.13.  Starting with PVS 7.7 you are supposed to be able to do an in-place upgrade of the tools.  Our experience with this has been less than positive, with a success rate of ~50%.  This post covers the issues we encountered and how we solved them for the other 50%.


Issue #1:

Error 2203. Database: C:\Windows\Installer\111cbf.ipi.  Cannot open database file.  System error -2147287037.

When I look for that ipi file I find it is not present.  Clicking ‘OK’ results in nothing happening.  Attempting to install PVS 7.13 as an in-place upgrade results in “Error 1923. Service Citrix PVS Device Service (BNDevice) could not be installed.  Verify that you have sufficient privileges to install system services.”

In an attempt to move this along I stopped the BNDevice service and manually specified a deletion (“sc delete bndevice”).  This allowed the install to proceed without issue.


Issue #2:

Uninstalling PVS 7.13 Target Device Software fails with ‘Product key not found’.

Doing a Procmon trace I discovered that PVS 7.13 requires a registry key and value to uninstall cleanly:

For whatever reason, this key and value do not get created and do not exist on an upgrade or clean install but are required to uninstall.  Manually adding the key allows the PVS Target Device software to uninstall cleanly.


Issue #3:

When using the “Provisioning Services Imaging Wizard” an error occurs “The vDisk has no volumes to copy to.  Pick another vDisk.”

This error occurs because our uninstall did not remove the Citrix Storage controller driver, and when we did an install it added another Citrix storage controller.  This new controller adds an additional disk but since the disk is in use from the original connection it comes up as “offline”.

If you attempt to bring the Disk Online you get an “Incorrect function.” error:


Device Manager shows multiple devices:

Only one of these Citrix devices should exist.

To resolve this issue, simply right-click one of the “Citrix Virtual Hard Disk Adapter” devices and select “Uninstall”.

This may require a reboot.  On reboot you should only have a single device listed, and the offline disk should be gone from Disk Management.  At this point you can run the Imaging Wizard without issue.


Issue #4

“Citrix Provisioning Services Target Device x64” software is not displayed in ‘Programs and Features’, but it is obviously installed because the files exist, the registry keys exist, and the program is running.

It should be listed just under “Citrix Profile Management”

If you attempt to run the removal or ModifyPath command from the registry,

you get an error: “Windows Installer has stopped working”.  If you try to continue, you MAY get stopped at “Error 1723. There is a problem with this Windows Installer package.  A DLL required for this install to complete could not be run.  Contact your support personnel or package vendor.  Action BNNS_Uninstall.FA231996_1469_4817_B7F3_61A648A18C07, entry: fnBNNS_Uninstall, library: C:\Windows\Installer\MSI6AFF.tmp”

As far as I can see, when in this state, ANY custom action by the MSI produces an error.  The Citrix Provisioning Services installer has numerous custom actions, so this comes up a few times.  The first custom action that causes a crash checks to see if a reboot is pending and presents you with a dialog if it detects one.  I've not been able to find much about the second action…

The only way I've found to fix this issue is to ‘rip’ the Citrix PVS Target Device software out of the system.  To ensure the bits and registry keys are all captured, I then install the same version over top.  My hope is that the reinstall will be a perfect overlay and any corruption or breakage will be corrected by the new install.  So far, this seems to be a successful strategy.

Here is how I ripped the software out.

  1. Reverse Image the vDisk
  2. Download MSIZap (from the Microsoft Windows SDK for Windows 7 and .Net Framework 3.5 SP1)
  3. Download SetACL from Helge Klein
  4. Extract both to the same folder and save this script to it (it targets PVS 7.7 specifically) and then run it:

     

So, a little explanation of why the script is needed and what it's doing.  MSIZap removes the keys that ‘register’ the product with the system.  This allows us to run the MSI as if it were installing on a clean system, which gets us past the first check of the MSI that would otherwise tell you to “Repair or Remove”.

However, after running MSIZap, permissions in the ‘Installer\Components’ keys are futzed pretty hard; they just seem to be outright removed.  My understanding of the TWA! parameters is that permissions are supposed to be changed, not removed.  In addition, specifying TWA! seems to only change permissions and not actually ‘delete’ anything.

So I follow up with just the “T” command in MSIZap, and this will actually remove the files and registry keys.

However, this also seems to be semi-broken, because the registry keys it touches are left without ownership or rights permissions applied.  Attempting to install while in this state generates messages like this:

“Error 1402. Could not open key: UNKNOWN\Components\84029B5E95851FA4EADA9BE7FB000B78\D4FA67639A2C23C4CA65B9CBB4AD5446. Verify that you have sufficient access to that key or contact your support personnel.”

If you navigate to that key you'll find access is denied, and exploring the permissions on the key shows other errors, like no ownership being set:

To correct the broken permissions I run three SetACL commands.

The first command sets the owner on all the keys to ‘EVERYONE’.  This corrects ownership on the bad keys so we don't get errors when setting permissions.

The second command resets inheritance so that permissions are now inherited from the root key.

The third command sets ownership back to SYSTEM from EVERYONE.

Lastly, it is necessary to remove the services, as the PVS installer checks whether they exist and stops the install if they do.  We found we could not delete BNDevice.exe because it was being held open by svchost.exe; rebooting fixed that, and we could then delete it and the “C:\Program Files\Citrix\Provisioning Services” folder.
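The script isn't included above, but a rough, hedged reconstruction of the sequence just described looks like this (the product code is a placeholder, and the msizap/SetACL switches should be checked against the tools' documentation; test on a copy of the vDisk first):

```powershell
# Rough reconstruction of the 'rip' sequence described above; NOT the author's script.
# The product code below is a placeholder: look up the real PVS 7.7 Target Device product code
# (under HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall) before trying anything like this.
$productCode   = '{00000000-0000-0000-0000-000000000000}'
$componentsKey = 'HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Components'

# 1. Unregister the product, then a second pass with plain 'T' to remove files/registry data
.\msizap.exe TWA! $productCode
.\msizap.exe T    $productCode

# 2. Repair the permissions msizap leaves broken under the Components keys
.\SetACL.exe -on $componentsKey -ot reg -actn setowner -ownr "n:S-1-1-0;s:y"  -rec yes   # owner = Everyone
.\SetACL.exe -on $componentsKey -ot reg -actn setprot  -op  "dacl:np;sacl:np" -rec yes   # re-enable inheritance
.\SetACL.exe -on $componentsKey -ot reg -actn setowner -ownr "n:S-1-5-18;s:y" -rec yes   # owner back to SYSTEM

# 3. Remove the leftover service and the install folder (a reboot may be required before the delete succeeds)
sc.exe delete BNDevice
Remove-Item 'C:\Program Files\Citrix\Provisioning Services' -Recurse -Force
```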

And then, with all of that done, we can install PVS 7.7 over top of itself, after which removals and in-place upgrades work as expected.


Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Process Monitor

2017-01-31

Non-persistent Citrix PVS target devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components play a big role, acting as the boot disk and sending UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance: the Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time, because the logging does add some overhead (especially Procmon, in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Process Monitor

For Process Monitor you must extract the boot driver and inject the process monitor executable itself into the image.

To extract the boot driver, simply launch Process Monitor and, under the Options menu, select ‘Enable Boot Logging’.

Then browse to your C:\Windows\System32\Drivers folder, and with “Show Hidden Files” enabled, copy out Procmon23.sys

It might be a good idea to disable boot logging if you did it on your personal system now 🙂

 

Now we need to inject the following registry entry into our image:
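One way to perform the offline injection is to load the SYSTEM hive from the mounted vDisk and create the driver's service key with reg.exe.  The value data below is an assumption rather than a verified copy of the entry; the safest source is an export of the PROCMON23 service key from a machine where boot logging is currently enabled:

```powershell
# Hedged sketch: X: is the mounted vDisk's Windows volume. Value data is assumed, not verified.
reg load HKLM\PVSIMAGE X:\Windows\System32\config\SYSTEM

reg add "HKLM\PVSIMAGE\ControlSet001\Services\PROCMON23" /v Type         /t REG_DWORD     /d 1 /f
reg add "HKLM\PVSIMAGE\ControlSet001\Services\PROCMON23" /v Start        /t REG_DWORD     /d 0 /f
reg add "HKLM\PVSIMAGE\ControlSet001\Services\PROCMON23" /v ErrorControl /t REG_DWORD     /d 1 /f
reg add "HKLM\PVSIMAGE\ControlSet001\Services\PROCMON23" /v ImagePath    /t REG_EXPAND_SZ /d "System32\Drivers\PROCMON23.SYS" /f
reg add "HKLM\PVSIMAGE\ControlSet001\Services\PROCMON23" /v Group        /t REG_SZ        /d "FSFilter Activity Monitor" /f

reg unload HKLM\PVSIMAGE
```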

Here are the steps in action:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use the Windows Performance Toolkit for boot tracing Citrix PVS target devices, click here.


Tracing Citrix Provisioning Service (PVS) Target Device Boot Performance – Windows Performance Toolkit

2017-01-31

Non-persistent Citrix PVS target devices have a more complicated boot process than a standard VM.  This is because the Citrix PVS server components play a big role, acting as the boot disk and sending UDP packets over the network to the target device.  This adds a delay that you simply cannot avoid (albeit possibly a small one, but there is no denying that network communication will be slower than a local hard disk/SSD).

One of the things we can do is set the PVS target devices up in such a way that we can get real, measurable data on what the target device is doing while it’s booting.  This will give us visibility into what we may actually require for our target devices.

There are two programs that I use to measure boot performance: the Windows Performance Toolkit and Process Monitor.  I would not recommend running both at the same time, because the logging does add some overhead (especially Procmon, in my humble experience).

The next bit of this post will detail how to offline inject the necessary software and tools into your target device image to begin capturing boot performance data.

Windows Performance Toolkit

The Windows Performance Toolkit must be installed on the image, or you can copy the files from an existing install into your image at the following path:

To offline inject, simply mount your vDisk image and copy the files there:

 

The portion of it that we are interested in is “xbootmgr.exe” (aka boot logging).  In order to enable boot logging we need to inject the following registry key into our PVS image:

Seal/promote the image.

On next boot you will have captured boot information:

To see how to use Process Monitor for boot tracing Citrix PVS target devices, click here.


Lets Make PVS Target Device Booting Great Again (Part 2)

2017-01-05

Continuing on from Part 1, we are looking to optimize the PVS boot process to be as fast as it possibly can be.  In Part 1 we implemented Jumbo Frames across both the PVS target device and the PVS server and discovered that Jumbo Frames only apply to the portion of the boot after the BNIStack kicks in.

In this part we are going to examine the option “I/O burst size (KB)”.  This setting is explained in the help file:

I/O burst size — The number of bytes that will be transmitted in a single read/write transaction before an ACK is sent from the server or device. The larger the IO burst, the faster the throughput to an individual device, but the more stress placed on the server and network infrastructure. Also, larger IO Bursts increase the likelihood of lost packets and costly retries. Smaller IO bursts reduce single client network throughput, but also reduce server load. Smaller IO bursts also reduce the likelihood of retries. IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed.

What are these ACKs, and can we see them?  We can.  They are UDP packets sent back from the target device to the PVS server.  If you open Procmon on the PVS server and start up a target device, an ACK looks like so:

The highlighted 48-byte UDP Receive packets?  They are the ACKs.

And if we enable the disk view with the network view:

 

With each 32KB read of the hard disk we send out 24 packets: 23 at 1464 bytes and 1 at 440 bytes.  Add them all together and we get 34,112 bytes of data.  This implies an overall overhead of 1,344 bytes per sequence of reads, or 56 bytes per packet.  I confirmed it's a per-packet overhead by looking at read events of other sizes:

If we look at the first read event (8,192 bytes) we can see there are 6 packets, 5 at 1464 and one at 1208, totaling 8,528 bytes of traffic.  8,528 - 8,192 = 336 bytes of overhead / 6 packets = 56 bytes.

The same happens with the 16,384-byte read next in the list: 12 packets, 11 at 1464 and one at 952, totaling 17,056 bytes.  17,056 - 16,384 = 672 bytes of overhead / 12 packets = 56 bytes.

So it's consistent: for every packet at the standard 1506 MTU you are losing about 3.8% to overhead (56 bytes on a 1464-byte payload).  But there is secretly more overhead than just that: for every read there is a 48-byte ACK on top.  Admittedly, it's not much, but it's present.

And how does this look with Jumbo Frames?

For a 32KB read we satisfied the request in 4 packets: 3 at 8,972 bytes and 1 at 6,076 bytes, totaling 32,992 bytes of transmitted data.  Subtracting what is really required from the transmitted data, 32,992 - 32,768 = 224 bytes of overhead, or…  56 bytes per packet 🙂

This amounts to a measly 0.6% of overhead when using jumbo frames (an immediate 3% gain!).
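For anyone who wants to redo this arithmetic against their own captures, here is the calculation from above condensed into a small snippet:

```powershell
# Recompute the per-packet overhead from observed packet sizes for a given read size.
function Get-StreamOverhead {
    param([int]$ReadBytes, [int[]]$PacketSizes)
    $onWire = ($PacketSizes | Measure-Object -Sum).Sum
    [pscustomobject]@{
        ReadBytes         = $ReadBytes
        BytesOnWire       = $onWire
        OverheadBytes     = $onWire - $ReadBytes
        OverheadPerPacket = ($onWire - $ReadBytes) / $PacketSizes.Count
    }
}

# Standard MTU, 32KB read: 23 packets of 1464 bytes plus 1 of 440 bytes (56 bytes/packet)
Get-StreamOverhead -ReadBytes 32768 -PacketSizes ((,1464 * 23) + 440)

# Jumbo frames, 32KB read: 3 packets of 8972 bytes plus 1 of 6076 bytes (still 56 bytes/packet)
Get-StreamOverhead -ReadBytes 32768 -PacketSizes ((,8972 * 3) + 6076)
```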

But what about this 32KB value?  What happens if we make it larger (or smaller)?

Well, there is a limitation that handicaps us…  even if we use Jumbo Frames.  It is stated here:

IO Burst Size / MTU size must be <= 32, i.e. only 32 packets can be in a single IO burst before a ACK is needed

Because Jumbo Frames don’t occur until after the BNIStack kicks in, we are limited to working out this math at the 1506 MTU size.

The caveat is that the relevant number isn't the MTU of 1506; the math is based on the payload that fits in each packet, which is 1464 bytes.  Working that backwards, 32 packets at 1464 bytes of payload comes to just under 46K, but in practice the ceiling lands on a clean 44K (45,056 bytes): with the I/O burst size set to 44K the target device still boots, and counting the packets in a burst shows 32.

So if we up the I/O burst by 1K to 45K (45 x 1024 = 46,080 bytes), will it still boot?

It does not.  This enforces a hard limit of 44K for the I/O burst size until the first stage supports a larger MTU.  I have only explored EFI booting, so I suppose it's possible another boot method allows for a larger MTU?

The reads themselves are now split, hitting the ‘version’ and the ‘base’: 25,600 bytes from the base plus 20,480 from the version (46,080 bytes in total).  I believe this is normal for versioning, though.

So what’s the recommendation here?

Good question.  Citrix defaults to a 32K I/O burst size.  If we break the operation of a burst down, we have four portions:

  1. Hard drive read time
  2. Packet send time
  3. Acknowledgement of receipt
  4. Turnaround time from receipt to next packet send

The times that I have for each portion at a 32K size appear to be (in milliseconds):

  1. 0.3
  2. 0.5
  3. 0.2
  4. 0.4

A total time of ~1.4ms per read transaction at 32K.

For 44K I have the following:

  1. 0.1
  2. 0.4
  3. 0.1
  4. 0.4

For a total time of ~1.0ms per read transaction at 44K.

I suspect the 0.4ms difference could be well within the margin of error of my hand-based counting.  I took my numbers from a random sampling of 3 transactions and averaged them; I cannot guarantee they were at the same spot of the boot process.

However, it appears the difference between them is close to negligible.  The question that must be posed is: what's the cost of a ‘retry’ for a missed or faulty UDP packet?  From the evidence I have it should be fairly small, but I haven't figured out a way to test or detect the turnaround time of a ‘retry’ yet.

Citrix has a utility that gives you some information on what kind of gain you might get.  It’s called ‘Stream Console’ and it’s available in the Provisioning Services folder:

 

With 4K I/O burst it does not display any packets sent larger because they are limited to that size

 

8K I/O Burst Size. Notice how many 8K sectors are read over 4K?

 

16K I/O Burst Size

 

What I did to compare the performance of the different I/O burst size options was simply to try each size 3 times and take the boot-time results as posted by the Status Tray utility.  The unfortunate thing about the Status Tray is that its time/throughput calculations are rounded to the second.  This means the throughput isn't entirely accurate, since a second is a LARGE value when you're talking about the difference between 8 and 9 seconds; being just under or over the rounding threshold will change your results once we get to numbers like these.  But I'll present my results anyways:

To me, the higher the I/O burst size, the better the performance.

Again, the caveat is that I do not know what the impact of a retry is, but if reading from the disk and resending the packet takes ~1ms, then I imagine the ‘cost’ of a retry is very low, even with the larger sizes.  However, if your environment has longer disk reads, high latency, and a poor network with dropped or lost packets, then it's possible, I suppose, that a higher I/O burst size is not for you.

But I hope most PVS environments are better designed than that and you don't actually have to worry about it.  🙂


Lets Make PVS Target Device Booting Great Again (Part 1)

2016-12-30

Some discussions have swirled recently about implementing VDI.  One of the challenges with VDI is slow boot times, which necessitates having machines pre-powered on: a pool of machines sits consuming server resources until a logon request comes in and more machines are powered on to meet the demand.  But what if your boot time is measured in seconds?  Something so low that you could keep the ‘pool’ of standby machines at 1 or 2, or even none?

I’m interested in investigating if this is possible.   I previously looked at this as a curiosity and achieved some good results:

 

However, that was a non-domain Server 2012 R2 fresh out of the box.  I tweaked my infrastructure a bit by storing the vDisk on a RAM Disk with Jumbo Frames (9k) to supercharge it somewhat.

Today, I’m going to investigate this again with PVS 7.12, UEFI, Windows 10, on a domain.  I’ll show how I investigated booting performance and see what we can do to improve it.

The first thing I’m going to do is install Windows 10, join it to the domain and create a vDisk.

Done.  Because I don't have SCVMM set up in my home lab, I had to muck my way through enabling UEFI HDD boot.  I went into the PVS folder (C:\ProgramData\Citrix\Provisioning Services) and copied BDMTemplate_uefi.vhd out to my Hyper-V target device folder.

I then edited my Hyper-V Target Device (Gen2) and added the VHD:

I then mounted the VHD and modified the PVSBOOT.INI file so it pointed to my PVS server:
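A small sketch of that mount-and-edit step using the Hyper-V module (the VHD path is an example; the PVSBOOT.INI edit itself is done by hand):

```powershell
# Mount the copied BDM VHD, find its drive letter, and open PVSBOOT.INI for editing.
$vhdPath = 'D:\Hyper-V\TargetDevice\BDMTemplate_uefi.vhd'    # example path
$disk    = Mount-VHD -Path $vhdPath -Passthru
$letter  = (Get-Partition -DiskNumber $disk.DiskNumber | Get-Volume).DriveLetter
notepad "$($letter):\PVSBOOT.INI"    # point the bootstrap at your PVS server

# When done editing:
Dismount-VHD -Path $vhdPath
```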

 

 

I then created my target device in the PVS console:

 

And voilà!  It booted.

 

And out of the gate we are getting 8-second boot times.  At this point I don't have it set up with a RAM drive or anything, so this is pretty stock, albeit on really fast hardware.  My throughput is crushing my previous speed record, so if I can reduce the number of bytes read (it's literally bytes read / time = throughput) I can improve my boot time.  On the flip side, I can try to increase my throughput, but that's a bit harder.

However, there are some tricks I can try.

Jumbo Frames are enabled across my network, but at this stage I do not have them set on the PVS server or the target device.  We can enable them to see if it helps.

To verify their operation I’m going to trace the boot operation from the PVS server using procmon:

We can clearly see the UDP packet size is capping out at 1464 bytes, making it 1464 + an 8-byte UDP header + a 20-byte IP header = 1492 bytes.  So I enabled Jumbo Frames on the NIC.
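Enabling jumbo frames on the NIC can also be done from PowerShell; the advanced property name and value ('Jumbo Packet' / '9014 Bytes') vary by driver, so list what the adapter exposes first.  A small sketch:

```powershell
# Property names and values differ per NIC driver; list them first, then set the jumbo value.
Get-NetAdapterAdvancedProperty -Name 'Ethernet' -DisplayName 'Jumbo*'
Set-NetAdapterAdvancedProperty -Name 'Ethernet' -DisplayName 'Jumbo Packet' -DisplayValue '9014 Bytes'
```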

Under Server Properties in the PVS console I adjusted the MTU to match the NIC:

 

You then need to restart the PVS services for it to take effect.

I then made a new vDisk version and enabled Jumbo Frames in the OS of the target device.  I did a quick ping test to validate that Jumbo Frames are passing correctly.
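For anyone wanting to reproduce the test, it is just an un-fragmentable ping with a jumbo-sized payload (the address is the PVS server from this lab; 8972 bytes assumes a 9000-byte MTU minus 28 bytes of IP and ICMP headers):

```powershell
# -f = don't fragment, -l = payload size. If jumbo frames aren't passing end to end,
# this returns "Packet needs to be fragmented but DF set."
ping 192.168.1.88 -f -l 8972
```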

I then started Procmon on the PVS server, set the target device to boot…

and…

 

1464-byte UDP packets.  A little smaller than the 9000 bytes or so they're supposed to be.  Scrolling down a little further, however, shows:

 

Notice the number of UDP packets sent at the smaller frame size?

 

Approximately 24 packets until it gets a “Receive” notification to send the next batch of packets.  These 24 packets account for ~34,112 bytes of data per sequence.  Total time for each batch of packets is 4-6ms.

If we follow through to when the jumbo frames kick in we see the following:

This is a bit harder to read because the MIO (Multiple Input Output) kicks in here and so there are actually two threads executing the read operations as opposed to the single thread above.

Regardless, I think I've hit on a portion that is executing more or less sequentially.  The total amount of data being passed in these sequences is ~32,992 bytes, but the time to execute them is 1-2ms!  We have essentially halved the latency of our ‘hard disk’.

So why is the data being sent like this?  Again, procmon brings some visibility here:

Each “UDP Receive” packet is a validation that the data received was good and instructs the Stream Process to read and send the next portion of the file on the disk.  If we move to the jumbo frame portion of the boot process we can see the IO goes all over the place, both in size and in where the reads occur:

So, again, jumbo frames are a big help here, as all requests under 8K can be serviced in 1 packet, and there are usually MORE requests under 8K than above it.  Fortunately, Procmon can give us some numbers to illustrate this.  I started and stopped a Procmon trace for a network boot with Jumbo Frames and one without:

Standard MTU (1506)

 

Jumbo Frame MTU (9014)

 

The numbers we are really after are those for 192.168.1.88:6905.  The total number of events is solidly cut in half, with the number of sends about 1/3 less!  It was fast enough that it was able to process double the amount of data, in both bytes sent to the target device and bytes received from it!

Does this help our throughput?  Yes, it does:

 

“But Trentent!  That doesn’t show the massive gains you are spewing!  It’s only 4MB/s more in Through-put!”

And you are correct.  So why aren't we seeing more gains?  The issue lies with how PVS boots: it boots in two stages.  If you are familiar with PVS on Hyper-V from a year or more ago, you are probably aware of this issue.  Essentially, PVS breaks the boot into a first stage (the bootloader stage) which runs in what is essentially a lower-performance mode (standard MTU).  Once the BNIStack loads, it kicks into Jumbo Packet mode with the loading of the synthetic NIC driver.  The benefit of Jumbo Frames doesn't occur until this stage.  So when do Jumbo Frames kick in?  You can see it in Event Viewer.

From everything I see with Procmon, first-stage boot ends at that first Ntfs event.  So out of the original 8 seconds, 4 are spent in first-stage boot, where Jumbo Packets are not enabled; everything after that is impacted (positively).  So for the 4 seconds of the boot that previously ran at the standard MTU, bringing that down by a second is a 25% improvement!  Not small potatoes.

I intend to do more investigation into what I can do to improve boot performance for PVS target devices so stay tuned!  🙂
