I encountered an interesting issue with a corrupt registry and worked through a solution. The trigger seemed innocuous enough: we upgraded PowerShell to 5.1. Upon reboot I encountered a bluescreen with stop code 0xF4:
I have encountered this issue before, but I haven’t recorded my troubleshooting steps until now.
Since this was a PVS target device, the easy method was deleting the version and trying the PowerShell 5.1 upgrade again, which resulted in the same BSOD. So it was easily reproducible. I tried it a few more times, because, why not?
I then deleted this version, booted into the system, and looked at Event Viewer for hints of what could be at fault. Going through it, it was pretty obvious that it was registry corruption:
Filtering for EventID 5 shows all the attempts of booting to BSOD:
I mean, it literally says the Registry was corrupted 🙂
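The same filter can be applied from PowerShell. This is only a sketch: it assumes the events live in the System log, which is my guess here and may differ on your system.

```powershell
# Assumption: the Event ID 5 entries are in the System log; adjust LogName if not.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 5 } |
    Select-Object TimeCreated, ProviderName, Message |
    Format-List
```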
Where can you find this corruption? With PVS it’s fairly simple, but I believe the same process should work for other systems, including physical ones. The first step was to mount the vDisk to a VM.
Load the registry hive you suspect is corrupt.
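From an elevated prompt, loading a hive could look something like the following. The drive letter, the choice of the SOFTWARE hive, and the `TempHive` key name are placeholders for illustration, not values from my environment.

```powershell
# Hypothetical values: E: is where the vDisk was mounted, SOFTWARE is the suspect hive,
# and TempHive is an arbitrary name for the temporary mount point under HKLM.
reg load HKLM\TempHive E:\Windows\System32\config\SOFTWARE
```

regedit's File > Load Hive menu does the same thing if you prefer the GUI.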
Next, scan the registry for corruption. Thus far, I’ve only found corruption to be detectable if it’s a key or value that cannot be read. If the data in a value can be read but contains garbage, it’s much harder to detect. To keep permissions from being a problem, I open a PowerShell prompt as SYSTEM using PsExec. If you don’t elevate, some keys may be restricted from the Administrators group, and those will show up as failures.
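For reference, launching that SYSTEM prompt could look like this, assuming PsExec from Sysinternals is on the path and you start from an elevated prompt:

```powershell
# -s runs the process as the SYSTEM account, -i makes it interactive on the desktop.
psexec.exe -i -s powershell.exe

# In the new window, confirm the identity; it should report nt authority\system.
whoami
```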
Once at this stage, it’s a one-liner to scan the registry:
ls -Recurse -ErrorAction Stop | Out-File C:\RegCheck.txt
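A slightly longer variant of the same idea is sketched below. It assumes the hive was loaded under a key named `TempHive` (a placeholder) and appends the error message to the output file so the failure sits next to the last paths explored.

```powershell
# Assumption: the suspect hive is loaded as HKLM\TempHive (placeholder name).
Set-Location -Path 'HKLM:\TempHive'

try {
    # -ErrorAction Stop turns the first unreadable key into a terminating error.
    Get-ChildItem -Recurse -ErrorAction Stop | Out-File -FilePath C:\RegCheck.txt
}
catch {
    # Record what went wrong right after the last keys that were read successfully.
    "SCAN STOPPED: $($_.Exception.Message)" | Out-File -FilePath C:\RegCheck.txt -Append
}
```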
In my experience, corruption shows up as “Permission Denied”, “Access Denied”, “Path does not exist”, or something similar:
You can then examine the text file to see the last path it explored:
Next, open regedit (as SYSTEM) and examine the keys within that path. Clicking through each one revealed the corrupted key:
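If the file is large, a small helper can pull out where the scan stopped. This sketch assumes the default registry formatting, where keys are grouped under `Hive:` header lines; `Get-LastScannedHive` is a name I made up for illustration.

```powershell
# Hypothetical helper: returns the last "Hive:" header written to the scan output,
# i.e. the parent path the scan was working through when it stopped.
function Get-LastScannedHive {
    param([string]$Path = 'C:\RegCheck.txt')

    Select-String -Path $Path -Pattern '^\s*Hive:\s*(.+)$' |
        Select-Object -Last 1 |
        ForEach-Object { $_.Matches[0].Groups[1].Value.Trim() }
}

# Usage: show the last parent path, then the final few lines of the file.
# Get-LastScannedHive -Path C:\RegCheck.txt
# Get-Content -Path C:\RegCheck.txt -Tail 20
```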
Attempting to view the permissions on this key reveals that the corruption also exists at the ACL level:
“The requested security information is either unavailable or can’t be displayed”.
Deleting the key may fail as well:
At this point you need to evaluate how to manage the corruption. If you cannot delete the key, rename it, or replace it in some way, you may have an option like I did: rename a branch higher up in the tree, go to an existing system with the same keys (with PVS I can go to the previous version), export that tree, and import it into the damaged hive.
Unload the hive, boot up the system, and you may have a fully working system!
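A rough sketch of that swap from the command line follows. Every path, drive letter, and key name is a placeholder. It loads the good and the corrupt hive under the same temporary key name, one after the other, so that the paths inside the exported .reg file line up on import.

```powershell
# All names below are hypothetical placeholders, not values from my environment.
$tempKey = 'HKLM\TempHive'
$branch  = 'Some\Parent\Branch'   # a branch above the corrupt key

# 1. Load the known-good hive (e.g. from the previous vDisk version) and export the branch.
reg load $tempKey 'G:\Windows\System32\config\SOFTWARE'
reg export "$tempKey\$branch" 'C:\GoodBranch.reg' /y
reg unload $tempKey

# 2. Load the corrupt hive under the same name and set the damaged branch aside.
reg load $tempKey 'E:\Windows\System32\config\SOFTWARE'
Rename-Item -Path "HKLM:\TempHive\$branch" -NewName 'Branch.corrupt'

# 3. Import the good copy; it lands at the original path because the key names match.
reg import 'C:\GoodBranch.reg'
```

If the rename fails on the branch you picked, trying one level higher is what worked for me in regedit.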
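Unloading from the command line might look like this, again assuming the placeholder name `TempHive`. The unload fails while anything still holds the hive open, so close regedit and move PowerShell out of the hive first.

```powershell
# Leave the registry provider location so this session no longer holds the hive open.
Set-Location -Path C:\

# Release any registry handles PowerShell may still be holding, then unload.
[gc]::Collect()
reg unload HKLM\TempHive
```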
I’ve used this trick here, and on a corrupt COMPONENTS hive in the past. With the COMPONENTS hive, other machines didn’t have the same keys, so I got lucky that I could replace the corrupted ones with copies from a branched vDisk.