Following up on my last point about an invalid AD Home Directory attribute, I decided to examine in more detail what is causing the slowness on logon.
52 seconds for Profile Load time. This is caused by the following:
The Home Folder server is ‘WSAPVSEQ07’. I created a share on it and set the home directory to it. I then simulated a ‘server migration’ by shutting this box down. This means that the server name still resolves because it’s still present in DNS, but the server itself no longer responds to pings.
But why does it take so long?
If I do a packet capture on the server I’m trying to launch my application from, and set it to trace the ip.addr of the ‘powered off’ server, here’s what we see:
The logon process attempts to query the server at the ‘28.9’ second mark and stops at the ‘73.9’ second mark. A total of 45 seconds. This is all lumped into the ‘User Profile Load Time’. So what’s happening here?
We can see the initial connection attempt occurs at 28.9 seconds, then 31.9 seconds, then finally 37.9 seconds. The time span is 3s between the first and second try, then 6s between the second and third try. This is explained here.
In Windows 7 and Windows Server 2008 R2, the TCP maximum SYN retransmission value is set to 2, and is not configurable. Because of the 3-second limit of the initial time-out value, the TCP three-way handshake is limited to a 21-second timeframe (3 seconds + 2*3 seconds + 4*3 seconds = 21 seconds).
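The arithmetic in that quote can be sketched in a few lines of Python. This only illustrates the doubling formula as the KB article describes it; the function name and its parameters are made up for this example and are not a Windows API.

```python
def syn_handshake_intervals(initial_rto_s=3.0, max_syn_retransmits=2):
    """Return the wait after each SYN (the original plus each retransmission), in seconds.

    Assumes the timeout doubles after every SYN, as in the quoted formula.
    """
    return [initial_rto_s * (2 ** attempt) for attempt in range(max_syn_retransmits + 1)]


intervals = syn_handshake_intervals()
print(intervals)       # [3.0, 6.0, 12.0]
print(sum(intervals))  # 21.0
```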
Is this what we are seeing? It’s close. We are seeing the first two items for sure, but then instead of a ‘3rd’ attempt, it starts over, following the same formula.
Packet 8451 to 8508 = 3 seconds
Packet 8508 to 8600 = 6 seconds (2*3)
Packet 8600 to 8910 = 12 seconds (4*3)
Packet 8910 to 8979 = 3 seconds
Packet 8979 to 9048 = 6 seconds (2*3)
Packet 9048 to 9240 = 6 seconds (2*3?)
Packet 9240 to 9295 = 3 seconds
Packet 9295 to 9413 = 6 seconds (2*3)
= 45 seconds.
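As a quick sanity check, the gaps from the packet list above can be added up and compared with the first and last timestamps in the capture. This is just the arithmetic, with variable names chosen for the example:

```python
import math

# Gaps between consecutive SYN packets, in seconds, copied from the packet list above.
observed_gaps_s = [3, 6, 12, 3, 6, 6, 3, 6]

total_s = sum(observed_gaps_s)

# First and last timestamps seen in the capture (28.9s and 73.9s).
capture_span_s = 73.9 - 28.9

print(total_s)                                # 45
print(math.isclose(total_s, capture_span_s))  # True
```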
According to the KB article, we are seeing the Max SYN retransmissions (2) for each SYN sent. The article contains a hotfix we can install to change the Max SYN retransmissions value, but the minimum is 2, which is what it’s already set to. However, there is an additional hotfix to modify the 3 second time period.
The lowest I’ve found we can reduce the 3 second time period to is 100ms.
This reduces the logon time to:
19 seconds.
What does this packet capture look like? Like this:
With 100ms Initial RTO we see the difference between the packets to be:
Packet to = 300ms
Packet 18743 to = (2*3)
Packet 19316 to = 12 (4*3?)
Packet to = 300ms
Packet to = (2*3)
Packet to = seconds (4*3?)
Packet 39651 to = 300ms
Packet 39693 to = 600ms (2*3)
Even with the ‘Initial RTO’ set to 100ms, Windows has a MinRTO value of 300ms:
After the initial ‘attempt’ there is a 10-12 second delay.
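One way to read the 300ms and 600ms gaps is that the configured Initial RTO gets raised to MinRTO before the doubling starts. The sketch below models that reading. The clamping rule is an assumption drawn from the capture, not documented behaviour, and the function and parameter names are invented for the example.

```python
def effective_syn_gaps_ms(initial_rto_ms, min_rto_ms, max_syn_retransmits=2):
    """Return the gaps between consecutive SYN packets, in milliseconds."""
    # Assumption: the first timeout is never shorter than MinRTO.
    first_gap_ms = max(initial_rto_ms, min_rto_ms)
    # Each retransmission doubles the previous wait.
    return [first_gap_ms * (2 ** n) for n in range(max_syn_retransmits)]


print(effective_syn_gaps_ms(100, 300))  # [300, 600]
```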
Setting the MinRTO to its minimum of 20ms:
This reduces our logon time further, because our SYN packets are now about 200ms apart:
We are now at 16 seconds, 13 of which were spent on the profile, and 12 of those were spent timing out this connection.
Would you implement this? I would strongly recommend against it. SYN retransmissions were designed so that blips in the network can be overcome. Unfortunately, I know of no way to get around the ‘Home Directory resolves in DNS but does not respond to ping’ timeout. The following group policies have no effect:
And I suspect the reason they have no effect is that the server still resolves in DNS, so the SYN sequence takes place and blocks regardless of these GPO settings. Undoubtedly, it’s because this comes into play:
Since our user has a home directory, the preference is to ‘always wait for the network to be initialized before logging the user on’. There is no override.
Is there a way to detect a dead home directory server during a logon? Outside of long logons, I don’t see any event logging or anything else that would point to this being the issue without having to do a packet capture in an isolated environment.
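Short of a packet capture, one option is to probe the home directory server's SMB port out of band with a short timeout. The following is a minimal sketch of that idea, not something tested in this post: the function name is hypothetical, and it assumes the home share is served over TCP port 445. It does not make the logon itself any faster; it only tells you whether the server accepts a connection.

```python
import socket
import time


def smb_port_reachable(host, port=445, timeout_s=1.0):
    """Return (reachable, elapsed_seconds) for a single TCP connect attempt."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            reachable = True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        reachable = False
    return reachable, time.monotonic() - start
```

Against a server like the powered-off ‘WSAPVSEQ07’, a call such as smb_port_reachable("WSAPVSEQ07") should give up after roughly the chosen timeout rather than waiting out the full SYN retransmission sequence, which could be logged or alerted on.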