Scenario
It’s a pleasant day and all is well with the world. Colleagues are skipping around the office with smiles on faces…until…duh duh daaa! One by one, services start failing:
- Printers go offline:
  - First, for Win7 users
  - Then for all clients
  - Can still print from server though
- File shares go offline
- Active Directory replication fails
- DNS console will not open
Basically, your main Domain Controller (DC) has just taken a dump…and so have you!
These are the steps I took to troubleshoot the issues and get everything back online.
Solution
Gather Information
Run the following commands to gather useful information:
ipconfig /all > c:\ipconfig.txt (from each DC/DNS Server)
dcdiag /v /c /d /e /s: > c:\dcdiag.txt
dcdiag /test:dns /s: /DnsBasic > c:\dcdiag-dnsbasic.txt
repadmin /showrepl dc* /verbose /all /intersite > c:\showrepl.txt (dc* is a placeholder for the starting name of the DCs if they all begin the same - if more than one DC exists)
repadmin /replsum > c:\replsum.txt
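If you have several DCs to check, the same collection can be scripted. This is only a minimal sketch: `dc_name` and `dc_prefix` are hypothetical placeholders for your own server name and the shared start of your DC names, `build_commands` and `run_all` are names made up for this example, and it assumes the tools are on the PATH of the machine you run it from.

```python
import subprocess
from pathlib import Path


def build_commands(dc_name, dc_prefix):
    """Return (output file name, argument list) pairs mirroring the commands above.

    dc_name and dc_prefix are hypothetical placeholders, not values from this post.
    """
    return [
        ("ipconfig.txt", ["ipconfig", "/all"]),
        ("dcdiag.txt", ["dcdiag", "/v", "/c", "/d", "/e", f"/s:{dc_name}"]),
        ("dcdiag-dnsbasic.txt", ["dcdiag", "/test:dns", f"/s:{dc_name}", "/DnsBasic"]),
        ("showrepl.txt", ["repadmin", "/showrepl", f"{dc_prefix}*",
                          "/verbose", "/all", "/intersite"]),
        ("replsum.txt", ["repadmin", "/replsum"]),
    ]


def run_all(commands, out_dir):
    """Run each command and save its combined output under out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for file_name, argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True, check=False)
            text = result.stdout + result.stderr
        except FileNotFoundError as exc:
            # The tool is not installed on this machine; record that instead of crashing.
            text = f"could not run {argv[0]}: {exc}\n"
        (out_dir / file_name).write_text(text)
```

Something like `run_all(build_commands("DC01", "DC"), r"C:\diag")` would then leave one text file per command in that folder. The names `DC01`, `DC` and `C:\diag` are made up for illustration.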
Pore through the txt files and note down the errors. Some of mine included:
- repadmin /showrepl
  - Last error: 1256 (0x4e8): The remote system is not available.
  - Last error: 5 (0x5): Access is denied.
  - WARNING: KCC could not add this REPLICA LINK due to error.
  - result 1722 (0x6ba): The RPC server is unavailable.
- repadmin /replsum
  - (1722) The RPC server is unavailable.
  - (5) Access is denied.
- dcdiag /test:dns /s: /DnsBasic
  - The host could not be resolved to an IP address. Check the DNS server, DHCP, server name, etc.
  - Got error while checking LDAP and RPC connectivity. Please check your firewall settings.
  - Error: No LDAP connectivity.
  - invalid DNS server:
  - No host records (A or AAAA) were found for this DC.
  - Warning: no DNS RPC connectivity (error or non Microsoft DNS server is running).
  - Name resolution is not functional.
- dcdiag /v /c /d /e /s:
  - EventID: 0x40000004 – The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server.
  - EventID: 0xC00004B2 – The DFS Replication service failed to contact domain controller to access configuration information.
  - EventID: 0xC000138A – The DFS Replication service encountered an error communicating with partner for replication group Domain System Volume.
  - The replication generated an error (-2146893022): The target principal name is incorrect.
  - Error: Detected circular loop trying to locate the ISTG.
- repadmin /syncall
  - -2146893022 (0x80090322): The target principal name is incorrect.
  - SyncAll exited with fatal Win32 error: 8440 (0x20f8): The naming context specified for this replication operation is invalid.
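Noting the errors down by hand gets tedious once the files grow, so here is a rough sketch of pulling them out with a script. It only recognises the `code (0xhex): message` form seen in the showrepl and syncall output above (the replsum form `(1722) ...` is not handled), and the function names are made up for this example. It also checks that the decimal and hex forms agree, which is handy for spotting that a negative code such as -2146893022 is simply 0x80090322 read as a signed 32-bit number.

```python
import re
from collections import Counter

# Matches fragments such as "1722 (0x6ba): The RPC server is unavailable."
ERROR_RE = re.compile(r"(-?\d+) \((0x[0-9a-fA-F]+)\): (.+)")


def parse_errors(text):
    """Yield (decimal code, hex code, message) for each matching line."""
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2).lower(), match.group(3).strip()


def codes_agree(decimal_code, hex_code):
    """True if both forms name the same 32-bit value (negative codes wrap around)."""
    return (decimal_code & 0xFFFFFFFF) == int(hex_code, 16)


# Sample lines copied from the errors listed above.
sample = """\
Last error: 1256 (0x4e8): The remote system is not available.
Last error: 5 (0x5): Access is denied.
result 1722 (0x6ba): The RPC server is unavailable.
-2146893022 (0x80090322): The target principal name is incorrect.
SyncAll exited with fatal Win32 error: 8440 (0x20f8): The naming context specified for this replication operation is invalid.
"""

errors = list(parse_errors(sample))
# Count how often each message appears, so repeat offenders stand out.
tally = Counter(message for _, _, message in errors)
```

In practice you would feed it the contents of showrepl.txt rather than the sample string, then search online for the most frequent messages first.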
Some results seemed to conflict: tests for certain services (like DNS) failed, yet I could still ping hosts by name and confirm resolution using nslookup. Moving on.
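One likely reason for the apparent conflict is that a lookup from a client only proves that some resolver returns an address for the name. It says nothing about whether the DC has registered its own host records or whether the DNS service answers RPC, which is what the dcdiag messages above complain about. A minimal sketch of that client-side check, where `resolve` is a name made up for this example:

```python
import socket


def resolve(host_name):
    """Return the sorted addresses the local resolver gives for host_name.

    An empty list means the lookup failed. This mirrors a ping-by-name check
    and does not test anything on the DNS server itself.
    """
    try:
        infos = socket.getaddrinfo(host_name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

A name that resolves here while dcdiag still reports DNS failures points at the server side rather than at the client.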
Go through the errors one by one and search online for solutions. Here are some of the URLs I used to troubleshoot errors:
- RPC: http://social.technet.microsoft.com/wiki/contents/articles/4494.troubleshooting-the-rpc-server-is-unavailable.aspx
- Active Directory Replication: http://technet.microsoft.com/en-us/library/bb727057.aspx
- Troubleshooting AD Replication error 8453: “Replication access was denied.”: http://support.microsoft.com/kb/2022387
By now things might seem to snowball, but stay calm and keep trying recommended steps from Microsoft, recording your steps along the way:
To stop the KDC
- At a command prompt, type the following command and press ENTER:
- net stop KDC
- If the KDC cannot be stopped, set its startup type to Disabled and restart the server.
To purge the ticket cache
- At a command prompt, type the following command and press ENTER:
- klist purge
- Answer Yes for each ticket
To reset the computer account password on the PDC emulator
- At a command prompt, type the following command and press ENTER:
- netdom resetpwd /server: /userd:\administrator /passwordd:*
Some other commands I used included:
dcdiag /test:CheckSecurityError /s
dcdiag /testdomain:
nltest /logon_query
nltest /dclist:
nltest /domain_trusts
nltest /DSQUERYDNS
nltest /DSREGDNS
nltest /sc_verify:
nltest /dsgetdc: /force
net config rdr
dsquery * forestroot -scope subtree -filter "(serviceprincipalname=)" -attr * -s
nltest /dsgetdc: /gc gave this error:
Getting DC name failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN
nltest /server: /sc_query: gave this error:
I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN
Know when to quit
My troubleshooting ran into a second day. By now, users were using a workaround to access printers and file shares, but the DC errors continued. At this point, I decided to demote the DC and just leave it as a file and print server, which is best practice anyway.
After taking a snapshot of the DC (via VMware vCenter), I proceeded to go through the standard steps to demote a DC:
- Transfer all FSMO roles to another DC – this failed with a generic error (http://social.technet.microsoft.com/Forums/en/winserverDS/thread/3f49ddbc-c948-43ac-af21-2f5a4f3dce9b).
- Run dcpromo to demote DC – this also failed.
Great. Now the only option was a forceful removal of the DC (http://technet.microsoft.com/en-us/library/cc731871(v=ws.10).aspx).
dcpromo /forceremoval worked fine. I then removed the DC from Sites and Services, at which point the FSMO roles were transferred to another DC, so I didn’t need to seize them. You used to have to go through a Metadata Cleanup after forcing a demotion, but now this is done for you when you remove the DC from Sites and Services. This can be confirmed by following the steps here: http://www.petri.co.il/delete_failed_dcs_from_ad.htm
Although this is much easier using 2008 R2, you will still need to tidy up a little in other areas:
- Remove all entries for the failed DC from the Name Servers tab in the properties of every relevant DNS zone.
- Back up the DHCP database and restore it to another server.
- Tombstone WINS entries from the failed DC:
  - From another DC, go to WINS > Active Registrations > right-click > Delete Owner.
  - Select the failed DC.
  - Replicate the deletion to other servers (tombstone).
  - The new DC will then take ownership of the records.
- Uninstall the above roles from the failed DC.
- Update DHCP and devices with static IPs to use the new DC’s IP address for DNS and WINS. You did spin up a new DC, right?!
Another great tip I found was from this thread on Spiceworks:
If we really want to be safe then open a command prompt with elevated privileges and run the following command
csvde -f C:\ad_details.csv
This exports all contents of ADSI Edit to a CSV file in the root of the C drive called “ad_details.csv”. Open this in Excel and do a Find All for the name of the failed DC. If it finds any references then we have lingering objects and will need to perform a Metadata Cleanup.
Conclusion
Although this was a nightmare to troubleshoot – and I have a chip on my shoulder as I didn’t find the root cause or fix the DC – I have more confidence in the steps to force the removal of a screwed-up DC. Next time I’ll learn to let go a little faster.
Update: I’ve just found more notes on this that may be useful in future:
- Error Message: Logon Failure: The Target Account Name Is Incorrect: http://support.microsoft.com/?id=310340
- “Logon failure: the target account name is incorrect” error when promoting domain controllers or creating replicas: http://support.microsoft.com/?id=296993
- Active Directory Replication and Knowledge Consistency Checker Fail without Trusted Domain Object: http://support.microsoft.com/?id=257844
- Error Message “Target Principal Name is Incorrect” When Manually Replicating Data Between Domain Controllers: http://support.microsoft.com/?id=288167
- Troubleshooting AD Replication error 1396: Logon Failure: The target account name is incorrect: http://support.microsoft.com/kb/2183411
- repadmin /syncall /AdeP
  - /A Synchronizes all naming contexts that are held on the home server.
  - /d Identifies servers by distinguished name in messages.
  - /e Synchronizes domain controllers across all sites in the enterprise. By default, this command does not synchronize domain controllers in other sites.
  - /P Pushes changes outward from the specified domain controller.
- A missing service principal name may prevent domain controllers from replicating: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q308111
- http://social.technet.microsoft.com/Forums/en/winserverDS/thread/3f49ddbc-c948-43ac-af21-2f5a4f3dce9b
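The combined /AdeP switch in the notes above is easy to misread, partly because repadmin /syncall switches are case-sensitive. As a small memory aid, here is a sketch that expands a combined switch string using only the descriptions from these notes. The function and dictionary names are made up for this example, and any other letter is reported as not covered rather than guessed at.

```python
# Descriptions are taken from the notes above; the switches are case-sensitive.
SYNCALL_SWITCHES = {
    "A": "Synchronizes all naming contexts that are held on the home server.",
    "d": "Identifies servers by distinguished name in messages.",
    "e": "Synchronizes domain controllers across all sites in the enterprise.",
    "P": "Pushes changes outward from the specified domain controller.",
}


def explain_switches(combined):
    """Expand a combined switch string such as "/AdeP" into (switch, meaning) pairs."""
    letters = combined.lstrip("/")
    return [
        (f"/{letter}", SYNCALL_SWITCHES.get(letter, "not covered in these notes"))
        for letter in letters
    ]


for switch, meaning in explain_switches("/AdeP"):
    print(switch, meaning)
```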
I’m on my second day of this endless quest to fix one DC that won’t replicate and has so many errors on it, and I’m at the point where it’s time for it to go. Sadly, it seems this started with a W32time issue that was not taken care of for over 1 year by the previous IT guy…the pains of Domain Controllers Arghhh!!
I feel your pain. I’ve seen terrible problems off the back of time-sync issues. It’s so important to have all servers in sync. If not, the strangest things can happen.