One of the things I’ve been dithering about with the new Hyper-V system at home is whether or not to join the host server to the domain served by the virtual DC, which in turn hosts the Active Directory forest and domain for the lab environment.
Running a virtual DC in either a production or a lab environment isn’t difficult, but there are a few gotchas. I wrote an in-depth piece about this on 4sysops.com, so I won’t repeat it all here. The only slight difference is that the other domain-joined virtual servers on the Hyper-V host have a start delay of 180 seconds. 120 should be sufficient, but I’m planning on some relatively complex setups so I’m just playing it safe.
Brien Posey over at VirtualizationAdmin.com wrote a very good series of articles about the various pitfalls of setting up an AD environment which includes virtual DCs. The series is well worth a read (particularly the bit where his entire lab environment gets fried by a lightning strike!) but one of the key things he mentions is the limitation of having a virtual domain sat on top of a Hyper-V host which isn’t part of that domain – namely, backup and management.
If the Hyper-V parent partition is excluded from the domain, backup programs like DPM are not able to protect that partition. Additionally, it makes remote management a lot more difficult to set up and as for using SCVMM – forget it. These are the show-stoppers for me, as many of the labs I’m planning revolve around the System Center suite of products, so it looked like there was no option but to get the parent partition on the domain.
Ben Armstrong (Virtual PC Guy) summarises this problem very well in this post, and in a later post details some very useful tips for streamlining the process so that if you have to restart the parent partition, you’re not waiting for ages before you can log in, wondering whether it’s just a waiting game or whether something is rotten in the state of hypervisor…
Ben’s solution is two-fold:
1 – Disable the use of cached credentials – cached credentials are nice things, but they can mask a serious problem. Basically, just because you’ve been able to log into the parent partition using AD credentials, doesn’t mean that the parent partition has actually authenticated to the DC. By disabling cached credentials, you’re ensuring that every successful logon attempt has only been successful because the DC has handled the attempt.
To do this, open REGEDIT on the parent partition and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon. Find the string value called CachedLogonsCount and change it to 0 (the default is 10).
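If you’d rather script the change than click through REGEDIT, here’s a minimal Python sketch that builds the equivalent `reg add` command. This isn’t from Ben’s post: the function name `cached_logons_command` is my own, and it assumes an elevated prompt on the parent partition. It only runs the command on Windows; anywhere else it just prints it.

```python
import subprocess
import sys
from typing import List

# Registry key that holds the Winlogon settings (same path as in the text above).
WINLOGON_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"


def cached_logons_command(count: int = 0) -> List[str]:
    """Build the reg.exe arguments that set CachedLogonsCount.

    The value is stored as a string (REG_SZ); Windows accepts 0 to 50,
    and 0 disables cached logons altogether.
    """
    if not 0 <= count <= 50:
        raise ValueError("CachedLogonsCount must be between 0 and 50")
    return [
        "reg", "add", WINLOGON_KEY,
        "/v", "CachedLogonsCount",
        "/t", "REG_SZ",
        "/d", str(count),
        "/f",  # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    cmd = cached_logons_command(0)
    if sys.platform == "win32":
        subprocess.run(cmd, check=True)  # needs an elevated prompt
    else:
        print(subprocess.list2cmdline(cmd))  # just show what would be run
```

Bear in mind that with the value at 0, a domain logon on the parent partition only works when the DC is reachable, so keep the local Administrator password to hand.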
2 – Force re-registration of the DC in DNS – The DNS service on the DC generally comes up after the AD services. As a result, the DC hasn’t registered itself in DNS when the other machines start looking for it, so no domain-joined machine can find it, hence no logon. By default, it will attempt to re-register itself after 10 minutes, but that’s another 10 minutes of inactivity, which is boooooring.
To get around this, Ben recommends forcing the DC to re-register itself in DNS on system startup by creating a batch file which does all the necessary work and running it as a Scheduled Task on the DC using the domain Administrator credentials. The batch file contains the following lines:
- ipconfig /flushdns
- ipconfig /registerdns
- nltest /dsregdns
Create an appropriate task in Scheduled Tasks and give it a test run.
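As a rough sketch of those two steps, the Python below renders the batch file from the three commands above and builds a `schtasks /create` command that runs it at system startup. The script path `C:\Scripts\register-dc-dns.cmd`, the task name and the `LAB\Administrator` account are placeholders of mine, not anything from Ben’s post, so swap in your own. It only prints the results; nothing is executed.

```python
from typing import List

# The three commands from the batch file described above, in order.
BATCH_LINES = [
    "ipconfig /flushdns",
    "ipconfig /registerdns",
    "nltest /dsregdns",
]


def render_batch() -> str:
    """Return the batch file contents with Windows line endings."""
    return "\r\n".join(BATCH_LINES) + "\r\n"


def schtasks_command(
    script_path: str,
    task_name: str = "Reregister DC in DNS",  # hypothetical task name
    run_as: str = r"LAB\Administrator",       # hypothetical domain account
) -> List[str]:
    """Build the schtasks.exe arguments for a run-at-startup task.

    No /rp is given, so schtasks prompts for the account's password.
    """
    return [
        "schtasks", "/create",
        "/tn", task_name,
        "/tr", script_path,
        "/sc", "onstart",  # trigger: every system startup
        "/ru", run_as,
    ]


if __name__ == "__main__":
    print(render_batch())
    print(schtasks_command(r"C:\Scripts\register-dc-dns.cmd"))
```

Save the rendered text to the script path on the DC, then run the printed `schtasks` command there from an elevated prompt.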
From my own experience on the lab system, everything ran perfectly: after the first restart following the domain join, I was able to log into the parent partition successfully within a couple of minutes of the OS firing up.
The big question is: is this architecture acceptable in a production environment?
My gut reaction is to say Yes. I understand that many IT pros are nervous about the idea of running all the DCs in a forest on virtual platforms, the fear being that should something go wrong, it’s much harder to manage your way to a resolution. My feeling is that even if you have a DC running on physical hardware, you’re not really minimising the risk. If it’s all about risk mitigation, then you can run up multiple DCs spread across different physical hosts, make use of a DR site or even spool something up on a cloud platform.
Also, it’s not really that difficult to manage a domain-joined hypervisor host if its domain is down – you just need console access and the local Administrator account, or in the case of VMware, the vSphere client and the root password. All of these disaster scenarios can be recovered from, and my feeling is that a fully virtual environment gives you more opportunity to protect the systems and thereby recover faster, not slower.
So – all is on the domain, so now we move onto remote management from my non-domain-joined workstation. Fun!