So, due to some unrelated disk locking issues (see https://baumaeam.wordpress.com/2015/09/22/unable-to-start-vms-failed-to-lock-the-file/), my vCenter VM failed to start today.
From the aforementioned blog post, the solution for any other VM besides vCenter would have been to powercycle the responsible host for the file lock (or all of them, for good measure), and then restart the affected VM. However, this is not possible if the vCenter VM is the victim, as you can’t really do vMotion without vCenter!
Regardless, I went down a long rabbit hole full of attempted fixes that ultimately required me to restore the vCenter VM from a Veeam backup directly to a host. It worked great, except I couldn’t vMotion the vCenter VM anymore! vSphere kept throwing the following error whenever I attempted to vMotion the VM:
“That’s odd”, I thought to myself. Maybe because the VM was registered directly on the ESXi host and then brought up, vCenter somehow see itself there? So when I tried to vMotion, it wouldn’t figure it out? Not sure. I figured a possible fix would be to shut down the vCenter server, open the c# client to the host that it was on, and unregister and reregister. Perhaps doing that would get the process right, and things would work.
… not so much.
I connected the client to the target host, and unregistered the VM, then reregistered it on a different host. After reregistering it, the network adapter for the vCenter server could no longer connect to the distributed switch. So the VM would come up, but vCenter couldn’t start because it didn’t have a network adapter to talk to the Platform Services Controller.
My solution was to create a new portgroup (with the appropriate VLAN) on an existing vSphere Standard Switch, steal a host NIC away from the LAG in the vDS, add it to the VSS, and then power up the vCenter VM. Once it came up, it was able to connect to the PSC, and get the vCenter Server process up. Then I moved it back to the vDS, and things seemed to work okay again!
Hope this helps anyone facing the same issue, where their vCenter server is unable to get going because of vDS inaccessibility.