Topic: VMware ESX3.5/Virtual Center: Unable to reconnect a node to the cluster  (Read 4640 times)

ATC

  • Guest
One of two ESX nodes disconnected itself from the Virtual Center. When attempting to reconnect, we were prompted for the root password but the connection failed with an error message indicating a communication or password failure.

This caused the reconnect to work partially, but it still failed at 80%.

/var/log/messages on the node showed error messages like these:
Apr  3 11:57:04 vmesx02 modprobe: modprobe: Can't locate module char-major-14
Apr  3 11:57:04 vmesx02 modprobe: modprobe: Can't locate module char-major-14
Apr  3 11:57:04 vmesx02 modprobe: modprobe: Can't locate module block-major-2
Apr  3 11:57:04 vmesx02 last message repeated 6 times
Apr  3 11:57:04 vmesx02 vmware-authd(pam_unix)[28974]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=root
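
When /var/log/messages is noisy, it can help to pull out only the PAM authentication failures. The following is a minimal Python sketch, assuming the syslog line layout shown above; the function name find_auth_failures and the regular expression are illustrative and not part of any VMware tool.

```python
import re

# Assumed layout, taken from the sample lines above:
#   "<Mon> <day> <hh:mm:ss> <host> <service>(pam_unix)[<pid>]: authentication failure; ... user=<name>"
AUTH_FAILURE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<service>[\w-]+)\(pam_unix\)\[(?P<pid>\d+)\]:\s"
    r"authentication failure;.*\buser=(?P<user>\S+)"
)


def find_auth_failures(lines):
    """Return one dict per pam_unix authentication failure found in `lines`."""
    hits = []
    for line in lines:
        match = AUTH_FAILURE.match(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    # Reads the same log file that was inspected in the post.
    with open("/var/log/messages", errors="replace") as log:
        for hit in find_auth_failures(log):
            print(hit["timestamp"], hit["service"], "user=" + hit["user"])
```

The \b before user= keeps the pattern from matching the ruser= field, so the reported account is the one the login was attempted for.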

This suggested some kind of problem with the authentication services. We verified the password, checked the PAM setup against the properly working node, stopped the vmware-vpxa service, and restarted the mgmt-vmware and vmware-vmkauthd services. We even tried "rpm -e VMware-vpxa" and deleting the vpxuser account to completely reset the vpxa service.



ATC

  • Guest
Using the Virtual Center client to connect directly to the ESX node in question, we noticed that a newly installed VM was listed as "invalid". It turned out that the corresponding .vmdk had been copied from VMware Workstation without proper conversion.

As soon as this "invalid" VM was deleted from the node, we connected to the Virtual Center and successfully reconnected the ESX node to the cluster.
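
One way to spot a disk that was copied over unconverted is to look at the createType entry in the .vmdk descriptor. The sketch below is a hedged illustration, not the check used in this thread: it assumes that ESX-native disks carry a createType starting with "vmfs", while hosted (Workstation) disks use values such as monolithicSparse or twoGbMaxExtentSparse. The function names and the example path are hypothetical.

```python
import re

# Assumption: the descriptor (a small text file, or text embedded near the
# start of a hosted sparse disk) contains a line like: createType="vmfs"
CREATE_TYPE = re.compile(rb'createType\s*=\s*"([^"]+)"')


def create_type(descriptor_bytes):
    """Return the createType value found in the descriptor, or None."""
    match = CREATE_TYPE.search(descriptor_bytes)
    return match.group(1).decode("ascii") if match else None


def looks_like_esx_disk(descriptor_bytes):
    """True if createType starts with 'vmfs' (assumed ESX-native formats)."""
    value = create_type(descriptor_bytes)
    return value is not None and value.lower().startswith("vmfs")


def check_vmdk(path, probe_size=64 * 1024):
    """Read the first `probe_size` bytes of `path` and report its createType."""
    with open(path, "rb") as handle:
        head = handle.read(probe_size)
    return create_type(head), looks_like_esx_disk(head)


if __name__ == "__main__":
    # Hypothetical path; substitute the descriptor of the suspect VM.
    print(check_vmdk("/vmfs/volumes/datastore1/newvm/newvm.vmdk"))
```

A result of None means no descriptor text was found in the probed bytes, for example when the script is pointed at a -flat.vmdk data file instead of the descriptor.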