Question 1

What causes a VMware ESXi purple screen (PSOD)?

Accepted Answer

Most often a bad or mismatched driver. Failing hardware does it too, anything from a dying memory stick to a flaky PCIe card, and so does storage heap exhaustion. What it usually is not is ESXi itself. Read the exception on the screen first: a #PF Exception 14 points at a driver, while a LINT1 or NMI points at hardware. Set up a coredump target so the next crash gets captured, then update the driver and firmware the screen implicates, checking both against the VMware HCL.
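To make the first-look triage concrete, here is a minimal sketch that maps the exception text to a likely culprit, using only the rule of thumb from the answer above. The function name classify_psod is made up for illustration, and the result is a starting guess, not a diagnosis.

```shell
# Classify the exception text shown on a purple screen.
# Hypothetical helper: the mapping is only the rule of thumb from the answer.
classify_psod() {
  case "$1" in
    *"#PF Exception 14"*) echo "driver" ;;
    *LINT1*|*NMI*)        echo "hardware" ;;
    *)                    echo "unknown" ;;
  esac
}

# On the host itself, check that a coredump target is configured:
#   esxcli system coredump partition get
#   esxcli system coredump file get
```

Call it with whatever the screen shows, for example classify_psod "#PF Exception 14 in world ...", and treat anything it reports as unknown as a reason to pull the coredump rather than guess.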

Question 2

How do I fix an ESXi host showing as Not Responding in vCenter?

Accepted Answer

Start by bouncing the management agents from the host shell: /etc/init.d/hostd restart and /etc/init.d/vpxa restart. Then make sure the management network is actually up and that vCenter can reach the host on ports 902 and 443. Once that checks out, right-click the host in vCenter and choose Reconnect. If the agents flat-out will not start, the usual culprit is a full scratch location on the host, so check free space there and read hostd.log.
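The steps above can be sketched as two small shell helpers. The function names are made up for illustration, and the port check assumes an nc that supports -z and -w, which is worth confirming on whatever machine you run it from.

```shell
# Restart both management agents (run in the ESXi host shell).
restart_agents() {
  /etc/init.d/hostd restart
  /etc/init.d/vpxa restart
}

# Report whether ports 902 and 443 answer on the given host.
# Assumes nc supports -z (probe only) and -w (timeout in seconds).
check_ports() {
  host="$1"
  for port in 902 443; do
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$port open"
    else
      echo "$port closed"
    fi
  done
}
```

Run check_ports with the host's management address from the vCenter side, since that is the direction that has to work. A closed port with healthy agents usually points back at the network rather than the host.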

Question 3

What is the difference between APD and PDL?

Accepted Answer

APD (All Paths Down) is temporary. The device might come back, so ESXi just keeps retrying and hoping. PDL (Permanent Device Loss) is the array telling you straight up, via SCSI sense codes, that the device is gone for good. The fixes do not overlap either. APD is a fabric or array problem: restore the paths and the device comes back. PDL means you remove the dead device and then remediate whatever datastores and VMs it took down with it.
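Since PDL is signalled through SCSI sense codes, a small lookup can help when reading the logs. This is a sketch only: the function name is made up, and the four sense triples are the ones I recall VMware documenting as PDL, so treat the list as an assumption and possibly incomplete.

```shell
# Print "PDL" when a sense triple (key, ASC, ASCQ) is one of the codes assumed
# here to mean permanent device loss; anything else is reported as "not PDL".
classify_sense() {
  triple=$(printf '%s %s %s' "$1" "$2" "$3" | tr 'A-F' 'a-f')
  case "$triple" in
    "0x5 0x25 0x0") echo "PDL" ;;  # logical unit not supported
    "0x4 0x4c 0x0") echo "PDL" ;;  # logical unit failed self-configuration
    "0x4 0x3e 0x3") echo "PDL" ;;  # logical unit failed self-test
    "0x4 0x3e 0x1") echo "PDL" ;;  # logical unit failure
    *)              echo "not PDL" ;;
  esac
}
```

Feed it the three values that follow "Valid sense data:" in a vmkernel log line. A "not PDL" answer does not prove APD by itself; it only says the array did not declare the device permanently gone with one of these codes.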

Question 4

How do I power on a VM that says the file is locked?

Accepted Answer

Something else is holding a lock on the VMDK or the .vmx, either another host or a stale process that never let go. Run vmkfstools -D on the locked file and read the MAC address in the owner field of the output. That tells you which host holds the lock. Make sure the VM really is not running over there. Then clear the stale lock by restarting the management agents on the host that is holding it, or reboot that host if it comes to that. One rule though: never just delete .lck files because you are in a hurry.
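Picking the MAC out of the vmkfstools -D output by eye is error-prone, so here is a minimal sketch that does it. It assumes the owner field has the form "owner xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxxxxxx", with the last 12 hex digits being the owning host's MAC; the function name is made up for illustration.

```shell
# Read `vmkfstools -D` output on stdin and print the lock owner's MAC address.
# Assumes the last 12 hex digits of the owner field are the host's MAC.
lock_owner_mac() {
  sed -n 's/.*owner [0-9a-fA-F]*-[0-9a-fA-F]*-[0-9a-fA-F]*-\([0-9a-fA-F]\{12\}\).*/\1/p' |
    head -n 1 |
    sed 's/\(..\)/\1:/g; s/:$//'
}
```

Pipe the command's output into it, adding 2>&1 if your build prints the lock details on stderr, then match the result against the physical NIC addresses of each host to find the owner.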

Question 5

Why can't I delete or consolidate a snapshot?

Accepted Answer

Usually one of two things. Either the datastore does not have enough free space for the merge, or a backup job still has an open file handle on the delta disk. So free up space, at minimum the size of the whole delta chain, kill any backup that is touching the VM, then run Snapshot Consolidate. If it is still stuck, vmware.log in the VM folder names the exact file the merge is choking on.
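The free-space check above can be sketched as follows. Both function names are made up, and the sketch assumes the delta files in the VM folder are named *-delta.vmdk or *-sesparse.vmdk; you supply the datastore's free bytes yourself from whatever tool you trust.

```shell
# Sum the size in bytes of the snapshot delta files in a VM folder.
# Assumes deltas are named *-delta.vmdk or *-sesparse.vmdk.
delta_chain_bytes() {
  total=0
  for f in "$1"/*-delta.vmdk "$1"/*-sesparse.vmdk; do
    [ -f "$f" ] || continue
    size=$(ls -ln "$f" | awk '{print $5}')
    total=$((total + size))
  done
  echo "$total"
}

# Succeed only when free space ($2, bytes) covers the delta chain ($1, bytes).
enough_space() {
  [ "$2" -ge "$1" ]
}
```

Treat a pass as the bare minimum rather than a guarantee, since the answer's figure is "at minimum" the size of the chain and the VM keeps writing while the merge runs.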

VMware ESXi Error Code Reference
