VMware vSphere / ESXi error reference · with fixes
Got an ESXi or vCenter error and no idea where to start? Paste in the code or the message, even a half-remembered symptom. You get back the cause and the exact steps to fix it: the esxcli or PowerCLI command to run, plus which log is worth reading or which service to bounce. Click any error to open the full resolution, plus a way to check it actually worked. There’s a category filter as well, handy when you’d rather browse everything storage, or just the PSOD entries. It all runs in your browser, nothing leaves the page.
How to use this VMware ESXi error reference
VMware breaks in a dozen different ways. A purple diagnostic screen (PSOD) on the host. A red banner in the vSphere Client. Some cryptic esxcli return that tells you almost nothing. Or just a quiet “Host not responding” sitting in vCenter. Here’s the annoying part: the same root problem shows up worded completely differently depending on where you happen to catch it. So this reference pulls the common ones together across ESXi, vCenter and vSphere, and for each one it gives you the actual cause and a numbered fix you can run right away. Drop any piece of the message into the search box, or filter by category, then open the card.
Honestly, most vSphere trouble sorts itself into a handful of families. A host crash (PSOD). Boot and install failures. The host losing its link to vCenter. Storage, which is its own whole world: APD, PDL, datastore locks, VMFS. Then networking, the migration and availability stuff (vMotion, HA, DRS), snapshots, VMs that won’t power on, and VMware Tools acting up. Figure out the family and you’ve usually figured out both the fix and which log to go open.
The logs and commands that solve most VMware issues
- vmkernel.log (/var/log/vmkernel.log) is your first stop for host, driver and storage events, and anything near a PSOD.
- hostd.log and vpxa.log when the management agent or the link to vCenter is the thing that broke.
- vmware.log lives inside each VM folder on the datastore. That’s where the per-VM power-on and disk errors hide.
- Need to restart the management agents? It’s safe from the host shell: /etc/init.d/hostd restart and /etc/init.d/vpxa restart.
- Check storage paths with esxcli storage core path list, adapters with esxcli network nic list.
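If you want that list as something you can script against, here is a minimal Python sketch of a lookup from problem area to the log worth opening first. The area labels and the first_log helper are made-up names for illustration. The /var/log locations for hostd.log and vpxa.log are the usual ESXi defaults, so treat them as an assumption if your host redirects its logs.

```python
# Hypothetical lookup: problem area -> log(s) to open first.
# The keys are illustrative labels; the log names come from the list above.
FIRST_LOG = {
    "host": ["/var/log/vmkernel.log"],
    "driver": ["/var/log/vmkernel.log"],
    "storage": ["/var/log/vmkernel.log"],
    "psod": ["/var/log/vmkernel.log"],
    # Assumed default locations for the management agent logs.
    "management": ["/var/log/hostd.log", "/var/log/vpxa.log"],
    "vm": ["vmware.log (inside the VM folder on the datastore)"],
}


def first_log(area: str) -> list:
    """Return the log(s) to read first for a problem area, or an empty list."""
    return FIRST_LOG.get(area.strip().lower(), [])


if __name__ == "__main__":
    for area in ("storage", "management", "vm"):
        print(area, "->", ", ".join(first_log(area)))
```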
PSOD: reading a purple screen
A Purple Screen of Death stops the ESXi host cold and prints the exception, the module that failed, and a backtrace. The exception type is the part that tells you the most. A #PF Exception 14 (page fault) is almost always a driver. A hardware LINT1/NMI means memory or a dying component. A VMFS heap message? The kernel’s VMFS heap ran out of memory, which is not the same thing as a full datastore. First thing: configure a coredump target, a dump collector or just a local partition, so the host actually writes a dump you can dig into next time. Then go after whichever driver or firmware caused it, or the failing part itself. And look, in my experience the vast majority of these are driver or hardware faults. Not VMware itself, even though that’s where everyone points first.
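The triage above is simple enough to sketch in a few lines of Python. This is only a first guess based on the three exception types named here, not a substitute for reading the backtrace. The function name and the wording of the returned labels are this sketch's own.

```python
import re

# Rules taken from the paragraph above, checked in order.
# The labels on the right are this sketch's own wording.
PSOD_RULES = [
    (re.compile(r"#PF Exception 14", re.IGNORECASE), "driver (page fault)"),
    (re.compile(r"\b(LINT1|NMI)\b", re.IGNORECASE), "hardware (memory or a failing component)"),
    (re.compile(r"VMFS heap", re.IGNORECASE), "VMFS heap exhaustion"),
]


def classify_psod(screen_text: str) -> str:
    """Return a first-guess cause for text copied off a purple screen."""
    for pattern, cause in PSOD_RULES:
        if pattern.search(screen_text):
            return cause
    return "unknown: read the backtrace and the coredump"


if __name__ == "__main__":
    print(classify_psod("#PF Exception 14 in world 2098:vmm0"))
```

Feed it whatever you managed to photograph or copy from the console. Anything it can't match falls through to "unknown", which is the cue to go pull the coredump.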
Storage: APD versus PDL
Two storage states trip people up more than anything else. APD (All Paths Down) is a temporary loss of every path to a device, where the array might still come back, so ESXi just waits and keeps retrying. PDL (Permanent Device Loss) is different: it’s the array flat-out telling ESXi the device is gone for good, and you can spot it by the specific SCSI sense codes. The fix isn’t the same either. APD is usually some fabric or zoning or array-controller problem you need to bring back. PDL means the device is dead, so you pull it and remediate the datastore. Mix these two up and you’ll burn hours chasing the wrong thing. I’ve done it.
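To make the "spot it by the sense codes" part concrete, here is a hedged Python sketch that scans a vmkernel.log line for a sense-data triple and checks it against a set of codes commonly documented as PDL. Both the set and the "sense data: 0x.. 0x.. 0x.." line format are assumptions here, so check them against VMware's current guidance for your release. The function name is made up.

```python
import re

# (sense key, ASC, ASCQ) triples commonly documented as signalling PDL.
# Assumption: verify this set against VMware's KB for your ESXi version.
PDL_SENSE = {
    (0x5, 0x25, 0x0),  # LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x4C, 0x0),  # LOGICAL UNIT FAILED SELF-CONFIGURATION
    (0x4, 0x3E, 0x3),  # LOGICAL UNIT FAILED SELF-TEST
    (0x4, 0x3E, 0x1),  # LOGICAL UNIT FAILURE
}

# Assumed log format: "... sense data: 0x5 0x25 0x0"
SENSE_RE = re.compile(
    r"sense data:\s*0x([0-9a-f]+)\s+0x([0-9a-f]+)\s+0x([0-9a-f]+)",
    re.IGNORECASE,
)


def is_pdl_line(log_line: str) -> bool:
    """True if the line carries a sense-code triple from the PDL set."""
    match = SENSE_RE.search(log_line)
    if not match:
        return False
    triple = tuple(int(group, 16) for group in match.groups())
    return triple in PDL_SENSE


if __name__ == "__main__":
    print(is_pdl_line("H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0"))
```

A line that doesn't match only means "not one of these PDL codes". It does not prove the device is in APD, so you'd still confirm that from the path state.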
Privacy and how this tool runs
The whole error dataset sits right inside the page, and the search runs locally with plain JavaScript. Nothing you type is sent or kept anywhere. Load it once and it works with no connection at all, which is handy when the host that died was also your gateway.
Frequently asked questions
What causes a VMware ESXi purple screen (PSOD)?
Almost always a bad or mismatched driver. Failing hardware does it too, anything from a dying memory stick to a flaky PCIe card, and so does storage heap exhaustion. What it usually isn’t is ESXi itself. Read the exception on the screen first: a #PF Exception 14 is pointing at a driver, a LINT1 or NMI is pointing at hardware. Set up a coredump target, then update the driver and firmware it blamed, checking them against the VMware HCL.
How do I fix an ESXi host showing as Not Responding in vCenter?
Start by bouncing the management agents from the host shell: /etc/init.d/hostd restart and /etc/init.d/vpxa restart. Then make sure the management network is actually up and that vCenter can reach the host on ports 902 and 443. Now right-click the host and hit Reconnect. If the agents flat-out won’t start, the usual culprit is disk space on the host scratch, so check that and the hostd.log.
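For the "can vCenter reach the host" step, a quick TCP probe is often enough to rule the network in or out. This Python sketch is a rough check, not a full diagnosis: it only tests TCP connects, so the UDP heartbeat traffic on 902 isn't covered. The function name and the example hostname are hypothetical.

```python
import socket


def reachable_ports(host: str, ports=(443, 902), timeout: float = 3.0) -> dict:
    """Try a TCP connect to each port and map port -> True/False."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Hypothetical hostname; replace with your ESXi management address.
    print(reachable_ports("esxi01.example.com"))
```

Run it from the vCenter side of the network, or somewhere on the same path. A False on either port points at a firewall or the management network rather than the agents.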
What is the difference between APD and PDL?
APD (All Paths Down) is temporary. The device might come back, so ESXi just keeps retrying and hoping. PDL (Permanent Device Loss) is the array telling you straight up, via SCSI sense codes, that the device is gone for good. The fixes don’t overlap either. APD is a fabric or array thing you bring back online. PDL means you pull the dead device and then remediate whatever datastores and VMs it took down with it.
Why does vMotion fail with a CPU incompatibility error?
The two hosts expose different CPU instruction sets, so moving a live VM between them risks a crash and vMotion won’t let you. The fix is Enhanced vMotion Compatibility (EVC). Turn it on at the cluster level, set to the highest baseline both CPUs can handle, and it masks off the newer features so both sides look the same. One catch: a VM that was already running before you set EVC has to be powered off once to actually pick up the new mode.
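Picking "the highest baseline both CPUs can handle" is just taking the oldest of each host's newest supported mode. Here is a small Python sketch of that, using an illustrative subset of Intel EVC mode names. The list is an assumption, so check the real one for your vSphere version and CPU vendor. The function name is made up.

```python
# Illustrative subset of Intel EVC baselines, oldest to newest.
# Assumption: confirm the real list for your vSphere version and CPU vendor.
EVC_ORDER = [
    "intel-nehalem",
    "intel-westmere",
    "intel-sandybridge",
    "intel-ivybridge",
    "intel-haswell",
    "intel-broadwell",
    "intel-skylake",
]


def highest_common_baseline(host_max_modes):
    """Given each host's newest supported baseline, return the newest one all share."""
    if not host_max_modes:
        raise ValueError("need at least one host")
    # The cluster can only go as high as its oldest host allows.
    lowest_index = min(EVC_ORDER.index(mode) for mode in host_max_modes)
    return EVC_ORDER[lowest_index]


if __name__ == "__main__":
    print(highest_common_baseline(["intel-skylake", "intel-haswell"]))
```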
How do I power on a VM that says the file is locked?
Something else is holding a lock on the VMDK or the .vmx, either another host or a stale process that never let go. Run vmkfstools -D on the locked file and read the owner field in the output. Its last 12 hex digits are the MAC address of the host holding the lock. Make sure the VM really isn’t running over there. Then clear the stale lock by restarting the management agents on the host that’s holding it, or reboot that host if it comes to that. One rule though: never just delete .lck files because you’re in a hurry.
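Reading the MAC out of that output by eye is error-prone, so here is a minimal Python sketch that pulls it from the owner field of pasted vmkfstools -D output. It assumes the owner value is a UUID-style string whose last 12 hex digits are the MAC of the lock-holding host, and that an all-zero owner means nobody holds the lock. The function name is hypothetical.

```python
import re

# Assumed shape of the owner field: 8-8-4-12 hex digits, MAC in the last group.
OWNER_RE = re.compile(
    r"owner\s+[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})",
    re.IGNORECASE,
)


def lock_owner_mac(vmkfstools_output: str):
    """Return the lock owner's MAC as aa:bb:cc:dd:ee:ff, or None if no owner."""
    match = OWNER_RE.search(vmkfstools_output)
    if not match:
        return None
    raw = match.group(1).lower()
    if raw == "0" * 12:
        return None  # all-zero owner: no host currently holds the lock
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    sample = "gen 15, mode 1, owner 4f284470-4991d61b-4b28-001a64c335dc mtime 100"
    print(lock_owner_mac(sample))
```

Match the result against the NIC list on each host (esxcli network nic list shows the MAC addresses) to find the one to go look at.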
Why can’t I delete or consolidate a snapshot?
Usually one of two things. Either the datastore doesn’t have enough free space for the merge, or a backup job still has an open file handle on the delta disk. So free up space, at minimum the size of the whole delta chain, kill any backup that’s touching the VM, then run Snapshot > Consolidate. If it’s still stuck, vmware.log in the VM folder names the exact file the merge is choking on.
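The free-space rule is plain arithmetic, but it's easy to get wrong under pressure. Here is a tiny Python sketch of it. It treats the summed size of the delta disks as the minimum you need, which is the rule of thumb above and not a guarantee. The function name and the sample sizes are made up.

```python
def free_space_shortfall(delta_sizes_bytes, datastore_free_bytes):
    """Bytes still missing before free space covers the whole delta chain (0 if enough)."""
    needed = sum(delta_sizes_bytes)
    return max(0, needed - datastore_free_bytes)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical chain of three delta disks on a datastore with 40 GiB free.
    shortfall = free_space_shortfall([12 * GIB, 20 * GIB, 15 * GIB], 40 * GIB)
    print(shortfall // GIB, "GiB short")
```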













