Debugging Vulkan driver crash - equivalent of NVIDIA Aftermath
Adam Sawicki
Posted on March 28, 2018
New-generation, explicit graphics APIs (Vulkan and DirectX 12) are more efficient and involve less CPU overhead. Part of the reason is that they don't check most errors. In old APIs (Direct3D 9, OpenGL) every function call was validated internally and returned a success or failure code, while a driver crash indicated a bug in driver code. New APIs, on the other hand, rely on the developer doing the right thing. Of course some functions still return an error code (especially ones that allocate memory or create some resource), but those that record commands into a command buffer just return void. If you do something illegal, you can expect undefined behavior. You can use Validation Layers / Debug Layer to do some checks, but otherwise everything may work fine on some GPUs, you may get an incorrect result, or you may experience a driver crash or timeout (called "TDR"). The good thing is that (contrary to old Windows XP) a crash inside the graphics driver doesn't cause a "blue screen of death" or machine restart. The system just restarts the graphics hardware and driver, while your program receives the VK_ERROR_DEVICE_LOST code from one of the functions like vkQueueSubmit. Unfortunately, you then don't know which specific draw call or other command caused the crash.
NVIDIA proposed a solution for that: they created the NVIDIA Aftermath library. It lets you (among other things) record commands that write custom "marker" data to a buffer that survives a driver crash, so you can later read it and see which command was successfully executed last. Unfortunately, this library works only with NVIDIA graphics cards and only in D3D11 and D3D12.
I was looking for a similar solution for Vulkan. When I saw that Vulkan can "import" external memory, I thought that maybe I could use the function vkCmdFillBuffer to write an immediate value to such a buffer and this way implement the same logic. I then started experimenting with extensions: VK_KHR_get_physical_device_properties_2, VK_KHR_external_memory_capabilities, VK_KHR_external_memory, VK_KHR_external_memory_win32, VK_KHR_dedicated_allocation. I was basically trying to somehow allocate a piece of system memory and import it to Vulkan to write to it as a Vulkan buffer. I tried many things: CreateFileMapping + MapViewOfFile, HeapCreate + HeapAlloc and other ways, with various flags, but nothing worked for me. I also couldn't find any description or sample code of how these extensions could be used in Windows to import some system memory as a Vulkan buffer.
Everything changed when I learned that creating normal device memory and a buffer inside Vulkan is enough! It survives a driver crash, so its content can be read later via the mapped pointer. No extensions required. I don't think this is guaranteed by the specification, but it seems to work on both AMD and NVIDIA cards. So my current solution to write markers that survive a driver crash in Vulkan is:
- Call vkAllocateMemory to allocate VkDeviceMemory from memory type that has HOST_VISIBLE + HOST_COHERENT flags. (This is system RAM. Spec guarantees that you can always find such type.)
- Map the memory using vkMapMemory to get raw CPU pointer to its data.
- Call vkCreateBuffer to create VkBuffer with VK_BUFFER_USAGE_TRANSFER_DST_BIT and bind it to that memory using vkBindBufferMemory.
- While recording commands to VkCommandBuffer, use vkCmdFillBuffer to write immediate data with your custom "markers" to the buffer.
- If everything goes right, don't forget to vkDestroyBuffer and vkFreeMemory during shutdown.
- If you experience driver crash (receive VK_ERROR_DEVICE_LOST), read data under the pointer to see what marker values were successfully written last and deduce which one of your commands might cause the crash.
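The CPU-side bookkeeping for these markers can be sketched roughly as follows. This is only an illustration: MarkerLog and its methods are hypothetical names, not part of Vulkan or of VulkanAfterCrash.h, and the sketch assumes a single 32-bit marker slot that the CPU zeroes before the first submit. It also assumes markers reach memory in recording order, which in practice depends on the barriers you place around the fill commands, so the result is a hint rather than proof.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: hands out increasing marker values and remembers
// which command each value labels.
class MarkerLog {
public:
    // 0 is reserved to mean "no marker has been written yet".
    static constexpr uint32_t kNoMarker = 0;

    // Call right after recording the command you want to track. The returned
    // value is what you would pass as `data` to
    // vkCmdFillBuffer(cmdBuf, markerBuffer, 0, sizeof(uint32_t), data).
    uint32_t next(std::string label) {
        labels_.push_back(std::move(label));
        return static_cast<uint32_t>(labels_.size()); // 1-based
    }

    // After VK_ERROR_DEVICE_LOST: `lastWritten` is the uint32_t read through
    // the pointer obtained earlier from vkMapMemory.
    std::string lastCompleted(uint32_t lastWritten) const {
        assert(lastWritten <= labels_.size());
        return lastWritten == kNoMarker ? "<none>" : labels_[lastWritten - 1];
    }

    // The first command whose marker never arrived is the prime suspect.
    std::string firstSuspect(uint32_t lastWritten) const {
        assert(lastWritten <= labels_.size());
        return lastWritten < labels_.size() ? labels_[lastWritten] : "<none>";
    }

private:
    std::vector<std::string> labels_;
};
```

For example, if three commands were labeled "upload", "shadow pass" and "main pass" and the value read back after the crash is 2, lastCompleted(2) gives "shadow pass" and firstSuspect(2) gives "main pass".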
There is also a new extension available on the latest AMD drivers: VK_AMD_buffer_marker. It adds just one function: vkCmdWriteBufferMarkerAMD. It works similarly to the aforementioned vkCmdFillBuffer, but it adds two good things that let you write your marker with much better granularity:
- It can be called both inside and outside render pass, while vkCmdFillBuffer must be called outside render pass.
- It performs its write after specified pipeline stage finished executing.
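One way to exploit the pipeline-stage parameter, sketched here as an assumption rather than something the extension prescribes, is to keep two marker slots: one written with VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT before each draw and one written with VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT after it. Comparing the two after a crash narrows the fault down to the draws that started but never finished. The struct layout, the draw numbering and the function name below are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout of the marker buffer: two 32-bit slots.
struct MarkerSlots {
    uint32_t started;   // written via vkCmdWriteBufferMarkerAMD at TOP_OF_PIPE, dstOffset 0
    uint32_t finished;  // written via vkCmdWriteBufferMarkerAMD at BOTTOM_OF_PIPE, dstOffset 4
};

struct InFlightRange {
    uint32_t first;  // lowest draw number that started but did not finish
    uint32_t last;   // highest draw number that started
    bool empty;      // true when no draw is known to be in flight
};

// Draws are numbered 1, 2, 3, ... in recording order; 0 means "none yet".
InFlightRange inFlight(const MarkerSlots& slots) {
    // Equal values mean everything that started also finished. Inconsistent
    // data (finished ahead of started) is reported as empty, not guessed at.
    if (slots.finished >= slots.started) {
        return {0, 0, true};
    }
    return {slots.finished + 1, slots.started, false};
}
```

With started = 7 and finished = 4 read back from the mapped memory, the suspects are draws 5 through 7.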
I created a simple library that implements all this logic behind an easy interface. All you need to use it is just this single file: VulkanAfterCrash.h.