
Vulkan – Draw Call Instancing

Purpose of the Sample:

This sample demonstrates how to do several important things using the Vulkan rendering API.  The sample shows how to create a logical device based on a physical device (i.e., a GPU) that supports certain features.  You will also see how to create a vertex buffer, bind SPIR-V shaders, and, of course, make instanced draw calls.  By the way, instancing is a first-class citizen in Vulkan!  Please note, I derived this sample from what I learned working through the online guide.  Below is the end result.


The Vulkan Declaration of Independence:

Many folks have heard the following famous line from the U.S. Declaration of Independence.

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

Folks generally make the mistake of assuming that writing a game using the Vulkan rendering API will make their game more performant.  However, this is not the promise of modern rendering APIs like Vulkan and DirectX 12.  Using a modern rendering API only offers you the pursuit of performance happiness.  For many years, game developers complained that rendering APIs like OpenGL were not performant enough.  Game developers thought they could do a better job.  Therefore GPU driver developers decided to troll game developers by forcing them to become GPU driver developers.  You do have a lot more control over, and insight into, the interaction between the CPU and GPU.  However, that comes at a giant cost in complexity.

When you decide to engage a modern rendering API like Vulkan or DirectX 12, you are no longer just writing rendering code.  You are writing graphics drivers.  You must explicitly manage synchronization between the CPU and GPU.  You must explicitly set up most of the rendering state.  You will need to enable validation layers if you want Vulkan to tell you about rendering issues.  You will need to understand, and manage, the details of how data is uploaded from the CPU to the GPU.  In summary, you will need to understand, in more detail, how a GPU works.  Also, keep in mind that GPU driver developers have had over 20 years to come up with slick optimization strategies for OpenGL and DirectX, whereas game developers are now reinventing the wheel.  Therefore your mileage may vary when comparing the performance of OpenGL and Vulkan.

Vulkan Keywords:

There are a few keywords you need to be familiar with before diving head first into Vulkan.  I will list them below.

  • Physical Device – This is the physical integrated (built into the main processor) or discrete (dedicated) GPU you will use.
  • Logical Device – Most of your interaction with the Vulkan API happens through a logical device.  A logical device represents a physical device with an explicit set of enabled features and extensions.  OpenGL enables extensions automatically/implicitly, whereas Vulkan requires that we enable them explicitly via the logical device.  The logical device allows us to do typical rendering tasks like creating vertex buffers, uploading image data, and recording commands like draw calls or compute dispatches.
  • Queue Family – You must submit work to queues in order to perform operations on the GPU hardware.  The operations in a single queue are processed in the order they were submitted.  However, you can have multiple queues, each with its own set of operations, and each queue is processed independently.  If needed, the queues can be synchronized with each other.  There are several different kinds of queues.  Each type of queue can represent a different part of the GPU hardware and support different kinds of work.  Queues with the same capabilities are grouped into families.  For example, you could have a GPU that has two graphics queues and one compute queue.  This would mean you could divide your rasterization work across two independent queues, batch all your compute work together, and execute all of the work in parallel.
  • Physical Device Properties – Contains general information about the GPU, like its name, the driver version, and the supported version of the Vulkan API.  We can also use properties to determine whether the GPU is integrated, discrete, or even a CPU itself.  You can also use properties to learn about specific limitations of the hardware, like the maximum size of textures that can be created on it or how many buffers can be used in shaders.
  • Physical Device Features – Additional features that may be supported by the given hardware but are not required by the core Vulkan specification.  Features include items such as geometry and tessellation shaders, depth clamp and bias, multiple viewports, and wide lines.
  • Descriptor – A reference to a resource, such as a uniform buffer or an image, that shaders can access.  Descriptors are how we tell the GPU where a resource lives and how to interpret it.
  • Descriptor Set – These are bound to a pipeline to specify an interface between the application and the shaders.  A descriptor set allows resources to be grouped together into a set, hence the name.  This is similar to the idea of an OpenGL uniform buffer, where you could store your model, view, and projection matrices for use with a shader.  You fill out all the data on the CPU and upload it to the GPU in a single batch.
  • Descriptor Set Layout – These are used to inform the hardware as to what resources are grouped in a set, how many resources of each type there are, what their order is, etc.
  • Render Pass – A render pass is an explicit, core part of the Vulkan API.  A render pass owns one or more “subpasses”.  A subpass renders into a framebuffer, which can have multiple attachments like a color buffer and a depth buffer.  A subpass can then reference the output of a prior subpass’s framebuffer.  For example, you can have two subpasses where the first renders the scene into a framebuffer with a color and depth attachment.  The second subpass can then read the first subpass’s color attachment and render only a color result into the swap chain image.
  • Graphics Pipeline State – This is a data structure that will make you question whether you are a graphics programmer or an accountant filing taxes.  It is a huge data structure that needs to be explicitly filled out.  The GPU will use this to determine every last detail regarding how to render the frame.  In OpenGL land, most of the render state is filled in with default values, and you explicitly change only the states you are interested in.  With Vulkan, you need to fill all of it out by hand.  This includes things like the various shader stages you want to use (Vertex, Geometry, Fragment, etc.), the render passes you want to use, line widths, viewport sizes, blending, etc.
  • Command Buffers – OpenGL and DirectX 11 both use a graphics context to allow you to issue commands to the GPU.  Vulkan does not offer you a context.  Instead, you fill a command buffer with commands and submit it to a queue, where the commands are then executed on the GPU.
  • Validation Layers – The OpenGL driver implementation will normally validate any API calls you make to ensure they are reasonable and prevent crashing your system.  However, validation takes time, which costs performance.  Therefore Vulkan doesn’t do any validation for you unless you explicitly enable validation checks.  Furthermore, you can tell Vulkan what kinds of validation you want it to do to minimize the performance impact.  This is done by explicitly enabling what are called “validation layers”.

GPU Memory Concepts

Contrary to OpenGL and DirectX 11, there is a little more work you need to do to upload data from the CPU to the GPU when using Vulkan.  GPU memory is partitioned into two types: host-visible and device-local memory.  Host-visible memory is memory that exists on the GPU but is visible to the CPU.  The CPU can therefore copy system memory into this host-visible GPU memory.  For example, you will need to upload a vertex buffer from the CPU’s main memory into the GPU’s host-visible memory.  Host-visible memory is slower to use than device-local memory, but it allows you to map memory from the CPU to the GPU.  Device-local memory is intended to be fast to access for operations that need to be fast, like texture sampling.  Device-local memory is not touchable by the CPU.  Therefore, to upload texture data to device-local memory, you must first copy the data from the CPU into the GPU’s host-visible memory.  From there, you copy the data from host-visible memory into device-local memory.  Graphics engineers normally call the host-visible buffer used for this upload a “staging buffer”.

How Do You Draw Many Cubes With a Single Draw Call?

Via instancing!  Any time you send data from the CPU to the GPU, you pay a performance cost.  Therefore it is better to send batches of data from the CPU to the GPU.  With instancing, you can issue a single draw call and request that the GPU execute the draw call’s work multiple times.  In this sample, we will render multiple cubes at different locations, so the main difference between the cubes will be their world position.  The camera’s transform and the projection matrix will remain the same for all the cubes.  Knowing this, we can upload a buffer with 30 unique object-to-world matrices.  The GPU tells the vertex shader which instance of the cube is currently being rendered via an instance id.  We can therefore use the instance id to index into the object-to-world matrices and uniquely position each cube.

vertex.vert (Vertex Shader)
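The sample’s actual shader source is not reproduced here, but a minimal instanced vertex shader along the lines described above might look like the following.  The binding number, attribute locations, and uniform layout are my assumptions, not the sample’s exact interface; gl_InstanceIndex is the Vulkan GLSL built-in that carries the instance id.

```glsl
#version 450

// Per-vertex inputs from the vertex buffer.
layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec3 inColor;

// Camera matrices plus one object-to-world matrix per instance.
layout(binding = 0) uniform UniformBufferObject {
    mat4 view;
    mat4 proj;
    mat4 model[30];
} ubo;

layout(location = 0) out vec3 fragColor;

void main() {
    // gl_InstanceIndex selects this cube's object-to-world matrix.
    gl_Position = ubo.proj * ubo.view
                * ubo.model[gl_InstanceIndex] * vec4(inPosition, 1.0);
    fragColor = inColor;
}
```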

fragment.frag (Fragment Shader)
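Likewise, a minimal pass-through fragment shader for this setup (again a sketch, not the sample’s actual source) could be:

```glsl
#version 450

layout(location = 0) in vec3 fragColor;
layout(location = 0) out vec4 outColor;

void main() {
    // Write the interpolated vertex color straight to the color attachment.
    outColor = vec4(fragColor, 1.0);
}
```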

Download the Sample Project

Useful Links: