
Occlusion Culling
2025
Introduction
Occlusion Culling was developed as part of the Specialization course at The Game Assembly. The technique was implemented in our custom engine R.O.S.E. with the goal of improving performance by culling objects that are behind other objects.
Description
During the time of our seventh game project at The Game Assembly, we were tasked with finding an area we desired to specialize in by planning and exploring it independently.
​
I decided to explore Occlusion Culling because it falls directly within my interests in optimization and engine programming. I was intrigued by the idea of culling objects that are hidden behind other objects, and how that can improve a game's performance when there is simply a lot to render. That is especially relevant in our case, since our seventh game project takes place inside an apartment complex.
​
The implementation of the technique is based on an article on the subject by Nick Darnell: https://www.nickdarnell.com/hierarchical-z-buffer-occlusion-culling/. On top of that, I added batching of occluders and light culling, and also tested culling meshes during the shadow pass for the directional light.
​
To explain it briefly: all occluders are drawn to a depth buffer, which is then downsampled into a number of MIP levels. This downsampled depth buffer is then used in a compute shader to determine whether the bounding box of a mesh is visible or not.
Implementation
In-Engine Systems
The first task for me was to figure out how exactly each step would fit into our engine's rendering pipeline. We use double-buffered rendering, where all update logic runs on a separate thread while rendering runs on the main thread (because of the DirectX 11 immediate context). This means occlusion culling cannot execute on the update thread, because it needs to render occluders, downsample a texture, and dispatch a compute shader. Therefore, I created a system on the update thread called the OcclusionAssembler that queues up the needed graphics commands for later execution on the main thread.
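The queue-now/execute-later pattern the OcclusionAssembler relies on can be sketched as below. The names are illustrative, not the engine's actual types, and cross-thread synchronization is elided for brevity; the real engine records typed graphics commands rather than raw callables:

```cpp
#include <functional>
#include <vector>

// Sketch of a deferred graphics command queue (hypothetical names).
class GraphicsCommandQueue
{
public:
    // Called on the update thread: nothing touches the DX11 context here,
    // the work is only recorded.
    void Enqueue(std::function<void()> aCommand)
    {
        myCommands.push_back(std::move(aCommand));
    }

    // Called on the main thread, where the immediate context lives.
    void Execute()
    {
        for (auto& command : myCommands)
            command();
        myCommands.clear();
    }

private:
    std::vector<std::function<void()>> myCommands;
};
```

In this shape, "enqueue rendering of occluders" or "enqueue the downsampling step" is just another recorded command, which keeps the update thread free of any context access.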
​
For a mesh to be culled, it must first add itself as an occlusion candidate through the OcclusionAssembler. By submitting its bounding box, it receives an identifier that is saved and can later be used to check whether it was culled. This ID is called OcclCandID and contains an index as well as the occlusion pass it was tested in. The occlusion pass is an incrementing index identifying which pass a candidate belongs to, i.e., from which camera it was culled if there are multiple. The pass index is reset at the start of every frame.
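A minimal sketch of what the candidate ID and registration might look like; the field and method names are assumptions, not the engine's actual code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout of the candidate ID described above.
struct OcclCandID
{
    std::uint32_t index = 0; // slot in the candidate list
    std::uint32_t pass  = 0; // which occlusion pass (i.e., camera) it was tested in
};

struct AABB { float min[3]; float max[3]; };

// Stand-in for the OcclusionAssembler's candidate registration.
class OcclusionAssembler
{
public:
    OcclCandID AddCandidate(const AABB& aBounds)
    {
        myBounds.push_back(aBounds);
        return { static_cast<std::uint32_t>(myBounds.size() - 1), myPassIndex };
    }

    // The pass index is reset at the start of every frame...
    void BeginFrame() { myPassIndex = 0; myBounds.clear(); }

    // ...and incremented after each occlusion pass (i.e., per camera).
    void EndPass() { ++myPassIndex; }

private:
    std::vector<AABB> myBounds;
    std::uint32_t     myPassIndex = 0;
};
```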
​​
To summarize the pipeline:

Update Thread
- Run is called on the OcclusionAssembler, providing a camera and other data.
- Enqueues rendering of all occluders in view of the camera.
- Enqueues the downsampling step.
- Queries potentially visible meshes and adds them as candidates.
- Finally, enqueues execution of the occlusion cull, sending the candidates and the occlusion pass index.
- Increments the occlusion pass index.

Main Thread
- Renders all occluders to a depth texture.
- Downsamples the texture into a number of MIP levels.
- The OcclusionInterface receives all candidates as a list of bounding boxes, along with the occlusion pass index.
- The OcclusionInterface sends the downsampled texture and the list of bounding boxes to the compute shader.
- The OcclusionInterface also sets a read-write structured buffer for the compute shader to write its results to.
- The OcclusionInterface executes the compute shader by calling Dispatch; the results are then read from the read-write buffer and copied to a separate buffer.
- Upcoming draw calls can then use their OcclCandID to check whether they were culled.
​​
Render Occluder Depth
As the article describes, the first part of the implementation is to identify and render all occluders to a depth buffer. An occluder is defined as an object that can block visibility to other objects in the world. Technically, any object can block visibility; however, to avoid rendering everything, only a subset is selected for processing. These objects are typically large and ideally simple in complexity, for example walls, floors, and roofs. Since I wanted to avoid rendering high-poly meshes where possible, I decided that in our engine these occluders must be hand-made by the graphics artists. For example, in our game Shattered, all walls, floors, and roofs use a basic scaled cube as their occluder.
To add an occluder in Unreal that gets imported into our engine, the developer names their occluder mesh with the prefix OC_ and adds our custom component BP_Occluder to the actor. The component finds the occluder, stores its mesh for later rendering, and then hides it.
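On the import side, identifying an occluder mesh boils down to a prefix test on the asset name. IsOccluderMesh is a hypothetical helper name for illustration:

```cpp
#include <string>

// Returns true only when the mesh name starts with the occluder
// prefix from the naming convention described above.
bool IsOccluderMesh(const std::string& aMeshName)
{
    return aMeshName.rfind("OC_", 0) == 0;
}
```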
​


In the left image the occluders are hidden; in the right image the occluders (light blue) are shown.
These occluders are identified when an occlusion pass starts and rendered to the depth buffer shown below, which we call the Hi-Z map. The size of the depth buffer is a trade-off between effectiveness and efficiency; the article went with 512×256, while I chose 1024×1024, since hardware has gotten a lot faster since 2010.
Depth Hierarchy / Downsampling
When the Hi-Z map has been created, its contents are copied to a render target proxy. This render target contains MIP levels all the way down to a size of 1×1.
The reason for downsampling a render target instead of the depth buffer itself is that the depth buffer has some odd issues when written to from a pixel shader. I tested doing so, and it worked for about three MIP levels before breaking for no clear reason. My guess is that writing to MIP levels of depth buffers is not fully supported.
​​
The render target is then downsampled by running a fullscreen pass with a pixel shader that reads the previous MIP level and writes the highest depth value from each group of 2×2 pixels. By taking the maximum, we stay conservative and avoid any potential pop-in.
​
This process continues for all levels, where the last level usually contains just the maximum depth of the entire buffer.
​
The purpose of downsampling is ultimately to reduce the number of texel fetches required to determine whether a bounding box is visible.
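The max-of-2×2 reduction can be illustrated on the CPU. This is a sketch of the same operation the fullscreen pixel shader performs per MIP level, assuming a square, power-of-two depth texture stored as a flat float array:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Produces the next MIP level: each output texel takes the maximum
// depth of the 2x2 input group it covers.
std::vector<float> DownsampleMax(const std::vector<float>& aMip, std::size_t aSize)
{
    const std::size_t halfSize = aSize / 2;
    std::vector<float> result(halfSize * halfSize);
    for (std::size_t y = 0; y < halfSize; ++y)
    {
        for (std::size_t x = 0; x < halfSize; ++x)
        {
            const float d00 = aMip[(y * 2)     * aSize + (x * 2)];
            const float d10 = aMip[(y * 2)     * aSize + (x * 2 + 1)];
            const float d01 = aMip[(y * 2 + 1) * aSize + (x * 2)];
            const float d11 = aMip[(y * 2 + 1) * aSize + (x * 2 + 1)];
            // Max keeps the result conservative: a texel only reads as
            // "near" if everything it covers is near.
            result[y * halfSize + x] = std::max({ d00, d10, d01, d11 });
        }
    }
    return result;
}
```

Running this repeatedly until the size reaches 1×1 yields the full Hi-Z MIP chain.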

Test Object Bounds
The final step is to test all the received bounding boxes against the downsampled Hi-Z map. This takes place in a compute shader, which computes the screen-space bounds of each bounding box and then uses the size of those bounds to pick the MIP level to sample from.
Larger objects sample higher levels in the MIP chain, whereas smaller objects sample lower levels. This is because larger objects need a coarser view of the world for their screen-space bounds to fit inside the 2×2 texel fetch.
​
After the MIP level has been calculated, all that remains is to fetch the 2×2 texels of the Hi-Z map that overlap the screen-space bounds and take the maximum depth of those fetches. If the minimum depth of the screen-space bounds is greater than that maximum depth, the bounding box is considered not visible.
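The level selection and final depth comparison can be sketched on the CPU as follows. The function names are illustrative, and the exact level formula may differ from the shader's; the sketch assumes a reversed-free depth convention where larger values are farther away:

```cpp
#include <algorithm>
#include <cmath>

// Picks the MIP level at which 2x2 texels span the screen-space bounds.
// At level L, one Hi-Z texel covers 2^L screen pixels.
int ChooseMipLevel(float aWidthPx, float aHeightPx)
{
    const float longestEdge = std::max(aWidthPx, aHeightPx);
    return static_cast<int>(std::ceil(std::log2(std::max(longestEdge / 2.0f, 1.0f))));
}

// aMinDepth is the nearest depth of the candidate's screen-space bounds;
// aMaxSampledDepth is the maximum of the 2x2 Hi-Z fetches.
bool IsVisible(float aMinDepth, float aMaxSampledDepth)
{
    // Hidden only when the nearest point of the box is behind the
    // farthest occluder depth covering it.
    return aMinDepth <= aMaxSampledDepth;
}
```

For example, bounds covering the whole of a 1024×1024 Hi-Z map select level 9, where the map is 2×2 texels, so four fetches still cover the entire bounds.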
​
Additions
Optimization
To find which occluders in the level to render, they were given their own octree so they can be efficiently queried and frustum culled against the camera's view when an occlusion pass starts.
Rendering the occluders could easily be optimized by restricting occluders to simple opaque objects, without alpha clipping or similar effects. With the material constrained, occluders can be quickly batched into groups based purely on their mesh, and each group is then drawn with instanced rendering.
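The batching step amounts to grouping instances by mesh, with one instanced draw call per group. A minimal sketch with stand-in types:

```cpp
#include <cstdint>
#include <map>
#include <vector>

using MeshID = std::uint32_t; // simplified stand-in for a mesh handle

struct OccluderInstance
{
    MeshID mesh;
    float  transform[16]; // world matrix for the instance buffer
};

// Groups occluders by mesh; each resulting group maps to one
// instanced draw call with batches[mesh].size() instances.
std::map<MeshID, std::vector<const OccluderInstance*>>
BatchOccluders(const std::vector<OccluderInstance>& aOccluders)
{
    std::map<MeshID, std::vector<const OccluderInstance*>> batches;
    for (const OccluderInstance& occluder : aOccluders)
        batches[occluder.mesh].push_back(&occluder);
    return batches;
}
```

Because occluder materials carry no per-object state, nothing else needs to be keyed on, which is what makes the grouping this cheap.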
​​
I additionally introduced culling, in the aforementioned compute shader, of objects whose screen-space bounds are smaller than some number of pixels, since they would barely be visible anyway.
Light Culling
Because spotlights and pointlights already have a bounding box used for frustum culling, it was quite easy to extend them to occlusion culling. It works largely the same as for meshes: by adding itself as a candidate, a light receives an ID it can use to check whether it was culled. Furthermore, a light that casts shadows on nearby objects can pass its ID to the shadow draw calls, preventing them from rendering if the light itself is culled.
Multi-Camera Support
Occlusion culling was extended so it can easily be run from any camera, for example from shadow-casting lights. That is why OcclCandID includes which pass a draw call belongs to, so no one has to worry about the order of execution on the main thread. All it takes to add occlusion culling for any camera is to call Run on the OcclusionAssembler; subsequent draw calls are then given a stable OcclCandID they can use to check the results before rendering.
Honorable Mentions
I tested adding occlusion culling to the cascaded directional shadow pass, but sadly only saw a modest gain: in our game Shattered it reduced the pass from ~5000 draw calls to ~4000. Since we don't even use a directional light in the game, and I'm planning to introduce batching for the shadow pass, I decided to remove it there and fine-tune it in the future.
​
A feature I attempted to add was reading back the results on the update thread so that objects could be marked as not visible and, for example, stop updating their skeletal meshes. However, since the engine is double-buffered, the results being read would be from the previous frame. That could be acceptable, since the algorithm is fairly conservative, but I deemed it too unreliable and focused on other features instead.
Further Work
An area I want to explore further is improving culling of shadow-casting lights from the view of the main camera. The idea is outlined in this article, also by Nick Darnell: https://www.nickdarnell.com/hierarchical-z-buffer-occlusion-culling-shadows/.
​
It works by first creating a Hi-Z map from the player's view, then one from the light's view. Shadow-casting objects that are occluded from the light's view are culled, and a new bounding volume is then computed from the remaining cast shadows. If that volume can't be seen from the player's view, the light is culled.
​
Another area that could be explored is automatically generating an occluder for a mesh. That way, one can skip creating custom occluder meshes for complex objects, which can quickly become a chore.
Reflections
To sum it up, I am satisfied with the results, and believe I have reached my goal of culling objects that are behind other objects.
​
In our game Shattered, the performance boost is not massive, but it is sometimes noticeable when toggling it on and off. In the best case, it could be upwards of a ~10% increase, reducing the number of drawn meshes from ~3500 to just ~800. Seeing that there is no pop-in either, I decided to enable it in the final build of the game.
The performance gain from occlusion culling depends very much on the type of game and can differ drastically with where the player is looking. In the worst case, occlusion culling does nothing because the player can see everything; in the best case, the player is looking straight at a wall and everything behind it is culled. Its efficiency varies a lot, but I believe it is worth considering even when it does not seem suitable, since the culling process itself is made to be incredibly lightweight. Run from the main camera in Shattered with the entire level in view, it takes roughly 0.9% of the frame, between 50 and 200 µs.