Metal 2 on A11 - Tile Shading

Tech Talks

Tile Memory

Mixing Render and Compute

Without Cache

Using Cache

Interleaving Draws and Dispatches

Dispatches are interleaved with draws
Executed in API submission order
Dispatches barrier against earlier and later draws

Thread Organization

Compute pass
- Threadgroups organized as tightly packed grids
- Threadgroup size changes per dispatch
Render pass
- Tile size fixed per render pass
- Threadgroup size changes per dispatch

Dispatch Affects All Tiles
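
As a rough illustration of the scale involved, the sketch below counts how many tiles cover a render target for each of the tile sizes available in a render pass. It assumes that a tile dispatch runs once per tile and that partially covered tiles at the right and bottom edges count as whole tiles. The names `ceilDiv`, `TileGrid`, `tileGridFor`, and `tileCount` are hypothetical helpers for this example, not part of Metal.

```cpp
#include <cstddef>

// Hypothetical helper: integer division that rounds up.
constexpr std::size_t ceilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Number of tile columns and rows needed to cover a render target.
struct TileGrid {
    std::size_t columns;
    std::size_t rows;
};

// Assumption: edge tiles that only partly overlap the target still count.
constexpr TileGrid tileGridFor(std::size_t targetWidth, std::size_t targetHeight,
                               std::size_t tileWidth, std::size_t tileHeight) {
    return TileGrid{ceilDiv(targetWidth, tileWidth),
                    ceilDiv(targetHeight, tileHeight)};
}

// Total tiles, which under the assumption above is also the number of
// tile-sized threadgroups a single tile dispatch would touch.
constexpr std::size_t tileCount(std::size_t targetWidth, std::size_t targetHeight,
                                std::size_t tileWidth, std::size_t tileHeight) {
    return tileGridFor(targetWidth, targetHeight, tileWidth, tileHeight).columns *
           tileGridFor(targetWidth, targetHeight, tileWidth, tileHeight).rows;
}
```

For a 1920x1080 target, `tileCount(1920, 1080, 32, 32)` gives 60 x 34 = 2040 tiles, while 16x16 tiles give 120 x 68 = 8160.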

Render Pass Setup

Choose between 3 tile sizes
- 32x32, 32x16, or 16x16
Constrained by tile memory size (32KB)
- Per-sample image block size
- Threadgroup memory size
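
The constraint above can be sketched as a simple budget check. This is a minimal model, not Metal's actual allocation rule. It assumes that imageblock storage scales with pixels per tile times samples per pixel times bytes per sample, and that threadgroup memory is allocated once per tile. The helper names `tileMemoryNeeded` and `fitsInTileMemory` are hypothetical.

```cpp
#include <cstddef>

// Tile memory limit quoted above (32KB).
constexpr std::size_t kTileMemoryBytes = 32 * 1024;

// Hypothetical helper: estimated bytes of tile memory one tile needs.
// Assumption: imageblock bytes = width * height * samples * bytes per sample,
// plus one threadgroup memory allocation per tile.
constexpr std::size_t tileMemoryNeeded(std::size_t tileWidth,
                                       std::size_t tileHeight,
                                       std::size_t samplesPerPixel,
                                       std::size_t imageblockBytesPerSample,
                                       std::size_t threadgroupBytes) {
    return tileWidth * tileHeight * samplesPerPixel * imageblockBytesPerSample +
           threadgroupBytes;
}

// True when the estimate fits within the 32KB budget.
constexpr bool fitsInTileMemory(std::size_t tileWidth,
                                std::size_t tileHeight,
                                std::size_t samplesPerPixel,
                                std::size_t imageblockBytesPerSample,
                                std::size_t threadgroupBytes) {
    return tileMemoryNeeded(tileWidth, tileHeight, samplesPerPixel,
                            imageblockBytesPerSample, threadgroupBytes) <=
           kTileMemoryBytes;
}
```

With an illustrative 32 bytes per sample, one sample per pixel, and 4KB of threadgroup memory, a 32x32 tile would need 36864 bytes and exceed the budget, while 32x16 (20480 bytes) and 16x16 (12288 bytes) would fit. Under this model, a larger imageblock or more threadgroup memory pushes the render pass toward a smaller tile size.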

Pipeline Setup

New descriptor type
- 1 function can be bound
- No blend state
Tile pipelines can be built from
- Kernel functions
- Fragment functions

Imageblocks in Tile Pipelines

Kernel-based tile pipelines access:
- All (x, y) locations
- Explicit imageblock elements by reference
- Implicit imageblock elements by value
Fragment-based tile pipelines access:
- Implied (x,y) location
- Explicit imageblock elements by value
- Implicit imageblock elements by value

Threadgroup Memory Persistence

Render pass imageblocks persist for lifetime of tile
Render pass threadgroup memory also persists for lifetime of tile
Threadgroup memory well suited for tile constant data

Repurposing Tile Memory

Tile shading enables merging compute and render passes
Use fragment-based tile pipelines to transition between memory layouts
- Tile barriers ensure atomicity across pixels
- Value semantics ensure atomicity within pixels

GPU Debugger Support

Inspect threadgroup memory
