Metal 2 on A11 - Tile Shading

WWDC 2018

Posted by Den on October 02, 2018 · 5 mins read

Tech Talks

Tile Memory

Mixing Render and Compute

Without Cache
Using Cache

Interleaving Draws and Dispatches

  • Dispatches are interleaved with draws
  • Executed in API submission order
  • Dispatches barrier against earlier and later draws

Thread Organization

  • Compute pass
    - Threadgroups organized as tightly packed grids
    - Threadgroup size changes per dispatch
  • Render pass
    - Tile size fixed per render pass
    - Threadgroup size changes per dispatch

Dispatch Affects All Tiles
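
As a rough illustration of how much work "all tiles" implies, the sketch below counts the tiles that cover a render target for a given tile size. The `Extent` type and the `ceilDiv`, `tileGrid`, and `tileCount` helpers are hypothetical names for this example, not Metal API. It also assumes partially covered edge tiles are rounded up to whole tiles and that a dispatch runs once per tile.

```cpp
#include <cstddef>

// Hypothetical helper type: a width/height pair in pixels (or in tiles).
struct Extent {
    std::size_t width;
    std::size_t height;
};

// Integer division rounded up, so partially covered edge tiles are counted.
constexpr std::size_t ceilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Number of tile columns and rows needed to cover the render target.
constexpr Extent tileGrid(Extent target, Extent tile) {
    return Extent{ceilDiv(target.width, tile.width),
                  ceilDiv(target.height, tile.height)};
}

// Total tiles, i.e. how many times a per-tile dispatch would run
// under the one-run-per-tile assumption stated above.
constexpr std::size_t tileCount(Extent target, Extent tile) {
    return tileGrid(target, tile).width * tileGrid(target, tile).height;
}

// Worked examples for a 1920x1080 target.
static_assert(tileCount(Extent{1920, 1080}, Extent{32, 32}) == 2040, "60 x 34 tiles");
static_assert(tileCount(Extent{1920, 1080}, Extent{32, 16}) == 4080, "60 x 68 tiles");
static_assert(tileCount(Extent{1920, 1080}, Extent{16, 16}) == 8160, "120 x 68 tiles");
```

Halving a tile dimension doubles the number of tiles a dispatch touches, which is one reason tile size is worth choosing deliberately.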

Render Pass Setup

  • Choose between 3 tile sizes
    - 32x32, 32x16, or 16x16
  • Constrained by tile memory size (32 KB)
    - Per-sample imageblock size
    - Threadgroup memory size
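
The trade-off between tile size, imageblock size, and threadgroup memory can be sketched as a simple budget check. This is a minimal model, not Metal API. It assumes the tile's imageblock storage is width × height × samples × bytes per sample, with threadgroup memory added on top, and it ignores any padding or alignment real hardware may apply. All names (`tileMemoryNeeded`, `fitsInTileMemory`, `largestFittingTile`, `TileSize`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Tile memory budget quoted in the notes above.
constexpr std::size_t kTileMemoryBytes = 32 * 1024;

struct TileSize {
    std::size_t width;
    std::size_t height;
};

// Assumed model: imageblock bytes for every sample in the tile,
// plus the threadgroup memory requested for the render pass.
constexpr std::size_t tileMemoryNeeded(TileSize tile,
                                       std::size_t samplesPerPixel,
                                       std::size_t imageblockBytesPerSample,
                                       std::size_t threadgroupBytes) {
    return tile.width * tile.height * samplesPerPixel * imageblockBytesPerSample
           + threadgroupBytes;
}

constexpr bool fitsInTileMemory(TileSize tile,
                                std::size_t samplesPerPixel,
                                std::size_t imageblockBytesPerSample,
                                std::size_t threadgroupBytes) {
    return tileMemoryNeeded(tile, samplesPerPixel, imageblockBytesPerSample,
                            threadgroupBytes) <= kTileMemoryBytes;
}

// Returns the largest of the three tile sizes that fits the budget,
// or {0, 0} if none does.
inline TileSize largestFittingTile(std::size_t samplesPerPixel,
                                   std::size_t imageblockBytesPerSample,
                                   std::size_t threadgroupBytes) {
    assert(samplesPerPixel > 0);
    const TileSize candidates[] = {{32, 32}, {32, 16}, {16, 16}};
    for (const TileSize& tile : candidates) {
        if (fitsInTileMemory(tile, samplesPerPixel, imageblockBytesPerSample,
                             threadgroupBytes)) {
            return tile;
        }
    }
    return TileSize{0, 0};
}

// Worked example: 16 bytes per sample, no MSAA, no threadgroup memory
// uses half of the budget at 32x32.
static_assert(tileMemoryNeeded(TileSize{32, 32}, 1, 16, 0) == 16 * 1024,
              "32 * 32 * 16 bytes");
```

Under this model, a 4x multisampled pass with 16 bytes per sample and 1 KB of threadgroup memory no longer fits at 32x32 or 32x16, so `largestFittingTile(4, 16, 1024)` falls back to 16x16.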

Pipeline Setup

  • New descriptor type
    - 1 function can be bound
    - No blend state
  • Tile pipelines can be built from
    - Kernel functions
    - Fragment functions

Imageblocks in Tile Pipelines

  • Kernel-based tile pipelines access:
    - All (x, y) locations
    - Explicit imageblock elements by reference
    - Implicit imageblock elements by value
  • Fragment-based tile pipelines access:
    - Implied (x, y) location
    - Explicit imageblock elements by value
    - Implicit imageblock elements by value

Threadgroup Memory Persistence

  • Render pass imageblocks persist for lifetime of tile
  • Render pass threadgroup memory also persists for lifetime of tile
  • Threadgroup memory well suited for tile constant data

Repurposing Tile Memory

  • Tile shading enables merging compute and render passes
  • Use fragment-based tile pipelines to transition between memory layouts
    - Tile barriers ensure atomicity across pixels
    - Value semantics ensure atomicity within pixels

GPU Debugger Support

  • Inspect threadgroup memory