Metal 2 on A11 — Overview

WWDC 2018

Posted by Den on October 01, 2018 · 5 mins read

Tech Talks

Metal 2

  • GPU-driven rendering
  • Platform feature alignment
  • Machine learning acceleration
  • Advanced optimization tools

Classical GPU Architecture

TBDR GPU Architecture

A11 GPU Architecture

Accelerated Rendering Techniques

  • Deferred Rendering
  • Tiled Forward
  • Order Independent Transparency
  • Multi-Layer Alpha Blending
  • Sub-Surface Scatter
  • MSAA Tone-mapping
  • Custom Resolves
  • Surface Aggregation

Metal 2 on A11

Advancing the TBDR Architecture

  • Imageblocks
  • Tile Shaders
  • Imageblock Sample Coverage Control
  • Raster Order Groups
  • Threadgroup Sharing

Imageblocks

  • 2D data structure accessible from shaders
    - Single pixel access from fragment functions
    - Full access from kernel functions
  • Multi-plane layout
    - Efficient bulk store of pixels to textures
  • Supports optional format conversion
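To make the imageblock idea more concrete, here is a minimal CPU-side sketch in C++. It is a conceptual model only, not Metal API: the names `PixelData`, `TileBlock`, `toUnorm8` and `storeTile` are hypothetical, and the float-to-RGBA8 conversion is just one assumed example of "optional format conversion". It shows per-pixel access (as a fragment function would see it), whole-tile access (as a kernel function would see it), and a bulk store of the tile into a texture.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel payload kept in the tile (assumed layout).
struct PixelData {
    float r, g, b, a;
};

// Hypothetical model of an imageblock: a small W x H tile of pixel data.
template <std::size_t W, std::size_t H>
struct TileBlock {
    std::array<PixelData, W * H> pixels{};  // zero-initialized

    // Fragment-style access: one pixel at a time.
    PixelData& at(std::size_t x, std::size_t y) { return pixels[y * W + x]; }
    const PixelData& at(std::size_t x, std::size_t y) const { return pixels[y * W + x]; }
};

// Format conversion: clamp a float channel to [0, 1] and quantize to 8 bits.
inline std::uint8_t toUnorm8(float v) {
    float c = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(c * 255.0f));
}

// Kernel-style access: walk the whole tile and bulk-store it into an
// RGBA8 texture (row-major, 4 bytes per pixel) at the given tile origin.
template <std::size_t W, std::size_t H>
void storeTile(const TileBlock<W, H>& tile,
               std::vector<std::uint8_t>& texture,
               std::size_t texWidth,
               std::size_t originX,
               std::size_t originY) {
    for (std::size_t y = 0; y < H; ++y) {
        for (std::size_t x = 0; x < W; ++x) {
            const PixelData& p = tile.at(x, y);
            std::size_t i = ((originY + y) * texWidth + (originX + x)) * 4;
            texture[i + 0] = toUnorm8(p.r);
            texture[i + 1] = toUnorm8(p.g);
            texture[i + 2] = toUnorm8(p.b);
            texture[i + 3] = toUnorm8(p.a);
        }
    }
}
```

On the GPU the tile lives in on-chip tile memory and the store is done by the hardware; the loop above only illustrates the data flow.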

Tile Shading

  • Compute within a render pass
  • Access to the entire imageblock
  • Access to threadgroup memory

Enhanced MSAA

  • A11 GPU tracks unique sample data for even faster blending
  • Imageblock Sample Coverage Control
    - Access sample coverage tracking data
    - Resolve at any time in your render pass
    - Implement custom resolve in a tile shader
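One common reason to write a custom resolve is MSAA tone-mapping: tone-mapping each sample before averaging, instead of averaging HDR samples first. The C++ sketch below models that arithmetic on the CPU. It is not tile shader code; the 4x sample count, the `Color` type, the function names, and the choice of the Reinhard operator are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

struct Color {
    float r, g, b;
};

// Assumed sample count for this sketch (4x MSAA).
constexpr std::size_t kSamples = 4;
using PixelSamples = std::array<Color, kSamples>;

// Reinhard tone-mapping operator, x / (1 + x). Chosen only as an example.
inline float reinhard(float x) { return x / (1.0f + x); }

inline Color toneMap(const Color& c) {
    return {reinhard(c.r), reinhard(c.g), reinhard(c.b)};
}

// Plain box-filter average of all samples of one pixel.
inline Color average(const PixelSamples& s) {
    Color sum{0.0f, 0.0f, 0.0f};
    for (const Color& c : s) {
        sum.r += c.r;
        sum.g += c.g;
        sum.b += c.b;
    }
    const float inv = 1.0f / static_cast<float>(kSamples);
    return {sum.r * inv, sum.g * inv, sum.b * inv};
}

// Standard order: resolve (average) the HDR samples, then tone-map.
inline Color resolveThenToneMap(const PixelSamples& s) {
    return toneMap(average(s));
}

// Custom resolve: tone-map every sample first, then average.
inline Color toneMapThenResolve(const PixelSamples& s) {
    PixelSamples mapped{};
    for (std::size_t i = 0; i < kSamples; ++i) {
        mapped[i] = toneMap(s[i]);
    }
    return average(mapped);
}
```

For an edge pixel where two samples are black and two have an HDR value of 15.0, the first function returns roughly 0.88 per channel, while the custom resolve returns roughly 0.47. The second value is closer to the half-covered result you would expect on screen, which is why the per-sample order tends to give smoother high-contrast edges.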

Raster Order Groups

  • Access memory from overlapping fragment functions in submission order
  • Allows fragment functions to communicate
  • A11 GPU additions:
    - Support for Tile Shaders and Threadgroup Imageblock
    - Support for multiple Raster Order Groups
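Why submission order matters can be seen with ordinary alpha blending, which is not commutative. The C++ sketch below is a CPU-side illustration, not Metal code: `Rgba`, `blendOver` and `blendInSubmissionOrder` are hypothetical names, and colors are assumed to be premultiplied by alpha. It applies overlapping fragments to one pixel strictly in the order given, which is the guarantee a raster order group provides for overlapping fragment functions.

```cpp
#include <vector>

// Premultiplied-alpha color (assumption for this sketch).
struct Rgba {
    float r, g, b, a;
};

// Source-over blend for premultiplied colors: out = src + dst * (1 - src.a).
inline Rgba blendOver(const Rgba& src, const Rgba& dst) {
    const float k = 1.0f - src.a;
    return {src.r + dst.r * k,
            src.g + dst.g * k,
            src.b + dst.b * k,
            src.a + dst.a * k};
}

// Apply overlapping fragments to a single pixel in submission order.
// Each step reads the result written by the previous fragment.
inline Rgba blendInSubmissionOrder(Rgba pixel, const std::vector<Rgba>& fragments) {
    for (const Rgba& f : fragments) {
        pixel = blendOver(f, pixel);
    }
    return pixel;
}
```

Starting from opaque black, blending 50% red and then 50% blue gives (0.25, 0, 0.5, 1), while blue and then red gives (0.5, 0, 0.25, 1). Without an ordering guarantee, overlapping fragments that read and write the same pixel could produce either result.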

Threadgroup Sharing

  • Flexible and efficient sharing of data between threads
    - Threadgroups can communicate with each other
    - Threads within a threadgroup can also communicate without a barrier
    - Use atomic operations or a memory fence

Additional Metal 2 Features

  • More accurate f16 math
  • Texture Cube Arrays
  • Read / Write Texture
  • Array of Samplers
  • Post Depth Coverage
  • Flexible Compute Dispatch
  • Quad Scoped Permute Operations

Improved Compute Performance

  • Up to 2x math performance on
    - Computer Vision
    - Image processing
    - Machine learning

Improved Performance and Capabilities