Metal for OpenGL Developers

WWDC 2018

Posted by Den on September 05, 2018 · 21 mins read

  • The legacy APIs (OpenGL, OpenGL ES, and OpenCL) are deprecated
  • Still available in iOS 12, macOS 10.14 and tvOS 12
  • Begin transitioning to Metal

Choosing an Approach

  • High-level Apple frameworks
    - SpriteKit, SceneKit, and Core Image
  • 3rd-party engine
    - Unity, Unreal, Lumberyard, etc.
    - Update to the latest version

Challenges with OpenGL

  • OpenGL designed more than 25 years ago
    - Core architecture reflects the origin of 3D graphics
    - Extensions retrofitted some GPU features
  • Fundamental design choices based on past principles
    - GPU pipeline has changed
    - Multithreaded operation not considered
    - Asynchronous processing not part of the core design

Design Goals for Metal

  • Efficient GPU interaction
    - Low CPU overhead
    - Multithreaded execution
    - Predictable operation
    - Resource and synchronization control
  • Approachable to OpenGL developers
  • Built for modern and Apple-designed GPUs

Key Conceptual Differences

  • Expensive operations less frequent
    - Expensive CPU operations performed less often
    - More GPU command generation during object creation
    - Less needed when rendering
  • Modern GPU pipeline
    - Reflects the modern GPU architectures
    - Closer match yields less costly translation to GPU commands
    - State grouped more efficiently
  • Multithreaded execution
    - Designed for multithreaded execution
    - Clear rules for multithreaded usage
    - Cross thread object usability
  • Execution model
    - True interaction between software and GPU
    - Predictable operation allows efficient designs
    - Thinner stack between application and GPU

Execution model diagrams: OpenGL, Metal Single Buffer, Metal Multi-Buffer

Command Encoders

  • Render Command Encoder
  • Blit Command Encoder
  • Compute Command Encoder

Render Command Encoder

  • Commands for render pass
    - Encodes a series of render commands
    - Also called a Render Pass
    - Set render objects for the graphics pipeline (buffers, textures, shaders)
    - Issue draw commands (draw primitives, draw indexed primitives, instanced draws)
  • Render targets
    - Associated with a set of render targets ( Textures for rendering )
    - Specify a set of render targets upon creation
    - All draw commands directed to these for lifetime of encoder
    - New render targets need a new encoder
    - Clear delineation between sets of render targets

Render Object

  • Textures
  • Buffers
  • Samplers
  • Render pipeline states
  • Depth stencil states

Render Object Creation

  • Create from a device
    - Usable only on the device
  • Object state set at creation
    - Descriptor object specifies properties for render object
  • State set at creation fixed for the lifetime of the object
    - Image data of textures and values in buffers can change
  • Metal compiles objects into GPU state once
    - Never needs to check for changes and recompile
  • Multithreaded usage more efficient
    - Metal does not need to protect state from changes on other threads

Metal Porting

Metal Shading Language

  • Based on C++
    - Classes, templates, structs, enums, namespaces
  • Built-in types for vectors and matrices
  • Built-in functions and operators
  • Built-in classes for textures and samplers

SIMD Type Library

Types for shader development

  • Vector and matrix types
  • Usable with Metal shading language and application code

Shader Compilation

Build with Xcode

  • Xcode compiles shaders into a Metal library (.metallib)
  • Front-end compilation to binary intermediate representation
  • Avoids parsing time on customer systems
  • By default, all shaders built into default.metallib
    - Placed in app bundle for run time retrieval

Runtime Shader Compilation ⚠️

  • Also can build shaders from source at runtime
  • Significant disadvantages
    - Full shader compilation occurs at runtime
    - Compilation errors less obvious
    - No header sharing between application and runtime built shaders
  • Build time compilation recommended

Devices

  • A device represents one GPU
  • Create render objects
    - Texture, buffers, pipelines
  • On macOS, multiple devices may be available
  • Default device suitable for most applications

Command Queues

  • Queue created from a device
  • Queue executes command buffers in order
    - Create queue at initialization
  • Typically one queue sufficient

Texture

Storage Modes

Shared storage
Private storage
Managed storage

Texture Differences ⚠️

  • Sampler state never part of texture
    - Wrap modes, filtering, min/max LOD
  • Texture image data not flipped
    - OpenGL uses bottom-left origin, Metal uses top-left origin
  • Metal does not perform format conversion
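
The last two points usually translate into a CPU-side preparation step when porting a texture loader. The sketch below is plain C++ and calls no Metal API. It assumes tightly packed 8-bit image data; the function names `flipRows` and `expandRGBToRGBA` are hypothetical helpers invented for this note. Whether a flip is needed at all depends on how the existing OpenGL loader orients its data, and the RGB to RGBA expansion is just one example of a conversion that OpenGL would have done implicitly on upload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reverses the row order of a tightly packed image, so the first row in
// memory becomes the last. Useful when data was laid out for a bottom-left
// origin and the consumer expects a top-left origin (or the reverse).
std::vector<std::uint8_t> flipRows(const std::vector<std::uint8_t>& pixels,
                                   std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel) {
    const std::size_t rowBytes = width * bytesPerPixel;
    assert(pixels.size() == rowBytes * height);
    if (pixels.empty()) {
        return {};
    }
    std::vector<std::uint8_t> flipped(pixels.size());
    for (std::size_t row = 0; row < height; ++row) {
        const std::size_t srcOffset = row * rowBytes;
        const std::size_t dstOffset = (height - 1 - row) * rowBytes;
        std::memcpy(flipped.data() + dstOffset, pixels.data() + srcOffset, rowBytes);
    }
    return flipped;
}

// Expands tightly packed 8-bit RGB into 8-bit RGBA with opaque alpha.
// This conversion has to happen in application code before the upload.
std::vector<std::uint8_t> expandRGBToRGBA(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> rgba;
    rgba.reserve(rgb.size() / 3 * 4);
    for (std::size_t i = 0; i < rgb.size(); i += 3) {
        rgba.push_back(rgb[i]);      // R
        rgba.push_back(rgb[i + 1]);  // G
        rgba.push_back(rgb[i + 2]);  // B
        rgba.push_back(255);         // A (opaque)
    }
    return rgba;
}
```

Doing this once at load time (or offline in the asset pipeline) keeps the cost out of the render loop. An alternative to flipping pixel data is to flip the texture coordinates instead.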

Buffers

  • Metal uses buffers for vertices, indices, and all uniform data
  • OpenGL’s vertex, element, and uniform buffers are similar
    - Easier to port apps that have adopted these

Notes About Buffer Data ⚠️

!!! Pay attention to alignment !!!

  • The SIMD library's vector and matrix types follow the same rules as Metal shaders
  • Special packed vector types available to shaders
    - packed_float3 consumes 12 bytes
    - packed_half3 consumes 6 bytes
  • Cannot directly operate on packed types
    - Cast to non-packed type required
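
To make the alignment warning concrete, here is a small plain C++ model of the two layouts. It does not use the real SIMD library: `Float3`, `Float2`, `PackedFloat3`, and `PackedHalf3` are hypothetical stand-ins, and the sketch assumes that a non-packed three-component float vector occupies 16 bytes with 16-byte alignment. The packed sizes of 12 and 6 bytes come from the notes above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for the non-packed vector types (assumed sizes and alignments).
struct alignas(16) Float3 { float x, y, z; };
struct alignas(8)  Float2 { float x, y; };

// Stand-ins for the packed types: no padding, element alignment only.
struct PackedFloat3 { float x, y, z; };
struct PackedHalf3  { std::uint16_t x, y, z; };  // raw 16-bit storage for half values

// The same logical vertex in both layouts.
struct VertexAligned { Float3 position; Float2 uv; };
struct VertexPacked  { PackedFloat3 position; float uv[2]; };

static_assert(sizeof(Float3) == 16, "non-packed float3 stand-in is padded to 16 bytes");
static_assert(sizeof(PackedFloat3) == 12, "packed float3 stand-in consumes 12 bytes");
static_assert(sizeof(PackedHalf3) == 6, "packed half3 stand-in consumes 6 bytes");
static_assert(offsetof(VertexAligned, uv) == 16, "uv starts after the padded position");
static_assert(sizeof(VertexAligned) == 32, "struct size rounds up to its 16-byte alignment");
static_assert(sizeof(VertexPacked) == 20, "packed vertex has no padding");

// Models the "cast to non-packed type" step: widen a packed value before
// doing arithmetic on it.
inline Float3 unpack(const PackedFloat3& p) {
    return Float3{p.x, p.y, p.z};
}

// Byte size of a buffer holding `count` vertices of layout V.
template <typename V>
constexpr std::size_t bufferBytes(std::size_t count) {
    return count * sizeof(V);
}
```

With these assumptions, three vertices take 96 bytes in the aligned layout and 60 bytes in the packed one. If the application writes one layout and the shader reads the other, every vertex after the first is read from the wrong offset, which is the kind of bug the warning is about.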

Storage Modes for Porting ⚠️

  • Use the most convenient storage modes
    - Easier access to data
  • On iOS
    - Create all textures and buffers with MTLStorageModeShared
  • On macOS
    - Create all textures with MTLStorageModeManaged
    - Make judicious use of MTLStorageModeShared for buffers
    - Separate GPU-only data from CPU-accessible data

MetalKit

Texture and buffer utilities

  • Texture Loading
    - Textures from KTX, PVR, JPG, PNG, TIFF, etc
  • Model Loading
    - Vertex buffers from USD, OBJ, Alembic, etc

Pipelines

Pipeline Differences

Pipeline Building

  • Create at initialization
    - Full compilation is the key advantage of state grouping
    - Choose a canonical vertex layout for meshes
    - Use a limited set of render target formats
  • Lazy creation at draw time ⚠️
    - Store pipeline state objects in a dictionary using descriptor as key
    - Construct descriptor at draw time with current state
    - Retrieve existing pipeline from dictionary OR build new pipeline
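
The lazy-creation fallback can be sketched as a descriptor-keyed cache. This is plain C++ that only models the idea: `PipelineDescriptor`, `PipelineState`, and `PipelineCache` are hypothetical stand-ins, not Metal types, and the descriptor fields are a made-up subset of what a real descriptor would carry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified descriptor: the state that identifies a pipeline.
struct PipelineDescriptor {
    std::string vertexFunction;
    std::string fragmentFunction;
    int colorFormat = 0;           // placeholder for a render target format
    bool blendingEnabled = false;

    bool operator==(const PipelineDescriptor& o) const {
        return vertexFunction == o.vertexFunction &&
               fragmentFunction == o.fragmentFunction &&
               colorFormat == o.colorFormat &&
               blendingEnabled == o.blendingEnabled;
    }
};

// Combines the hashes of all fields so equal descriptors hash equally.
struct PipelineDescriptorHash {
    std::size_t operator()(const PipelineDescriptor& d) const {
        std::size_t h = std::hash<std::string>{}(d.vertexFunction);
        auto mix = [&h](std::size_t v) { h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2); };
        mix(std::hash<std::string>{}(d.fragmentFunction));
        mix(std::hash<int>{}(d.colorFormat));
        mix(std::hash<bool>{}(d.blendingEnabled));
        return h;
    }
};

// Stand-in for a compiled pipeline state object.
struct PipelineState { int id; };

class PipelineCache {
public:
    // Returns the cached pipeline for this descriptor, building it on a miss.
    const PipelineState& pipelineFor(const PipelineDescriptor& d) {
        auto it = cache_.find(d);
        if (it != cache_.end()) {
            return it->second;
        }
        ++buildCount_;  // a real build is the expensive backend compilation
        return cache_.emplace(d, PipelineState{buildCount_}).first->second;
    }

    int buildCount() const { return buildCount_; }

private:
    std::unordered_map<PipelineDescriptor, PipelineState, PipelineDescriptorHash> cache_;
    int buildCount_ = 0;
};
```

The first draw with a new state combination still pays the full build cost, which is why the notes flag this approach with a warning and prefer creation at initialization. Logging the descriptors seen during a play session is one way to find out which pipelines to prebuild.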

Create Render Objects at Initialization

  • Object creation expensive
    - Pipelines require backend compilation
    - Buffers and textures need allocations
  • Once created, much faster usage during rendering

Command Buffers

  • Explicit control over command buffer submission
  • Start with one command buffer per frame
  • Optionally split a frame into multiple command buffers to
    - Submit early and get the GPU started
    - Build commands on multiple threads
  • Completion handler invoked when execution is finished

Resource Updates

  • Resources are explicitly managed in Metal
  • No implicit synchronization like OpenGL
  • Allows for fine-grained synchronization
  • Application has complete control
  • Best model dependent on usage
    - Triple buffering recommended

Problem

Without synchronization

Temporary Solution ⚠️

Synchronous wait after every frame


Triple Buffering

Shared buffer pool 👍
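
The bookkeeping behind a triple-buffered pool can be modeled in a few lines of plain C++. This is a single-threaded sketch with hypothetical names (`FrameBufferPool`, `tryBeginFrame`, `frameCompleted`), not Metal code. It assumes frames complete in submission order. In a real renderer the CPU would typically block on a semaphore instead of getting `false` back, and `frameCompleted` would be called from the command buffer's completion handler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

class FrameBufferPool {
public:
    static constexpr std::size_t kMaxFramesInFlight = 3;

    explicit FrameBufferPool(std::size_t bytesPerBuffer) {
        for (auto& b : buffers_) {
            b.resize(bytesPerBuffer);
        }
    }

    // CPU side: claim the next buffer in the ring for this frame's data.
    // Returns false when all three buffers are still in use by earlier frames.
    bool tryBeginFrame(std::size_t& outIndex) {
        if (inFlight_ == kMaxFramesInFlight) {
            return false;
        }
        outIndex = next_;
        next_ = (next_ + 1) % kMaxFramesInFlight;
        ++inFlight_;
        return true;
    }

    // GPU side (modeled): the oldest in-flight frame has finished executing,
    // so its buffer may be overwritten again.
    void frameCompleted() {
        assert(inFlight_ > 0);
        --inFlight_;
    }

    std::vector<unsigned char>& buffer(std::size_t index) { return buffers_.at(index); }
    std::size_t framesInFlight() const { return inFlight_; }

private:
    std::array<std::vector<unsigned char>, kMaxFramesInFlight> buffers_;
    std::size_t next_ = 0;      // ring position of the next buffer to hand out
    std::size_t inFlight_ = 0;  // frames submitted but not yet completed
};
```

Compared with the synchronous wait after every frame, the CPU here stalls only when it gets three frames ahead, so it can write frame N+1 while the GPU is still reading frame N.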

Render Encoders

Render Pass Descriptor

Render Pass Setup

Render Pass Load and Store Actions

Rendering with OpenGL

Rendering with Metal

Display

Drawables and presentation

  • Drawables — Textures for on screen display
  • Each frame MTKView provides
    - Drawable texture
    - Render pass descriptor setup with the drawable
  • Render to drawables like any other texture
  • Present drawable when done rendering

Incrementally Porting ⚠️

  • Create shared Metal/OpenGL textures using IOSurface or CVPixelBuffer
    - Render to the texture in one API and read it in the other
  • Can enable mixed Metal/OpenGL applications
  • Sample Link

Multithreading

Metal is designed to facilitate multithreading

  • Consider multithreading if application is CPU bound
  • Encode multiple command buffers simultaneously
  • Split single render pass using MTLParallelCommandEncoder

Staying on the GPU

  • Metal natively supports compute
  • Performance benefits
    - Reduces CPU utilization
    - Reduces GPU-CPU synchronization points
    - Frees up data bandwidth to the GPU
  • New algorithms possible
    - Particle systems, physics, object culling

Developer Tools

Debug and optimize your applications

  • Xcode contains an advanced set of GPU tools
  • Enable Metal’s API validation layer
    - On by default when the target is run from Xcode