Metal for OpenGL Developers
WWDC 2018
data:image/s3,"s3://crabby-images/d9b34/d9b34b31e62bdb8a76605e16859171035d0be172" alt=""
- Legacy APIs (OpenGL and OpenGL ES) are deprecated
- Still available in iOS 12, macOS 10.14 and tvOS 12
- Begin transitioning to Metal
Choosing an Approach
- High-level Apple frameworks
- SpriteKit, SceneKit, and Core Image
- 3rd-party engine
- Unity, Unreal, Lumberyard, etc.
- Update to latest version
Challenges with OpenGL
- OpenGL designed more than 25 years ago
- Core architecture reflects the origin of 3D graphics
- Extensions retrofitted some GPU features
- Fundamental design choices based on past principles
- GPU pipeline has changed
- Multithreaded operation not considered
- Asynchronous processing not core
Design Goals for Metal
- Efficient GPU interaction
- Low CPU overhead
- Multithreaded execution
- Predictable operation
- Resource and synchronization control
- Approachable to OpenGL developers
- Built for modern and Apple-designed GPUs
Key Conceptual Differences
- Expensive operations less frequent
- Expensive CPU operations performed less often
- More GPU command generation during object creation
- Less needed when rendering
- Modern GPU pipeline
- Reflects modern GPU architectures
- Closer match yields less costly translation to GPU commands
- State grouped more efficiently
- Multithreaded execution
- Designed for multithreaded execution
- Clear rules for multithreaded usage
- Cross-thread object usability
- Execution model
- True interaction between software and GPU
- Predictable operation allows efficient designs
- Thinner stack between application and GPU
data:image/s3,"s3://crabby-images/82a57/82a57e3aa5e789b06ed3989096b090acf498f80b" alt=""
data:image/s3,"s3://crabby-images/9ce3e/9ce3edca2b7ea9bdc2cc8fa82dce5804f973268e" alt=""
data:image/s3,"s3://crabby-images/8ac67/8ac67f45c5df8c7fb567560f4b3bde4e58430b59" alt=""
Command Encoders
- Render Command Encoder
- Blit Command Encoder
- Compute Command Encoder
Render Command Encoder
- Commands for render pass
- Encodes a series of render commands
- Also called a Render Pass
- Set render objects for the graphics pipeline (buffers, textures, shaders)
- Issue draw commands (draw primitives, draw indexed primitives, instanced draws)
- Render targets
- Associated with a set of render targets (textures for rendering)
- Specify a set of render targets upon creation
- All draw commands directed to these for lifetime of encoder
- New render targets need a new encoder
- Clear delineation between sets of render targets
Render Object
- Textures
- Buffers
- Samplers
- Render pipeline states
- Depth stencil states
Render Object creation
- Create from a device
- Usable only on that device
- Object state set at creation
- Descriptor object specifies properties for render object
- State set at creation fixed for the lifetime of the object
- Image data of textures and values in buffers can change
- Metal compiles objects into GPU state once
- Never needs to check for changes and recompile
- Multithreaded usage more efficient
- Metal does not need to protect state from changes on other threads
Metal Porting
data:image/s3,"s3://crabby-images/b3aad/b3aad1d6ca9ab90817bc47c513d17d84618c7824" alt=""
Metal Shading Language
- Based on C++
- Classes, templates, structs, enums, namespaces
- Built-in types for vectors and matrices
- Built-in functions and operators
- Built-in classes for textures and samplers
data:image/s3,"s3://crabby-images/055af/055af84a7fb24f92f26fa3da2df48f9614560bc3" alt=""
SIMD Type Library
Types for shader development
- Vector and matrix types
- Usable with Metal shading language and application code
data:image/s3,"s3://crabby-images/a4706/a47064e4931e1e65a804c44037864d07544a9eb6" alt=""
Shader Compilation
Build with Xcode
- Xcode compiles shaders into a Metal library (.metallib)
- Front-end compilation to binary intermediate representation
- Avoids parsing time on customer systems
- By default, all shaders built into default.metallib
- Placed in app bundle for run time retrieval
Runtime Shader Compilation ⚠️
- Can also build shaders from source at runtime
- Significant disadvantages
- Full shader compilation occurs at runtime
- Compilation errors less obvious
- No header sharing between application and runtime-built shaders
- Build-time compilation recommended
Devices
- A device represents one GPU
- Create render objects
- Textures, buffers, pipelines
- On macOS, multiple devices may be available
- Default device suitable for most applications
data:image/s3,"s3://crabby-images/122b1/122b13d253e0c50e7c6774201af7a35149c8fee2" alt=""
data:image/s3,"s3://crabby-images/425b8/425b8d264afc48b2a2bdedda2022e7b32d58ec45" alt=""
Command Queues
- Queue created from a device
- Queue executes command buffers in order
- Create queue at initialization
- Typically one queue is sufficient
Texture
data:image/s3,"s3://crabby-images/5671a/5671a8442f57d75620de66fc3069155dac186e2a" alt=""
Storage Modes
data:image/s3,"s3://crabby-images/37440/37440cd961facdf7dbd35f213ebf96d5c51ad66e" alt=""
data:image/s3,"s3://crabby-images/89cae/89cae27c523fe9fd715c6997d19cd3f618c6cd5f" alt=""
data:image/s3,"s3://crabby-images/9d3c4/9d3c46f84ff167e682da7caf656dad2e8ca9b686" alt=""
data:image/s3,"s3://crabby-images/58197/58197751c1665aa65f3782fb98d764468b9e9fa0" alt=""
data:image/s3,"s3://crabby-images/268a2/268a2d77fee3ca2e780ca493f71e1cf0152abc56" alt=""
Texture Differences ⚠️
- Sampler state never part of texture
- Wrap modes, filtering, min/max LOD
- Texture image data not flipped
- OpenGL uses bottom-left origin, Metal uses top-left origin
- Metal does not perform format conversion
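Because the origin differs, image data laid out for OpenGL appears upside down when uploaded unchanged. One possible fix is to reverse the row order at load time; another is to flip the vertical texture coordinate in the shader. The sketch below shows the first option in plain C++. The function name `flipRows` and the assumption of tightly packed rows (no padding) are illustrative, not from the session.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse the row order of a tightly packed image so that data authored for a
// bottom-left origin matches a top-left origin.
// Assumption: bytesPerRow == width * bytesPerPixel, with no row padding.
std::vector<std::uint8_t> flipRows(const std::vector<std::uint8_t>& pixels,
                                   std::size_t bytesPerRow,
                                   std::size_t rowCount) {
    assert(pixels.size() == bytesPerRow * rowCount);
    std::vector<std::uint8_t> flipped(pixels.size());
    for (std::size_t row = 0; row < rowCount; ++row) {
        const std::size_t srcOffset = row * bytesPerRow;
        const std::size_t dstOffset = (rowCount - 1 - row) * bytesPerRow;
        // Copy one whole row to its mirrored position.
        std::copy(pixels.begin() + srcOffset,
                  pixels.begin() + srcOffset + bytesPerRow,
                  flipped.begin() + dstOffset);
    }
    return flipped;
}
```

Flipping once at load keeps shaders and texture coordinates unchanged, at the cost of one extra copy per image.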
Buffers
- Metal uses buffers for vertices, indices, and all uniform data
- OpenGL’s vertex, element, and uniform buffers are similar
- Easier to port apps that have adopted these
data:image/s3,"s3://crabby-images/89172/891720e2029b158d4770c501c63a62b36435420e" alt=""
data:image/s3,"s3://crabby-images/27861/278616eb83100f32b476a995ceab025894482137" alt=""
Notes About Buffer Data ⚠️
!!! Pay attention to alignment !!!
data:image/s3,"s3://crabby-images/4fd15/4fd1546e5ae3b1902edd7d796a476382cb008f78" alt=""
- SIMD library's vector and matrix types follow the same rules as Metal shaders
- Special packed vector types available to shaders
- packed_float3 consumes 12 bytes
- packed_half3 consumes 6 bytes
- Cannot directly operate on packed types
- Cast to non-packed type required
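The size difference matters when a C struct on the CPU side must match what the shader reads. In the Metal Shading Language a `float3` occupies 16 bytes with 16-byte alignment, while `packed_float3` occupies 12. The sketch below models both in portable C++ so the offsets can be checked at compile time. `Float3`, `PackedFloat3`, `VertexAligned`, `VertexPacked`, and `unpack` are hypothetical stand-ins; real application code would use the SIMD library's own types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the shader-side types (not the real SIMD types).
struct alignas(16) Float3 { float x, y, z; };  // models float3: 16 bytes, 16-byte aligned
struct PackedFloat3 { float x, y, z; };        // models packed_float3: 12 bytes

struct VertexAligned {
    Float3 position;  // offset 0
    Float3 normal;    // offset 16
};

struct VertexPacked {
    PackedFloat3 position;  // offset 0
    PackedFloat3 normal;    // offset 12
};

static_assert(sizeof(Float3) == 16, "float3 model pads to 16 bytes");
static_assert(sizeof(PackedFloat3) == 12, "packed model has no padding");
static_assert(offsetof(VertexAligned, normal) == 16, "aligned field starts at 16");
static_assert(offsetof(VertexPacked, normal) == 12, "packed field starts at 12");
static_assert(sizeof(VertexAligned) == 32, "two aligned vectors");
static_assert(sizeof(VertexPacked) == 24, "two packed vectors");

// Models the cast to a non-packed type that is needed before doing arithmetic.
inline Float3 unpack(const PackedFloat3& p) { return Float3{p.x, p.y, p.z}; }
```

A vertex buffer written with the 24-byte layout but read with the 32-byte layout would be misread from the second field onward, which is the kind of mismatch the alignment warning is about.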
Storage Modes for Porting ⚠️
- Use most convenient storage modes
- Easier access to data
- On iOS
- Create all textures and buffers with MTLStorageModeShared
- On macOS
- Create all textures with MTLStorageModeManaged
- Make judicious use of MTLStorageModeShared for buffers
- Separate GPU-only data from CPU-accessible data
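The porting-time rule above is small enough to write down as a lookup. The sketch below is a plain C++ model of that rule only; the enums `Platform`, `ResourceKind`, and `StorageMode` and the function `portingStorageMode` are hypothetical names, not Metal API, and a shipping app would likely revisit these defaults once the port works.

```cpp
#include <cassert>

// Hypothetical mirrors of the two storage modes named in the notes.
enum class StorageMode { Shared, Managed };
enum class Platform { iOS, macOS };
enum class ResourceKind { Texture, Buffer };

// Porting-time default: Shared everywhere, except macOS textures use Managed.
constexpr StorageMode portingStorageMode(Platform platform, ResourceKind kind) {
    return (platform == Platform::macOS && kind == ResourceKind::Texture)
               ? StorageMode::Managed
               : StorageMode::Shared;
}
```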
MetalKit
Texture and buffer utilities
- Texture Loading
- Textures from KTX, PVR, JPG, PNG, TIFF, etc.
- Model Loading
- Vertex buffers from USD, OBJ, Alembic, etc
Pipelines
data:image/s3,"s3://crabby-images/4f13e/4f13ef0cf6412c39f2b0f43ab35fc975460dbdcb" alt=""
data:image/s3,"s3://crabby-images/b5bd8/b5bd88f9517b53c992a231deca3bbe310e83428a" alt=""
Pipeline Differences
data:image/s3,"s3://crabby-images/7cf96/7cf96d12cd951d224744cc114fb27f35ca03522a" alt=""
Pipeline Building
- Create at initialization
- Full compilation is a key advantage of state grouping
- Choose a canonical vertex layout for meshes
- Use a limited set of render target formats
- Lazy creation at draw time ⚠️
- Store pipeline state objects in a dictionary using descriptor as key
- Construct descriptor at draw time with current state
- Retrieve existing pipeline from dictionary OR build new pipeline
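The lazy-creation fallback can be sketched as a cache keyed by the descriptor. The code below is a simplified C++ model, not Metal API: `PipelineDescriptor` holds only a few illustrative fields, `PipelineState` stands in for a compiled pipeline, and `PipelineCache` is a hypothetical helper. The build counter exists only to make the cache behaviour observable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical, simplified descriptor: only the fields used as the cache key.
struct PipelineDescriptor {
    std::string vertexFunction;
    std::string fragmentFunction;
    int colorPixelFormat;  // stand-in for a pixel format enum
    bool blendingEnabled;

    // Ordering so the descriptor can serve as a std::map key.
    bool operator<(const PipelineDescriptor& other) const {
        return std::tie(vertexFunction, fragmentFunction, colorPixelFormat, blendingEnabled) <
               std::tie(other.vertexFunction, other.fragmentFunction,
                        other.colorPixelFormat, other.blendingEnabled);
    }
};

// Stand-in for a compiled pipeline state object; immutable once built.
struct PipelineState {
    PipelineDescriptor descriptor;
};

class PipelineCache {
public:
    // Returns the cached pipeline for `desc`, building it on first use.
    std::shared_ptr<const PipelineState> pipelineFor(const PipelineDescriptor& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;
        // Cache miss: the expensive backend compilation would happen here.
        ++buildCount_;
        auto state = std::make_shared<const PipelineState>(PipelineState{desc});
        cache_.emplace(desc, state);
        return state;
    }

    int buildCount() const { return buildCount_; }

private:
    std::map<PipelineDescriptor, std::shared_ptr<const PipelineState>> cache_;
    int buildCount_ = 0;
};
```

The first draw with a new state combination still pays the compilation cost, which is why the notes flag this approach with a warning and prefer creation at initialization.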
Create Render Objects at Initialization
- Object creation expensive
- Pipelines require backend compilation
- Buffers and textures need allocations
- Once created, much faster usage during rendering
Command Buffers
- Explicit control over command buffer submission
- Start with one command buffer per frame
- Optionally split a frame into multiple command buffers to:
- Submit early and get the GPU started
- Build commands on multiple threads
- Completion handler invoked when execution is finished
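To make the submission model concrete, here is a small single-threaded C++ model of explicit commit, in-order execution, and completion handlers. It is a sketch under stated assumptions: `CommandBuffer`, `CommandQueue`, `commit`, and `drain` are hypothetical names, and `drain` merely plays the role of the GPU so the ordering can be observed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a command buffer is a list of encoded commands plus
// handlers to run once execution has finished.
struct CommandBuffer {
    std::vector<std::string> commands;
    std::vector<std::function<void()>> completionHandlers;
};

class CommandQueue {
public:
    // Explicit submission: nothing executes until the application commits.
    void commit(CommandBuffer buffer) { pending_.push(std::move(buffer)); }

    // Stand-in for the GPU: runs committed buffers in submission order, then
    // invokes each buffer's completion handlers.
    void drain(std::vector<std::string>& executed) {
        while (!pending_.empty()) {
            CommandBuffer buffer = std::move(pending_.front());
            pending_.pop();
            for (const std::string& command : buffer.commands) executed.push_back(command);
            for (const std::function<void()>& handler : buffer.completionHandlers) handler();
        }
    }

private:
    std::queue<CommandBuffer> pending_;
};
```

Splitting a frame into two buffers, as in a shadow pass committed before the main pass, lets the first one start executing while the second is still being built.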
data:image/s3,"s3://crabby-images/7a1b7/7a1b70a0f1b68ba9b824b38e3b71cc005ac380d8" alt=""
data:image/s3,"s3://crabby-images/3fb6a/3fb6ae9479673d422a17f210a7fa832c6bb76fb3" alt=""
data:image/s3,"s3://crabby-images/45a2f/45a2f4f1da6fc6f2889ef2049aeaceb50b9eb563" alt=""
Resource Updates
- Resources are explicitly managed in Metal
- No implicit synchronization like OpenGL
- Allows for fine-grained synchronization
- Application has complete control
- Best model dependent on usage
- Triple buffering recommended
data:image/s3,"s3://crabby-images/86223/8622328aaa155dbfeb46d624b5d4ea3356df0a6b" alt=""
Problem
data:image/s3,"s3://crabby-images/98fa0/98fa0abb863d754652df96d9eb0b955a1dd1c29b" alt=""
Temporary Solution ⚠️
Synchronous wait after every frame
data:image/s3,"s3://crabby-images/175b0/175b0819bca0e0bbe22129bf4785c7621410aa5f" alt=""
Triple Buffering
Shared buffer pool 👍
data:image/s3,"s3://crabby-images/657c3/657c3e7c5136d5fc6ca002bfb7a8fa12a9ff1f7f" alt=""
data:image/s3,"s3://crabby-images/6d294/6d2946c87e51aa0a370ba927b036506e44ab25fa" alt=""
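The shared buffer pool can be modelled as three slots handed out round-robin, with a count of frames still in flight. The sketch below is a single-threaded C++ model, so instead of blocking it returns -1 when all three slots are busy; a real renderer would more likely wait on a semaphore that the command buffer's completion handler signals. `FrameBufferPool` and its member names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-threaded model of a three-slot pool for per-frame dynamic data.
class FrameBufferPool {
public:
    static constexpr std::size_t kMaxFramesInFlight = 3;

    explicit FrameBufferPool(std::size_t bytesPerBuffer) {
        for (auto& buffer : buffers_) buffer.resize(bytesPerBuffer);
    }

    // Returns the slot the CPU may write for the next frame, or -1 when all
    // slots still belong to in-flight frames (real code would block here).
    int beginFrame() {
        if (framesInFlight_ == kMaxFramesInFlight) return -1;
        const int slot = static_cast<int>(nextSlot_);
        nextSlot_ = (nextSlot_ + 1) % kMaxFramesInFlight;
        ++framesInFlight_;
        return slot;
    }

    // Models the completion handler: the oldest in-flight frame has finished,
    // so its slot may be reused.
    void frameCompleted() {
        assert(framesInFlight_ > 0);
        --framesInFlight_;
    }

    std::vector<unsigned char>& buffer(int slot) {
        return buffers_.at(static_cast<std::size_t>(slot));
    }

private:
    std::array<std::vector<unsigned char>, kMaxFramesInFlight> buffers_;
    std::size_t nextSlot_ = 0;
    std::size_t framesInFlight_ = 0;
};
```

Because frames complete in submission order, the slot freed by `frameCompleted` is always the one `beginFrame` hands out next, so the CPU never writes a buffer the GPU is still reading.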
Render Encoders
data:image/s3,"s3://crabby-images/b8280/b8280eb46c5b37e4a819d8fa8291660f14c3aa46" alt=""
Render Pass Descriptor
data:image/s3,"s3://crabby-images/a412b/a412b93adceb759ac4a1ddb28ed35886d3c6b2fc" alt=""
Render Pass Setup
data:image/s3,"s3://crabby-images/fd847/fd8479870e440d1207fd5e57cf34fc8c1c002f29" alt=""
Render Pass Load and Store Actions
data:image/s3,"s3://crabby-images/dbc9a/dbc9afad0c3f8cbf3b815e865cce3a9e8832d929" alt=""
data:image/s3,"s3://crabby-images/def9a/def9adfc9d1eac0e4c37202524c9a4eca46bc8b4" alt=""
data:image/s3,"s3://crabby-images/4d7e4/4d7e41981074c57d11d75964ac60c744b675de17" alt=""
Rendering with OpenGL
data:image/s3,"s3://crabby-images/1a7ef/1a7efd418241e0d966e59330702b2a9bec791244" alt=""
Rendering with Metal
data:image/s3,"s3://crabby-images/3c673/3c673ed64264ff3995d10e4ea7981f7bb7992d03" alt=""
Display
Drawables and presentation
- Drawables — Textures for on screen display
- Each frame MTKView provides
- Drawable texture
- Render pass descriptor set up with the drawable
- Render to drawables like any other texture
- Present drawable when done rendering
data:image/s3,"s3://crabby-images/f9264/f92648e890371b10a84f1dd9b60327858016a736" alt=""
data:image/s3,"s3://crabby-images/de6de/de6deb1cd49bfe070a2fab073395cf6cc9697c66" alt=""
Incrementally Porting ⚠️
- Create shared Metal/OpenGL textures using IOSurface or CVPixelBuffer
- Render to texture in one API and read in the other
- Can enable mixed Metal/OpenGL applications
- Sample Link
Multithreading
Metal is designed to facilitate multithreading
- Consider multithreading if application is CPU bound
- Encode multiple command buffers simultaneously
- Split single render pass using MTLParallelCommandEncoder
Staying on the GPU
- Metal natively supports compute
- Performance benefits
- Reduces CPU utilization
- Reduces GPU-CPU synchronization points
- Frees data bandwidth to the GPU
- New algorithms possible
- Particle systems, physics, object culling
Developer Tools
Debug and optimize your applications
- Xcode contains an advanced set of GPU tools
- Enable Metal’s API validation layer
- On by default when target is run from Xcode
data:image/s3,"s3://crabby-images/b25ae/b25aef5de75f24a2ccaca757e06b67e590b65447" alt=""
data:image/s3,"s3://crabby-images/a284c/a284c94e60cbf12fa8aea0bbe84041b6d7916ecd" alt=""
data:image/s3,"s3://crabby-images/e5f44/e5f441da1cffd5d928cd4135c69249e24d3c0291" alt=""
data:image/s3,"s3://crabby-images/d0980/d0980c012ca7aa72cf17b8d9335837d064321575" alt=""
data:image/s3,"s3://crabby-images/0505e/0505e1e266a77ca2c9bfafebba139c6e425d8d31" alt=""