Metal for Accelerating Machine Learning

WWDC 2018

Posted by Den on August 28, 2018 · 12 mins read

Metal Performance Shaders

GPU-accelerated primitives, optimized for iOS and macOS

  • Image processing
  • Linear algebra
  • Machine learning
    - inference
    - training (new)
  • Ray tracing (new)


Image classification example

CNN Inference Enhancements

FP16 accumulation

  • Available with Apple A11 Bionic GPU for
    - Convolution
    - Convolution transpose
  • Sufficient precision for commonly used neural networks
  • Delivers better performance than FP32 accumulation
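
To make "FP16 accumulation" concrete, here is a minimal Python sketch of a dot product whose running sum is rounded to half precision after every step. It is only an illustration of the numeric idea, not MPS code: the helper names are hypothetical, and Python's 64-bit float stands in for the wider accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product whose running sum is rounded to FP16 after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_wide_accumulate(a, b):
    """Same dot product with the running sum kept in a wider float."""
    return sum(x * y for x, y in zip(a, b))


# Small values: both accumulators agree.
small_fp16 = dot_fp16_accumulate([0.5] * 8, [0.25] * 8)
small_wide = dot_wide_accumulate([0.5] * 8, [0.25] * 8)

# Large running sums: FP16 cannot represent 2049, so it rounds back to 2048.
rounded = to_fp16(2049.0)
```

For small, well-scaled values the two accumulators give the same result. The rounding only shows up once the running sum grows large relative to the addends, which is the trade-off the "sufficient precision" bullet refers to.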

CNN Training


Iterative process

  • Forward pass → Loss computation → Gradient pass → Weight update
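
The four steps can be sketched in plain Python for the smallest possible model, a single weight `w` fitted so that `y = w * x`. This is an illustration of the loop only; the function name, learning rate, and data are assumptions, and none of it is MPS API.

```python
def train(xs, ys, lr=0.05, steps=50):
    """Fit y = w * x by repeating the four steps of one training iteration."""
    w = 0.0
    losses = []
    n = len(xs)
    for _ in range(steps):
        # 1. Forward pass: compute predictions with the current weight.
        preds = [w * x for x in xs]
        # 2. Loss computation: mean squared error against the labels.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # 3. Gradient pass: derivative of the loss with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        # 4. Weight update: one gradient-descent step.
        w -= lr * grad
        losses.append(loss)
    return w, losses


w, losses = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With this data the weight converges toward 2 and the recorded loss shrinks from iteration to iteration.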

Training a Neural Network with MPS

  • Create training graph
  • Prepare inputs
  • Specify weights
  • Execute graph (graph updates weights)
  • Complete training process

Create Training Graph

  • Describe neural network using graph API
  • Image nodes: Data
  • Filter nodes: Operations

Create an Inference Graph

How do we connect these nodes into a graph?
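
One way to picture the connection: each filter node takes an image node as its source and exposes a result image, which becomes the source of the next filter. The Python sketch below models that idea with hypothetical `ImageNode` and `FilterNode` classes. They are stand-ins for illustration, not the MPS graph classes.

```python
class ImageNode:
    """Data flowing through the graph (hypothetical stand-in)."""

    def __init__(self, producer=None):
        self.producer = producer  # filter node that writes this image, if any


class FilterNode:
    """An operation that reads one image node and produces a result image."""

    def __init__(self, name, source, fn):
        self.name = name
        self.source = source
        self.fn = fn
        self.result_image = ImageNode(producer=self)


def run(image_node, input_value):
    """Evaluate the graph by walking back from the final image to the input."""
    if image_node.producer is None:
        return input_value
    node = image_node.producer
    return node.fn(run(node.source, input_value))


# Chain two filters: the result image of one is the source of the next.
inp = ImageNode()
scale = FilterNode("scale", inp, lambda v: [2 * x for x in v])
relu = FilterNode("relu", scale.result_image, lambda v: [max(0, x) for x in v])
out = run(relu.result_image, [-1, 0, 3])
```

Because every image node remembers which filter produced it, the final image node alone is enough to describe the whole graph.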

Prepare Inputs

  • Inputs to the graph
    - Batch of source images
    - Batch of source states


  • Batches are arrays of images or states


  • MPSState passes state of forward node to gradient node
  • Graph manages all states
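
The role of a state object can be illustrated with a ReLU layer: the forward node records which inputs were positive, and the gradient node needs exactly that record to route gradients. The sketch below is plain Python with hypothetical class names. In MPS the graph creates and hands over these states itself.

```python
class ReluForward:
    """Forward node: returns outputs plus the state the gradient node needs."""

    def __call__(self, batch):
        outputs = [[max(0.0, v) for v in image] for image in batch]
        # State: which inputs were positive, one mask per image in the batch.
        states = [[v > 0.0 for v in image] for image in batch]
        return outputs, states


class ReluGradient:
    """Gradient node: passes gradients through only where the input was positive."""

    def __call__(self, grad_batch, states):
        return [
            [g if keep else 0.0 for g, keep in zip(grads, mask)]
            for grads, mask in zip(grad_batch, states)
        ]


# A batch is just a list of images; states form a batch of the same length.
batch = [[-1.0, 2.0], [3.0, -4.0]]
outputs, states = ReluForward()(batch)
grads = ReluGradient()([[1.0, 1.0], [1.0, 1.0]], states)
```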

Loss Labels

Data Source Providers

  • Provide weights for
    - Convolution
    - Fully connected
    - Batch normalization
    - Instance normalization
  • Just-in-time loading and purging of weights data
  • Minimize memory footprint
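
The just-in-time idea can be sketched as a provider that holds weights in memory only between a load call and a purge call. The class below is a hypothetical Python stand-in. The `reader` callable and the `is_resident` property are assumptions added for illustration.

```python
class WeightsProvider:
    """Hypothetical provider: weights are resident only between load and purge."""

    def __init__(self, reader):
        self._reader = reader  # callable that fetches the weights, e.g. from disk
        self._weights = None

    def load(self):
        """Bring the weights into memory right before they are needed."""
        self._weights = self._reader()
        return True

    def weights(self):
        if self._weights is None:
            raise RuntimeError("load() must be called before weights()")
        return self._weights

    def purge(self):
        """Release the weights as soon as the consumer is done with them."""
        self._weights = None

    @property
    def is_resident(self):
        return self._weights is not None


provider = WeightsProvider(lambda: [0.1, 0.2, 0.3])
resident_before = provider.is_resident
provider.load()
loaded = provider.weights()
provider.purge()
resident_after = provider.is_resident
```

Keeping only the reader around, rather than the weights themselves, is what keeps the memory footprint small when a network has many layers.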

Execute graph

Updating Weights

  • Implement optional update method on Data Source Provider
  • Graph calls update method automatically
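
A rough Python sketch of that contract: the provider may define an `update` method, and whatever executes the graph calls it when it is present. The names `TrainableProvider` and `execute_graph`, and the plain gradient-descent step, are assumptions for illustration only.

```python
class TrainableProvider:
    """Hypothetical provider that implements the optional update hook."""

    def __init__(self, weights, lr=0.5):
        self.weights = list(weights)
        self.lr = lr
        self.update_calls = 0

    def update(self, gradients):
        """Apply one plain gradient-descent step and return the new weights."""
        self.weights = [w - self.lr * g for w, g in zip(self.weights, gradients)]
        self.update_calls += 1
        return self.weights


class FrozenProvider:
    """Provider without an update method: its weights are never changed."""

    def __init__(self, weights):
        self.weights = list(weights)


def execute_graph(provider, gradients):
    """Stand-in for graph execution: call update only if the provider has it."""
    update = getattr(provider, "update", None)
    if callable(update):
        return update(gradients)
    return provider.weights


trainable = TrainableProvider([1.0, 2.0])
new_weights = execute_graph(trainable, [2.0, -2.0])
frozen_weights = execute_graph(FrozenProvider([1.0, 2.0]), [2.0, -2.0])
```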


Optimizers

  • Describe how to take an update step on training parameters
  • Used in the update method of the Data Source Provider
  • Variants
    - MPSNNOptimizerStochasticGradientDescent
    - MPSNNOptimizerRMSProp
    - MPSNNOptimizerAdam
    - Custom
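
For reference, the textbook Adam update step can be written in a few lines of Python. This is the standard published algorithm with its usual default hyperparameters. Whether MPSNNOptimizerAdam matches it in every detail (for example, where epsilon is applied) is not something these notes confirm, so treat it as a sketch of the idea.

```python
import math


class Adam:
    """Textbook Adam update for a flat list of parameters."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients
        self.v = None  # running mean of squared gradients
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction compensates for the zero initialisation.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


optimizer = Adam(lr=0.1)
updated = optimizer.step([1.0], [4.0])
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.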

Complete training process



  • 1 to 1
  • 1 to Many
  • Many to Many

Recurrent Neural Networks

Variants for inference and training (new)

  • Single Gate
  • Long Short-Term Memory (LSTM)
  • Gated Recurrent Unit (GRU)
  • Minimally Gated Unit (MGU)
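
As a reminder of what one of these variants computes, here is a scalar LSTM time step in Python using the standard gate equations. The parameter layout (a dict mapping gate name to input weight, recurrent weight, and bias) is an assumption made for readability and says nothing about how MPS stores its weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM time step; p maps gate name -> (w_x, w_h, bias)."""

    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new information to write
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate cell value
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(xs, p):
    """Many-to-many use: feed a sequence and collect one output per step."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("input", "forget", "output", "cell")}
h, c = lstm_step(1.0, 0.0, 1.0, zero_params)
sequence_out = run_sequence([1.0, 2.0, 3.0], zero_params)
```

With all parameters at zero every gate sits at 0.5, so the step simply halves the previous cell state. That makes it a convenient sanity check.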

Activity Classifier



Data Converters

Image → Matrix
Matrix → Image
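
Conceptually, the two converters flatten an image into a matrix row and rebuild the image from that row. The Python sketch below assumes a row-major height × width × channels layout and uses hypothetical helper names. The layout the MPS converters actually use is not stated in these notes.

```python
def image_to_matrix_row(image):
    """Flatten an H x W x C nested-list image into one row (row-major)."""
    return [value for row in image for pixel in row for value in pixel]


def matrix_row_to_image(row, height, width, channels):
    """Rebuild an H x W x C image from a flat row produced by the function above."""
    if len(row) != height * width * channels:
        raise ValueError("row length does not match the requested image shape")
    values = iter(row)
    return [
        [[next(values) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]


# A 2 x 2 image with 3 channels per pixel.
image = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
row = image_to_matrix_row(image)
restored = matrix_row_to_image(row, height=2, width=2, channels=3)

# A batch of images becomes a matrix with one row per image.
matrix = [image_to_matrix_row(img) for img in (image, image)]
```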


Object classification training using TensorFlow with MPS