Object Tracking in Vision

WWDC 2018

Posted by Den on August 20, 2018 · 10 mins read

Vision in a Nutshell

  • One stop for solving computer vision problems
  • Simple, consistent interface
  • Runs on iOS, macOS, and tvOS
  • Privacy-oriented
  • Continuously evolving

Vision Basics

Requests

Request Handlers

Observations

  • Family of classes derived from VNObservation
  • How to obtain a VNObservation?
    - Returned in VNRequest results property
    - Can be manually created
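
Both ways of obtaining an observation can be sketched in a few lines of Swift. This is a minimal illustration, not code from the session: the rectangle is an arbitrary selection, and `faceObservations(from:)` is a hypothetical helper name.

```swift
import Vision
import CoreGraphics

// 1. Manually create an observation from a rectangle in normalized
//    image coordinates (values in 0...1). The rectangle is arbitrary.
let selection = CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5)
let manualObservation = VNDetectedObjectObservation(boundingBox: selection)

// 2. Read observations back from a request that has been performed.
//    `results` stays nil until the request has run, so fall back to [].
func faceObservations(from request: VNRequest) -> [VNFaceObservation] {
    return (request.results ?? []).compactMap { $0 as? VNFaceObservation }
}
```

A manually created observation like this is what typically seeds a tracker when the user selects the object by hand.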

New Face Detector

  • Finds more faces
  • Now orientation-agnostic
  • VNDetectFaceRectanglesRequest (same API)
  • VNDetectFaceRectanglesRequestRevision2
  • VNFaceObservation has 2 new properties: roll and yaw
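
As a rough sketch of how the new detector might be used, the function below selects revision 2 and reads the `roll` and `yaw` properties of each face. `printFacePoses(in:)` is a hypothetical name, and the example assumes the caller already has a `CGImage`.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: detect faces and print their orientation.
func printFacePoses(in image: CGImage) throws {
    let request = VNDetectFaceRectanglesRequest()
    // Opt in to the newer face detector explicitly.
    request.revision = VNDetectFaceRectanglesRequestRevision2

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let faces = (request.results ?? []).compactMap { $0 as? VNFaceObservation }
    for face in faces {
        // roll and yaw are optional NSNumber values; nil when not available.
        let roll = face.roll?.doubleValue
        let yaw = face.yaw?.doubleValue
        print("box: \(face.boundingBox), roll: \(String(describing: roll)), yaw: \(String(describing: yaw))")
    }
}
```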

Request Revisioning

  • Vision requests now support revisioning
  • Future-proof your app: an error is returned for unavailable functionality
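
One way to work with revisions, sketched under the assumption that the app wants revision 2 of the face detector when the OS offers it. The variable names are illustrative.

```swift
import Vision

let request = VNDetectFaceRectanglesRequest()
let pinned = VNDetectFaceRectanglesRequestRevision2

// Ask the request class which revisions this OS supports before pinning one.
if VNDetectFaceRectanglesRequest.supportedRevisions.contains(pinned) {
    request.revision = pinned
} else {
    print("Revision \(pinned) is not supported here; keeping the default")
}

print("default:", VNDetectFaceRectanglesRequest.defaultRevision,
      "in use:", request.revision)
```

Pinning a revision keeps results stable across OS updates, at the cost of not picking up newer algorithms automatically.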

Image Request Handler

  • Used to process one or more requests on the same image
  • Optimizes performance by caching image derivatives and request results
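
A minimal sketch of running several requests against one image through a single handler. The pairing of face and barcode detection is only an example, and `analyze(_:)` is a hypothetical name.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: run two unrelated requests on the same image.
func analyze(_ image: CGImage) throws -> (faces: Int, barcodes: Int) {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let barcodeRequest = VNDetectBarcodesRequest()

    // One handler per image; it can reuse cached image derivatives
    // across the requests it performs.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([faceRequest, barcodeRequest])

    return (faceRequest.results?.count ?? 0,
            barcodeRequest.results?.count ?? 0)
}
```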

Sequence Request Handler

  • Processes requests on a sequence of images
  • Used to process 2 types of requests — Tracking and Image Registration

VNRequest Initialization

Mandatory — Must be provided via initializer, overriding is OK
Optional — Initialized to default value, overriding is OK

Understanding Results

  • Collection of VNObservation objects in VNRequest results property
  • The number of observations ranges from 0 to n
  • VNObservation is immutable
  • Important common observation properties:
    - uuid: used to match related results
    - confidence: indicates the quality of returned results
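
The common properties can be used like this. The sketch filters results by confidence; the 0.5 threshold and the helper name are assumptions chosen for illustration, not values recommended by the session.

```swift
import Vision

// Hypothetical helper: keep only observations above a confidence threshold.
func confidentObservations(from request: VNRequest,
                           minimumConfidence: VNConfidence = 0.5) -> [VNObservation] {
    // `results` may be nil (request not performed) or empty (nothing found).
    let all = (request.results ?? []).compactMap { $0 as? VNObservation }
    return all.filter { $0.confidence >= minimumConfidence }
}

// Example of reading the two properties from each surviving observation.
func describe(_ observations: [VNObservation]) {
    for observation in observations {
        print(observation.uuid, observation.confidence)
    }
}
```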

Request Pipelines

Pipeline: requests are executed in order so that each request's dependencies are fulfilled
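
An explicit pipeline might look like the following sketch, where the output of a face rectangle request becomes the input of a face landmarks request. `detectLandmarks(in:)` is a hypothetical name, and the choice of these two requests is an assumption made for the example.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: face rectangles first, then landmarks for those faces.
func detectLandmarks(in image: CGImage) throws -> [VNFaceObservation] {
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // Step 1: find face bounding boxes.
    let rectanglesRequest = VNDetectFaceRectanglesRequest()
    try handler.perform([rectanglesRequest])
    let faces = (rectanglesRequest.results ?? []).compactMap { $0 as? VNFaceObservation }
    guard !faces.isEmpty else { return [] }

    // Step 2: feed those observations into the dependent request.
    let landmarksRequest = VNDetectFaceLandmarksRequest()
    landmarksRequest.inputFaceObservations = faces
    try handler.perform([landmarksRequest])
    return (landmarksRequest.results ?? []).compactMap { $0 as? VNFaceObservation }
}
```

Reusing the same handler for both steps lets the second request benefit from anything the handler cached during the first.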

Lifecycle Management

How long to keep objects in memory?

  • Image Request Handler (While the image needs processing)
  • Sequence Request Handler (While the sequence needs processing)
  • Requests/Observations (Lightweight objects, create/release as needed)

Where to Process Your Requests?

  • Many requests in Vision rely on Neural Networks
  • Neural Networks usually run faster on GPUs
  • Vision can run requests on both CPU and GPU
    - Default: Use GPU, switch to CPU if GPU is busy
    - Explicit: Set VNRequest usesCPUOnly to true
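
Forcing CPU execution is a one-line change on the request. The sketch below uses a face detection request purely as an example.

```swift
import Vision

let request = VNDetectFaceRectanglesRequest()
// Leave the GPU free for other work, such as rendering, by staying on the CPU.
request.usesCPUOnly = true
```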

Tracking in General

  • Object of interest: Auto-detected or manually selected
  • Sequence of frames: Camera feed
  • Tracking: Look for the object of interest
  • Applications: Focus tracking with camera

Why Tracking and Not Detection?

  • No specific detectors for all objects
  • Need to match detected objects
  • Trackers use temporal information
  • Speed — Trackers are faster
  • Trackers are smoother, not as jittery

Tracking Types in Vision

Demo

Tracking in Vision

  • Initial object of interest selection
    - Automatic: By running an appropriate detector
    - Manual: User input
  • One tracking request per tracked object (1:1)
  • 2 Types: VNTrackObjectRequest, VNTrackRectangleRequest
  • Tracking algorithm: trackingLevel = .fast / .accurate
  • Tracking quality: Use observation confidence property
  • How many objects can we track simultaneously?
    - Limit: 16 trackers of each type at a time
    - Error is returned if over limit
  • How to release a tracker?
    - Set the request’s isLastFrame property to true
    - Release VNSequenceRequestHandler

Tracking Challenges

  • Fast or accurate?
  • Initial bounding box location: choose a salient object
  • Which confidence level threshold to use?
  • Consider rerunning detectors every N frames
  • Objects change their shape, size, appearance, color, …
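
The last two questions, which confidence threshold to use and how often to rerun a detector, can be combined into a small policy object. The sketch below is plain Swift with no Vision dependency; `RedetectionPolicy` and its default values (0.3 and 30 frames) are assumptions for illustration and would need tuning for a real use case.

```swift
// Hypothetical policy: rerun the detector when the tracker looks unreliable
// or when too many frames have passed since the last detection.
struct RedetectionPolicy {
    let confidenceThreshold: Float
    let redetectInterval: Int
    private(set) var framesSinceDetection = 0

    init(confidenceThreshold: Float = 0.3, redetectInterval: Int = 30) {
        self.confidenceThreshold = confidenceThreshold
        self.redetectInterval = redetectInterval
    }

    // Call once per tracked frame with the tracker's reported confidence.
    // Returns true when a fresh detection should replace the tracker's box.
    mutating func shouldRedetect(trackerConfidence: Float) -> Bool {
        framesSinceDetection += 1
        let isStale = framesSinceDetection >= redetectInterval
        let isWeak = trackerConfidence < confidenceThreshold
        if isStale || isWeak {
            framesSinceDetection = 0
            return true
        }
        return false
    }
}
```

With an interval of 3, for example, two confident frames return false and the third returns true; a frame with confidence below the threshold returns true immediately.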