Object Tracking in Vision
WWDC 2018
Vision in a Nutshell
- One-stop framework for solving computer vision problems
- Simple, consistent interface
- Runs on iOS, macOS, and tvOS
- Privacy-oriented
- Continuously evolving
Vision Basics
Requests
Request Handlers
Observations
- Family of classes derived from VNObservation
- How to obtain a VNObservation?
  - Returned in the VNRequest results property
  - Can be created manually
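Vision bounding boxes are normalized to [0, 1] with the origin at the lower-left corner of the image. Below is a minimal sketch of creating an observation manually from a user selection given in top-left-origin pixel coordinates. The helper `normalizedBoundingBox` and the sample sizes are hypothetical; `VNDetectedObjectObservation(boundingBox:)` is the Vision initializer for a manually created observation.

```swift
import CoreGraphics
import Vision

/// Hypothetical helper: converts a rectangle in top-left-origin pixel
/// coordinates into Vision's normalized, lower-left-origin coordinates.
func normalizedBoundingBox(fromPixelRect rect: CGRect, imageSize: CGSize) -> CGRect {
    let x = rect.minX / imageSize.width
    let width = rect.width / imageSize.width
    let height = rect.height / imageSize.height
    // Flip the y axis: Vision measures y upward from the bottom edge.
    let y = 1.0 - rect.maxY / imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

// A user selection on a 1000 x 500 pixel image, in top-left-origin pixels.
let selection = CGRect(x: 100, y: 50, width: 200, height: 100)
let box = normalizedBoundingBox(fromPixelRect: selection,
                                imageSize: CGSize(width: 1000, height: 500))
// box is (x: 0.1, y: 0.7, width: 0.2, height: 0.2) up to floating-point rounding.

// Manually created observation, e.g. to seed a tracking request.
let observation = VNDetectedObjectObservation(boundingBox: box)
```

The conversion assumes the image is upright (orientation `.up`); rotated or mirrored images need their own mapping.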
New Face Detector
- Finds more faces
- Now orientation-agnostic
- Request: VNDetectFaceRectanglesRequest
- New revision: VNDetectFaceRectanglesRequestRevision2
- VNFaceObservation has 2 new properties: roll and yaw
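A sketch of running the new face detector with its revision pinned and reading the two new angles. The function name `detectFaces` and its tuple return type are illustrative assumptions; Vision exposes `roll` and `yaw` as optional `NSNumber` values in radians, and both they and request revisioning need iOS 12 / macOS 10.14 or later.

```swift
import CoreGraphics
import Vision

/// Detects faces with revision 2 of the face detector and returns each
/// face's normalized bounding box plus its roll and yaw angles (radians).
@available(iOS 12.0, macOS 10.14, tvOS 12.0, *)
func detectFaces(in image: CGImage) throws -> [(box: CGRect, roll: Double?, yaw: Double?)] {
    let request = VNDetectFaceRectanglesRequest()
    // Pin the revision instead of relying on whatever the SDK defaults to.
    request.revision = VNDetectFaceRectanglesRequestRevision2

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let faces = (request.results as? [VNFaceObservation]) ?? []
    // roll and yaw are optional NSNumber values; convert them to Double.
    return faces.map { face in
        (box: face.boundingBox, roll: face.roll?.doubleValue, yaw: face.yaw?.doubleValue)
    }
}
```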
Request Revisioning
- Vision requests now support revisioning
- Future-proof your app: specify the revision explicitly; an error is returned for unavailable functionality
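One way to pin a revision only when the running OS offers it is to consult `supportedRevisions`, a class property on each `VNRequest` subclass. The helper `pinRevision` is hypothetical, a sketch of a fallback policy rather than a Vision API.

```swift
import Vision

/// Hypothetical helper: pins `request` to `preferred` if the running OS
/// supports that revision; otherwise the request keeps its current revision.
/// Returns the revision the request will actually use.
@available(iOS 12.0, macOS 10.14, tvOS 12.0, *)
@discardableResult
func pinRevision(_ request: VNRequest, preferred: Int) -> Int {
    // type(of:) dispatches to the concrete subclass's supportedRevisions.
    if type(of: request).supportedRevisions.contains(preferred) {
        request.revision = preferred
    }
    return request.revision
}
```

For example, `pinRevision(VNDetectFaceRectanglesRequest(), preferred: VNDetectFaceRectanglesRequestRevision2)`. Checking first avoids the error path for an unavailable revision, at the cost of silently falling back; an app that needs identical results across OS versions may prefer to surface that case instead.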
Image Request Handler
- Used to process one or more requests on the same image
- Optimizes performance by caching image derivatives and request results
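A sketch of one image request handler serving two requests on the same image, so Vision can share its cached image derivatives between them. The function name `analyze` and the choice of requests are illustrative assumptions.

```swift
import CoreGraphics
import Vision

/// Runs face-rectangle and rectangle detection on one image with a single
/// VNImageRequestHandler, letting Vision reuse work between the requests.
func analyze(_ image: CGImage) throws -> (faces: [VNFaceObservation],
                                          rectangles: [VNRectangleObservation]) {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let rectangleRequest = VNDetectRectanglesRequest()
    rectangleRequest.maximumObservations = 4  // default is 1; 0 means no limit

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([faceRequest, rectangleRequest])

    let faces = (faceRequest.results as? [VNFaceObservation]) ?? []
    let rectangles = (rectangleRequest.results as? [VNRectangleObservation]) ?? []
    return (faces, rectangles)
}
```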
Sequence Request Handler
- Processes requests on a sequence of images
- Used to process 2 types of requests: tracking and image registration
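Tracking is sketched further on, so here is the other sequence use case, image registration, under stated assumptions: the class name `FrameAligner` is hypothetical, and the returned value is simply Vision's `alignmentTransform` for registering the targeted image (`previous`) against the handler's image (`current`). Check the `VNImageTranslationAlignmentObservation` documentation for the exact direction before relying on it.

```swift
import CoreGraphics
import Vision

/// Hypothetical wrapper: one VNSequenceRequestHandler kept alive for the
/// whole sequence, used for translational registration of consecutive frames.
final class FrameAligner {
    private let sequenceHandler = VNSequenceRequestHandler()

    /// Returns Vision's translation-only alignment transform between
    /// `previous` (the targeted image) and `current`, or nil if none was produced.
    func alignment(of previous: CGImage, to current: CGImage) throws -> CGAffineTransform? {
        let request = VNTranslationalImageRegistrationRequest(targetedCGImage: previous,
                                                              options: [:])
        try sequenceHandler.perform([request], on: current)
        let observation = (request.results as? [VNImageTranslationAlignmentObservation])?.first
        return observation?.alignmentTransform
    }
}
```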
VNRequest Initialization
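A request can be created with a completion handler, which runs when a request handler performs it, or without one, in which case `results` is read after `perform(_:)` returns. A minimal sketch of both forms:

```swift
import Vision

// Option 1: attach a completion handler; it runs when the request is performed.
let faceRequest = VNDetectFaceRectanglesRequest { request, error in
    if let error = error {
        print("Face detection failed: \(error)")
        return
    }
    let faces = (request.results as? [VNFaceObservation]) ?? []
    print("Found \(faces.count) face(s)")
}

// Option 2: no completion handler; read `results` after perform(_:) returns.
let rectangleRequest = VNDetectRectanglesRequest()
```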
Understanding Results
- A collection of VNObservation objects in the VNRequest results property
- The number of observations ranges from 0 to n
- VNObservation is immutable
- Important common observation properties:
  - uuid: used to match related results
  - confidence: indicates the quality of the returned results
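Because results may be empty and each observation is immutable, a common pattern is to filter by `confidence` and key by `uuid` when matching results. The helper `indexObservations` is a hypothetical sketch, not a Vision API.

```swift
import CoreGraphics
import Foundation
import Vision

/// Hypothetical helper: keeps observations whose confidence is at least
/// `minimumConfidence` and indexes them by uuid for later matching.
/// An empty input (zero observations) simply yields an empty dictionary.
func indexObservations<T: VNObservation>(_ observations: [T],
                                         minimumConfidence: VNConfidence) -> [UUID: T] {
    var byID: [UUID: T] = [:]
    for observation in observations where observation.confidence >= minimumConfidence {
        byID[observation.uuid] = observation
    }
    return byID
}
```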
Request Pipelines
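A sketch of an explicit two-stage pipeline: face rectangles first, then those observations passed to the landmarks request through `inputFaceObservations` (without input observations, the landmarks request runs face detection itself). The function name `detectLandmarks` is an illustrative assumption.

```swift
import CoreGraphics
import Vision

/// Explicit pipeline: detect face rectangles, then feed them into the
/// landmarks request. The same image handler is reused for both stages.
func detectLandmarks(in image: CGImage) throws -> [VNFaceObservation] {
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    let faceRequest = VNDetectFaceRectanglesRequest()
    try handler.perform([faceRequest])
    let faces = (faceRequest.results as? [VNFaceObservation]) ?? []
    guard !faces.isEmpty else { return [] }

    let landmarksRequest = VNDetectFaceLandmarksRequest()
    landmarksRequest.inputFaceObservations = faces
    try handler.perform([landmarksRequest])
    // Landmark results are VNFaceObservation objects with `landmarks` filled in.
    return (landmarksRequest.results as? [VNFaceObservation]) ?? []
}
```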
Lifecycle Management
How long to keep objects in memory?
- Image Request Handler (While the image needs processing)
- Sequence Request Handler (While the sequence needs processing)
- Requests/Observations (lightweight objects; create and release as needed)
Where to Process Your Requests?
- Many requests in Vision rely on Neural Networks
- Neural Networks usually run faster on GPUs
- Vision can run requests on both CPU and GPU
- Default: Use GPU, switch to CPU if GPU is busy
- Explicit: set the VNRequest usesCPUOnly property to true
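The default and explicit modes map to a single property. A minimal sketch; the reason given for forcing the CPU is only an example, and newer SDKs may flag `usesCPUOnly` as deprecated.

```swift
import Vision

// Default: Vision prefers the GPU and falls back to the CPU when the GPU is busy.
let defaultRequest = VNDetectFaceRectanglesRequest()

// Explicit: force this request onto the CPU, for example to leave the GPU to rendering.
let cpuRequest = VNDetectFaceRectanglesRequest()
cpuRequest.usesCPUOnly = true
```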
Tracking in General
- Object of interest: Auto-detected or manually selected
- Sequence of frames: Camera feed
- Tracking: Look for the object of interest
- Applications: Focus tracking with camera
Why Tracking and Not Detection?
- Specific detectors do not exist for every kind of object
- Detected objects still need to be matched across frames
- Trackers use temporal information
- Speed: trackers are faster
- Trackers are smoother, less jittery
Tracking Types in Vision
Demo
Tracking in Vision
- Initial object of interest selection
  - Automatic: by running an appropriate detector
  - Manual: user input
- One tracking request per tracked object (1:1)
- 2 types: VNTrackObjectRequest, VNTrackRectangleRequest
- Tracking algorithm: trackingLevel = .fast / .accurate
- Tracking quality: use the observation confidence property
- How many objects can we track simultaneously?
  - Limit: 16 trackers of each type at a time
  - An error is returned if over the limit
- How to release a tracker?
  - Set the request's lastFrame property to true (isLastFrame in Swift)
  - Release the VNSequenceRequestHandler
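The points above can be combined into one minimal multi-object tracker. This is a sketch under assumptions: the class name `ObjectTracker`, its methods, and the 0.3 default confidence threshold are hypothetical; the 16-tracker cap mirrors the stated limit, checked up front rather than waiting for Vision's error. A request marked `isLastFrame` is performed one more time, which releases its tracker, and is then dropped.

```swift
import CoreGraphics
import CoreVideo
import Vision

/// Hypothetical multi-object tracker built on VNTrackObjectRequest.
final class ObjectTracker {
    enum TrackerError: Error { case tooManyTrackers }

    let maxTrackers = 16                       // per-type limit from the slide
    var confidenceThreshold: VNConfidence = 0.3  // assumed default; tune per app
    // Kept alive for the whole sequence; replacing it releases all trackers.
    private var sequenceHandler = VNSequenceRequestHandler()
    private var requests: [VNTrackObjectRequest] = []

    /// Requests that still hold a tracker, including ones pending release.
    var trackedCount: Int { return requests.count }

    /// One tracking request per tracked object (1:1).
    func startTracking(_ observation: VNDetectedObjectObservation,
                       level: VNRequestTrackingLevel = .accurate) throws {
        guard requests.count < maxTrackers else { throw TrackerError.tooManyTrackers }
        let request = VNTrackObjectRequest(detectedObjectObservation: observation)
        request.trackingLevel = level
        requests.append(request)
    }

    /// Runs all trackers on the next frame and returns the confident results.
    func track(frame: CVPixelBuffer) throws -> [VNDetectedObjectObservation] {
        guard !requests.isEmpty else { return [] }
        try sequenceHandler.perform(requests, on: frame)

        var stillActive: [VNTrackObjectRequest] = []
        var confident: [VNDetectedObjectObservation] = []
        for request in requests {
            // Performed with isLastFrame == true: its tracker is now released.
            if request.isLastFrame { continue }
            if let result = (request.results as? [VNDetectedObjectObservation])?.first,
               result.confidence >= confidenceThreshold {
                request.inputObservation = result  // seed the next frame
                confident.append(result)
            } else {
                request.isLastFrame = true  // released on the next perform
            }
            stillActive.append(request)
        }
        requests = stillActive
        return confident
    }

    /// Releases every tracker at once by dropping the sequence handler.
    func reset() {
        requests.removeAll()
        sequenceHandler = VNSequenceRequestHandler()
    }
}
```

Rectangle tracking follows the same pattern with `VNTrackRectangleRequest(rectangleObservation:)` seeded from a `VNRectangleObservation`, and counts against its own limit of 16.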
Tracking Challenges
- Fast or accurate?
- Initial bounding box location: choose a salient object
- Which confidence threshold to use?
- Consider rerunning detectors every N frames
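The last two questions can be folded into one policy: rerun the detector every N frames, or sooner when tracking confidence falls below a threshold. The type `RedetectionPolicy` and both default values are hypothetical assumptions to tune per app; the `Float` confidence matches Vision's `VNConfidence` type.

```swift
/// Hypothetical policy: rerun the detector every `interval` frames, or
/// immediately when tracking confidence drops below `minimumConfidence`.
struct RedetectionPolicy {
    var interval: Int
    var minimumConfidence: Float
    private(set) var framesSinceDetection = 0

    init(interval: Int = 30, minimumConfidence: Float = 0.3) {
        self.interval = interval
        self.minimumConfidence = minimumConfidence
    }

    /// Call once per frame with the lowest tracking confidence in that frame
    /// (nil if nothing is tracked). Returns true if the detector should run.
    mutating func shouldRedetect(lowestConfidence: Float?) -> Bool {
        framesSinceDetection += 1
        // Nothing tracked counts as lost: a detector is needed to find objects.
        let trackingLost = lowestConfidence.map { $0 < minimumConfidence } ?? true
        if trackingLost || framesSinceDetection >= interval {
            framesSinceDetection = 0
            return true
        }
        return false
    }
}
```

With `interval: 3` and steady high confidence, the detector runs on every third frame; a single low-confidence frame triggers it immediately and restarts the count.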