Object Tracking in Vision
WWDC 2018
Vision in a Nutshell
- One-stop shop for solving computer vision problems
- Simple, consistent interface
- Runs on iOS, macOS, and tvOS
- Privacy-oriented
- Continuously evolving
Vision Basics
data:image/s3,"s3://crabby-images/128de/128de92783532d175017a2690756f896d273190e" alt=""
Requests
data:image/s3,"s3://crabby-images/7a5c4/7a5c439fe14e386f8d17e2eb9c9298ad439d4dcf" alt=""
Request Handlers
data:image/s3,"s3://crabby-images/0a9bc/0a9bcea9a5bd6b5a7f63a51e3c16066b51d38d20" alt=""
Observations
- Family of classes derived from VNObservation
- How to obtain a VNObservation?
  - Returned in the VNRequest results property
  - Can be created manually
data:image/s3,"s3://crabby-images/bc705/bc705e1e5583910cf46898ebb656bf4cee93765f" alt=""
New Face Detector
- Finds more faces
- Now orientation-agnostic
data:image/s3,"s3://crabby-images/16dd3/16dd3165f163db033f7fd0ba4e987d94eaaa2959" alt=""
- VNDetectFaceRectanglesRequest: new revision VNDetectFaceRectanglesRequestRevision2
- VNFaceObservation has 2 new properties: roll and yaw
data:image/s3,"s3://crabby-images/1d223/1d22337480acc2bd640568a3309e5400aeed42d2" alt=""
Request Revisioning
- Vision requests now support revisioning
- Future-proof your app: an error is returned for unavailable functionality
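One way to future-proof a request is to check supportedRevisions before pinning a revision, as sketched here. If the preferred revision is not available on the running OS, the request keeps its default revision rather than failing when it is performed. The function name is hypothetical.

```swift
import Vision

// Pin a preferred revision only when the running OS supports it;
// otherwise keep the request's default revision.
func makeFaceDetectionRequest(preferring revision: Int) -> VNDetectFaceRectanglesRequest {
    let request = VNDetectFaceRectanglesRequest()
    if VNDetectFaceRectanglesRequest.supportedRevisions.contains(revision) {
        request.revision = revision
    }
    return request
}
```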
Image Request Handler
- Used to process one or more requests on the same image
- Optimizes performance by caching image derivatives and request results
data:image/s3,"s3://crabby-images/a79e1/a79e1b7588d4961a974e88392673ca7c56ff1e4a" alt=""
Sequence Request Handler
- Processes requests on a sequence of images
- Used for 2 types of requests: tracking and image registration
data:image/s3,"s3://crabby-images/fc953/fc953ba8c0d08fa36b6b8ee72c4672898e5d48b2" alt=""
VNRequest Initialization
data:image/s3,"s3://crabby-images/6524e/6524e589cf530fd397829c2d603352b8721aeec1" alt=""
data:image/s3,"s3://crabby-images/a2d64/a2d640f99d93fc19ff963587d954cc3f608f7f22" alt=""
Understanding Results
- Collection of VNObservation objects in the VNRequest results property
data:image/s3,"s3://crabby-images/fd68f/fd68f96a6d5c5ca67f9cbd56e9cb69d8e01563f8" alt=""
- The number of observations ranges from 0 to n
- VNObservation is immutable
- Important common observation properties:
  - uuid: used to match related results
  - confidence: quality of the returned result, normalized to 0.0...1.0 (1.0 is most confident)
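Two small helpers sketch how these properties are typically used: filtering by confidence (the threshold is an app-specific assumption) and finding the observation in a new result set that shares a uuid with an earlier one. Both helper names are hypothetical.

```swift
import CoreGraphics
import Vision

// Keep only observations at or above a caller-chosen confidence (0...1).
func confidentObservations<T: VNObservation>(_ observations: [T],
                                             minimum: VNConfidence) -> [T] {
    return observations.filter { $0.confidence >= minimum }
}

// Find the observation related to an earlier result by its uuid.
func observation<T: VNObservation>(matching uuid: UUID, in observations: [T]) -> T? {
    return observations.first { $0.uuid == uuid }
}
```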
Request Pipelines
data:image/s3,"s3://crabby-images/c1cd3/c1cd311e0d48c6d61543e312dbfe92499bd03268" alt=""
data:image/s3,"s3://crabby-images/784ba/784bafa5a7fdb81998f48c82e44e11278a6723ae" alt=""
data:image/s3,"s3://crabby-images/dd065/dd06548d1a4d3371b3baada33fa7e7afd33f8643" alt=""
data:image/s3,"s3://crabby-images/fcd75/fcd7536a4ec50436428ee1090b6ae60832c8d5fe" alt=""
Lifecycle Management
How long to keep objects in memory?
- Image Request Handler (While the image needs processing)
- Sequence Request Handler (While the sequence needs processing)
- Requests/Observations (lightweight objects; create/release as needed)
Where to Process Your Requests?
- Many requests in Vision rely on Neural Networks
- Neural Networks usually run faster on GPUs
- Vision can run requests on both CPU and GPU
- Default: Use GPU, switch to CPU if GPU is busy
- Explicit: set the VNRequest usesCPUOnly property to true
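A minimal sketch of the explicit setting; the function name is hypothetical. Newer SDKs mark usesCPUOnly as deprecated in favor of finer-grained compute-device selection, so check the documentation for your deployment target.

```swift
import Vision

// Keep this request off the GPU, e.g. when the GPU is busy rendering.
func makeCPUOnlyFaceRequest() -> VNDetectFaceRectanglesRequest {
    let request = VNDetectFaceRectanglesRequest()
    request.usesCPUOnly = true
    return request
}
```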
Tracking in General
- Object of interest: Auto-detected or manually selected
- Sequence of frames: e.g., a camera feed
- Tracking: Look for the object of interest
- Applications: Focus tracking with camera
data:image/s3,"s3://crabby-images/b2de8/b2de8e4094c7ffadb5aa08977233dbff100ccae0" alt=""
Why Tracking and Not Detection?
- No dedicated detector exists for every kind of object
- Detections in consecutive frames still need to be matched to each other
- Trackers use temporal information
- Speed: trackers are faster
- Smoothness: tracked results are less jittery
Tracking Types in Vision
data:image/s3,"s3://crabby-images/55c85/55c85ad04f91d0f29d4f2f0e27d775685603ca1a" alt=""
Demo
data:image/s3,"s3://crabby-images/eff08/eff081c6f99d22134cc613c91cb961fa5cb03cd9" alt=""
data:image/s3,"s3://crabby-images/73ff6/73ff65dac98e619f375fc0946f88e21153c808c9" alt=""
Tracking in Vision
- Initial object of interest selection
  - Automatic: by running an appropriate detector
  - Manual: user input
- One tracking request per tracked object (1:1)
- 2 types: VNTrackObjectRequest, VNTrackRectangleRequest
- Tracking algorithm: trackingLevel = .fast / .accurate
- Tracking quality: use the observation's confidence property
- How many objects can we track simultaneously?
  - Limit: 16 trackers of each type at a time
  - An error is returned if over the limit
- How to release a tracker?
  - Set the request's lastFrame property (isLastFrame in Swift) to true
  - Release the VNSequenceRequestHandler
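The points above can be sketched as a small wrapper around one VNTrackObjectRequest and a VNSequenceRequestHandler that lives as long as the sequence. The ObjectTracker class, its method names, and the 0.3 default confidence threshold are hypothetical; giving each tracker its own handler is one design choice among several. Each result is fed back as inputObservation so the next frame starts from the latest position.

```swift
import CoreGraphics
import CoreVideo
import Vision

// Hypothetical wrapper: one tracking request per tracked object (1:1).
final class ObjectTracker {
    private let sequenceHandler = VNSequenceRequestHandler()
    let request: VNTrackObjectRequest

    init(seed: VNDetectedObjectObservation, level: VNRequestTrackingLevel = .accurate) {
        request = VNTrackObjectRequest(detectedObjectObservation: seed)
        request.trackingLevel = level  // .fast or .accurate
    }

    // Track the object into the next frame. Returns nil when there is no
    // result or its confidence is below the (assumed) threshold.
    func track(in frame: CVPixelBuffer,
               minimumConfidence: VNConfidence = 0.3) throws -> VNDetectedObjectObservation? {
        try sequenceHandler.perform([request], on: frame)
        guard let result = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        request.inputObservation = result  // continue from the latest position
        return result.confidence >= minimumConfidence ? result : nil
    }

    // Mark this frame as the last one so Vision can release the tracker.
    func stop(on finalFrame: CVPixelBuffer) throws {
        request.isLastFrame = true
        try sequenceHandler.perform([request], on: finalFrame)
    }
}
```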
Tracking Challenges
- Fast or accurate?
- Initial bounding box location: place it on a salient object
- Which confidence level threshold to use?
- Consider rerunning detectors every N frames
data:image/s3,"s3://crabby-images/e6dff/e6dff20a55a1f4fb5eece0cd788e6be65a9fc379" alt=""