Vision with Core ML
WWDC 2018
The Storyline
Create an app that helps shoppers identify items
- Train a custom classifier
- Build an iOS app
- Keep an eye on pitfalls

Our Training Regimen
- Take pictures
- Sort into folders — the folder names are used as labels
- How much data do I need?
- Minimum of 10 images per category, but more is better
- Highly imbalanced datasets are hard to train
- Augmentation adds some robustness on top, but it doesn't replace variety
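
A minimal sketch of this training flow in code, using Create ML in a macOS playground and assuming your images are already sorted into labeled folders (the paths are illustrative):

import CreateML
import Foundation

// Folder names become the class labels.
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingData"))

// Train the classifier on the labeled folders.
let classifier = try MLImageClassifier(trainingData: trainingData)

// Save the trained model for use in the iOS app.
try classifier.write(to: URL(fileURLWithPath: "/path/to/ItemClassifier.mlmodel"))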

Under the Hood

Vision FeaturePrint.Scene
The Vision framework's FeaturePrint for image classification
- Available through ImageClassifier training in Create ML
- Trained on a very large dataset
- Capable of predicting over 1000 categories
- Powers user facing features in Photos
- Continuous improvement (You might want to retrain in the future)
- Already on device
- Smaller disk footprint for your custom model
- Optimized for Apple devices


Refining the App
Only classify when needed!
- Don’t run expensive tasks when not needed
- Am I holding still?
- Using registration
- Cheap and fast
- Camera holds still
- Subject is not moving: use VNTranslationalImageRegistrationRequest

Always have a backup plan
- Classifications can be wrong
- Even when confidence is high, plan for it
- Alternative identification
- Barcode reading
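
A short sketch of that barcode fallback using Vision's built-in detector (the print handling is just illustrative):

import Vision

let barcodeRequest = VNDetectBarcodesRequest { request, error in
    guard let barcodes = request.results as? [VNBarcodeObservation] else { return }
    for barcode in barcodes {
        // payloadStringValue carries the decoded content, e.g. an EAN number.
        print(barcode.symbology, barcode.payloadStringValue ?? "no payload")
    }
}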

Demo
- Using Registration for Scene Stability
- Use the VNSequenceRequestHandler with VNTranslationalImageRegistrationRequest
- Compare against previous frame
sequenceRequestHandler.perform([request], on: previousBuffer!)
- Registration is returned as pixels in the alignmentObservation.alignmentTransform
- Analyze only when scene is stable
- Create a VNImageRequestHandler for the current frame and pass in the orientation
- Perform barcode detection and image classification together
try imageRequestHandler.perform([barcodeDetection, imageClassification])
- Manage your buffers
- Some Vision requests can take longer
- Perform longer task asynchronously
- Do not queue up more buffers than the camera can provide
- We only operate with a one-deep queue in this example
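
Putting those pieces together, a sketch of the per-frame flow, assuming previousBuffer holds the prior camera frame and barcodeDetection / imageClassification are the requests set up elsewhere (all names are illustrative):

import Vision
import CoreVideo

let sequenceRequestHandler = VNSequenceRequestHandler()
var previousBuffer: CVPixelBuffer?

func process(_ currentBuffer: CVPixelBuffer, orientation: CGImagePropertyOrientation) throws {
    defer { previousBuffer = currentBuffer }
    guard let previous = previousBuffer else { return }

    // Register the current frame against the previous one.
    let registration = VNTranslationalImageRegistrationRequest(targetedCVPixelBuffer: currentBuffer)
    try sequenceRequestHandler.perform([registration], on: previous)

    guard let alignment = registration.results?.first as? VNImageTranslationAlignmentObservation else { return }

    // The transform reports the shift in pixels; a small shift means the scene is stable.
    // (The 10-pixel threshold is an arbitrary illustrative value.)
    let shift = alignment.alignmentTransform
    guard abs(shift.tx) < 10, abs(shift.ty) < 10 else { return }

    // Scene is stable: run barcode detection and image classification together.
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer,
                                                    orientation: orientation)
    try imageRequestHandler.perform([barcodeDetection, imageClassification])
}
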
1. Take photos

2. Make a Machine Learning Model
Using the playground


3. Settings

4. Run
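
For step 2, the interactive trainer in a macOS playground is only a couple of lines (this is the CreateMLUI live-view flow; you drag the labeled folders into the live view and adjust settings there):

import CreateMLUI

// Shows the image classifier trainer in the playground's live view.
let builder = MLImageClassifierBuilder()
builder.showInLiveView()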

Why not just use Core ML?
- Vision does all the scaling and color conversion for you
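
A sketch of what that buys you: wrap the model in a VNCoreMLRequest and Vision handles the resizing and pixel-format conversion (ItemClassifier and itemImage are placeholder names):

import Vision

let model = try VNCoreMLModel(for: ItemClassifier().model)
let classificationRequest = VNCoreMLRequest(model: model) { request, error in
    guard let results = request.results as? [VNClassificationObservation],
          let top = results.first else { return }
    print(top.identifier, top.confidence)
}
// Controls how Vision fits the image to the model's expected input size.
classificationRequest.imageCropAndScaleOption = .centerCrop

let handler = VNImageRequestHandler(cgImage: itemImage, orientation: .up)
try handler.perform([classificationRequest])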

Object Recognition
- YOLO (You Only Look Once)
- Fast Object Detection and Classification
- Label and Bounding Box
- Finds multiple and different objects
- Train for custom objects
- Training is more involved than ImageClassifier
Demo

VNRecognizedObjectObservation
- Result of a VNCoreMLRequest
- New observation subclass: VNRecognizedObjectObservation
- YOLO-based models made easy
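
A sketch of reading those results, assuming objectDetector wraps a YOLO-style model as a VNCoreMLModel:

import Vision

let detectionRequest = VNCoreMLRequest(model: objectDetector) { request, error in
    guard let objects = request.results as? [VNRecognizedObjectObservation] else { return }
    for object in objects {
        // Each observation has a normalized bounding box and ranked labels.
        if let best = object.labels.first {
            print(best.identifier, best.confidence, object.boundingBox)
        }
    }
}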

Tracking
Tracking is faster and smoother than re-detection!
- Use tracking to follow a detected object
- Tracking is a lighter algorithm
- Applies temporal smoothing
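
A sketch of the tracking loop, assuming a prior detection produced initialDetection, a VNDetectedObjectObservation:

import Vision
import CoreVideo

let trackingHandler = VNSequenceRequestHandler()
var lastObservation: VNDetectedObjectObservation = initialDetection

func track(in pixelBuffer: CVPixelBuffer) throws {
    let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
    request.trackingLevel = .fast   // lighter than re-detection
    try trackingHandler.perform([request], on: pixelBuffer)
    if let observation = request.results?.first as? VNDetectedObjectObservation {
        lastObservation = observation   // feed the result into the next frame
    }
}
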
Image Orientation
- Not all algorithms are orientation agnostic
- Images are not always upright
- EXIF orientation defines what is upright
- When using a URL as input, Vision reads the EXIF orientation from the file
- Live from a capture feed:
- Orientation has to be inferred from UIDevice.current.orientation
- Needs to be mapped to a CGImagePropertyOrientation
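
One common mapping for a back-camera feed (the exact mapping depends on the camera position and session configuration, so treat this as illustrative):

import UIKit
import ImageIO

func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown: return .left
    case .landscapeLeft:      return .up
    case .landscapeRight:     return .down
    default:                  return .right   // portrait and unknown
    }
}
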
Vision Coordinate System
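Vision reports results in a normalized space from 0.0 to 1.0 with the origin at the lower left, while UIKit puts the origin at the top left. A minimal conversion sketch (the helper name is illustrative):

import UIKit
import Vision

func viewRect(for normalizedBox: CGRect, in viewSize: CGSize) -> CGRect {
    // Scale the normalized box up to the view's dimensions.
    let rect = VNImageRectForNormalizedRect(normalizedBox,
                                            Int(viewSize.width),
                                            Int(viewSize.height))
    // Flip the y-axis to move the origin from lower left to top left.
    return CGRect(x: rect.minX,
                  y: viewSize.height - rect.maxY,
                  width: rect.width,
                  height: rect.height)
}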

Confidence Score
- A lot of algorithms can express how certain they are about their results
- Confidence is expressed from 0.0 to 1.0
- The scale is not uniform across request types

Confidence Score Conclusion
- Does 1.0 mean it's certainly correct?
- It fulfilled the criteria of the algorithm, but our perception can differ
- Where the threshold lies depends on the use case
- Labeling requires high confidence — Observe how your classifier behaves
- Search might want to include lower confidence scores as they are probable
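
A small sketch of use-case-specific thresholds (the 0.8 and 0.3 cutoffs are arbitrary illustrative values; observe your own classifier to pick them):

import Vision

func accepted(_ results: [VNClassificationObservation], forSearch: Bool) -> [VNClassificationObservation] {
    // Labeling wants high confidence; search can tolerate lower scores.
    let threshold: VNConfidence = forSearch ? 0.3 : 0.8
    return results.filter { $0.confidence >= threshold }
}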