Vision with Core ML
WWDC 2018
The Storyline
Create an app that helps shoppers identify items
- Train a custom classifier
- Build an iOS app
- Keep an eye on pitfalls

Our Training Regimen
- Take pictures
- Sort into folders — the folder names are used as labels
- How much data do I need?
- Minimum of 10 images per category, but more is better
- Highly imbalanced datasets are hard to train
- Augmentation adds some robustness on top, but it doesn't replace variety
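
A minimal sketch of this training flow in code, using Create ML in a macOS playground and assuming your images are already sorted into labeled folders (the paths are illustrative):

import CreateML
import Foundation

// Folder names become the class labels.
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingData"))

// Train the classifier on the labeled folders.
let classifier = try MLImageClassifier(trainingData: trainingData)

// Save the trained model for use in the iOS app.
try classifier.write(to: URL(fileURLWithPath: "/path/to/ItemClassifier.mlmodel"))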

Under the Hood

Vision FeaturePrint.Scene
The Vision framework's FeaturePrint for image classification
- Available through ImageClassifier training in Create ML
- Trained on a very large dataset
- Capable of predicting over 1000 categories
- Powers user facing features in Photos
- Continuous improvement (You might want to retrain in the future)
- Already on device
- Smaller disk footprint for your custom model
- Optimized for Apple devices


Refining the App
Only classify when needed!
- Don’t run expensive tasks when not needed
- Am I holding still?
- Using registration
- Cheap and fast
- Camera holds still
- Subject is not moving: use VNTranslationalImageRegistrationRequest

Always have a backup plan
- Classifications can be wrong
- Even when confidence is high, plan for it
- Alternative identification
- Barcode reading
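
A short sketch of that barcode fallback using Vision's built-in detector (the print handling is just illustrative):

import Vision

let barcodeRequest = VNDetectBarcodesRequest { request, error in
    guard let barcodes = request.results as? [VNBarcodeObservation] else { return }
    for barcode in barcodes {
        // payloadStringValue carries the decoded content, e.g. an EAN number.
        print(barcode.symbology, barcode.payloadStringValue ?? "no payload")
    }
}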

Demo
- Using Registration for Scene Stability
- Use the VNSequenceRequestHandler with VNTranslationalImageRegistrationRequest
- Compare against previous frame
sequenceRequestHandler.perform([request], on: previousBuffer!)
- Registration is returned as pixels in the alignmentObservation.alignmentTransform
- Analyze only when scene is stable
- Create a VNImageRequestHandler for the current frame and pass in the orientation
- Perform barcode detection and image classification together
try imageRequestHandler.perform([barcodeDetection, imageClassification])
- Manage your buffers
- Some Vision requests can take longer
- Perform longer task asynchronously
- Do not queue up more buffers than the camera can provide
- We only operate with a one-deep queue in this example
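
Putting those pieces together, a sketch of the per-frame flow, assuming previousBuffer holds the prior camera frame and barcodeDetection / imageClassification are the requests set up elsewhere (all names are illustrative):

import Vision
import CoreVideo

let sequenceRequestHandler = VNSequenceRequestHandler()
var previousBuffer: CVPixelBuffer?

func process(_ currentBuffer: CVPixelBuffer, orientation: CGImagePropertyOrientation) throws {
    defer { previousBuffer = currentBuffer }
    guard let previous = previousBuffer else { return }

    // Register the current frame against the previous one.
    let registration = VNTranslationalImageRegistrationRequest(targetedCVPixelBuffer: currentBuffer)
    try sequenceRequestHandler.perform([registration], on: previous)

    guard let alignment = registration.results?.first as? VNImageTranslationAlignmentObservation else { return }

    // The transform reports the shift in pixels; a small shift means the scene is stable.
    // (The 10-pixel threshold is an arbitrary illustrative value.)
    let shift = alignment.alignmentTransform
    guard abs(shift.tx) < 10, abs(shift.ty) < 10 else { return }

    // Scene is stable: run barcode detection and image classification together.
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer,
                                                    orientation: orientation)
    try imageRequestHandler.perform([barcodeDetection, imageClassification])
}
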
1. Take photos

2. Make a Machine Learning Model
Using the playground


3. Settings

4. Run
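
For step 2, the interactive trainer in a macOS playground is only a couple of lines (this is the CreateMLUI live-view flow; you drag the labeled folders into the live view and adjust settings there):

import CreateMLUI

// Shows the image classifier trainer in the playground's live view.
let builder = MLImageClassifierBuilder()
builder.showInLiveView()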

Why not just use Core ML?
- Vision does all the scaling and color conversion for you
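
A sketch of what that buys you: wrap the model in a VNCoreMLRequest and Vision handles the resizing and pixel-format conversion (ItemClassifier and itemImage are placeholder names):

import Vision

let model = try VNCoreMLModel(for: ItemClassifier().model)
let classificationRequest = VNCoreMLRequest(model: model) { request, error in
    guard let results = request.results as? [VNClassificationObservation],
          let top = results.first else { return }
    print(top.identifier, top.confidence)
}
// Controls how Vision fits the image to the model's expected input size.
classificationRequest.imageCropAndScaleOption = .centerCrop

let handler = VNImageRequestHandler(cgImage: itemImage, orientation: .up)
try handler.perform([classificationRequest])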

Object Recognition
- YOLO (You Only Look Once)
- Fast Object Detection and Classification
- Label and Bounding Box
- Finds multiple and different objects
- Train for custom objects
- Training is more involved than ImageClassifier
Demo

VNRecognizedObjectObservation
- Result of a VNCoreMLRequest
- New observation subclass: VNRecognizedObjectObservation
- YOLO-based models made easy
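
A sketch of reading those results, assuming objectDetector wraps a YOLO-style model as a VNCoreMLModel:

import Vision

let detectionRequest = VNCoreMLRequest(model: objectDetector) { request, error in
    guard let objects = request.results as? [VNRecognizedObjectObservation] else { return }
    for object in objects {
        // Each observation has a normalized bounding box and ranked labels.
        if let best = object.labels.first {
            print(best.identifier, best.confidence, object.boundingBox)
        }
    }
}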

Tracking
Tracking is faster and smoother than re-detection!
- Use tracking to follow a detected object
- Tracking is a lighter algorithm
- Applies temporal smoothing
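
A sketch of the tracking loop, assuming a prior detection produced initialDetection, a VNDetectedObjectObservation:

import Vision
import CoreVideo

let trackingHandler = VNSequenceRequestHandler()
var lastObservation: VNDetectedObjectObservation = initialDetection

func track(in pixelBuffer: CVPixelBuffer) throws {
    let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
    request.trackingLevel = .fast   // lighter than re-detection
    try trackingHandler.perform([request], on: pixelBuffer)
    if let observation = request.results?.first as? VNDetectedObjectObservation {
        lastObservation = observation   // feed the result into the next frame
    }
}
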
Image Orientation
- Not all algorithms are orientation agnostic
- Images are not always upright
- EXIF orientation defines what is upright
- When using a URL as input, Vision reads the EXIF orientation from the file
- Live from a capture feed:
- Orientation has to be inferred from UIDevice.current.orientation
- Needs to be mapped to a CGImagePropertyOrientation
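
One common mapping for a back-camera feed (the exact mapping depends on the camera position and session configuration, so treat this as illustrative):

import UIKit
import ImageIO

func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown: return .left
    case .landscapeLeft:      return .up
    case .landscapeRight:     return .down
    default:                  return .right   // portrait and unknown
    }
}
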
Vision Coordinate System
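Vision reports results in a normalized space from 0.0 to 1.0 with the origin at the lower left, while UIKit puts the origin at the top left. A minimal conversion sketch (the helper name is illustrative):

import UIKit
import Vision

func viewRect(for normalizedBox: CGRect, in viewSize: CGSize) -> CGRect {
    // Scale the normalized box up to the view's dimensions.
    let rect = VNImageRectForNormalizedRect(normalizedBox,
                                            Int(viewSize.width),
                                            Int(viewSize.height))
    // Flip the y-axis to move the origin from lower left to top left.
    return CGRect(x: rect.minX,
                  y: viewSize.height - rect.maxY,
                  width: rect.width,
                  height: rect.height)
}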

Confidence Score
- A lot of algorithms can express how certain they are about their results
- Confidence is expressed from 0.0 to 1.0
- The scale is not uniform across request types

Confidence Score Conclusion
- Does 1.0 mean it's certainly correct?
- It fulfilled the criteria of the algorithm, but our perception can differ
- Where the threshold lies depends on the use case
- Labeling requires high confidence — Observe how your classifier behaves
- Search might want to include lower confidence scores as they are probable
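
A small sketch of use-case-specific thresholds (the 0.8 and 0.3 cutoffs are arbitrary illustrative values; observe your own classifier to pick them):

import Vision

func accepted(_ results: [VNClassificationObservation], forSearch: Bool) -> [VNClassificationObservation] {
    // Labeling wants high confidence; search can tolerate lower scores.
    let threshold: VNConfidence = forSearch ? 0.3 : 0.8
    return results.filter { $0.confidence >= threshold }
}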