Google Goggles is an app that first appeared in 2009 and was ahead of its time, yet it proved too unfocused to gain widespread user acceptance despite topping 10 million downloads. Marketed as a general-purpose way to digitally identify and make sense of anything you could point your camera at, Google Goggles left many users unsure what exactly to do with it. Several years later, the app has been removed from the iOS App Store, and the Android version hasn’t been updated in two and a half years. But despite the fading relevance of the app itself, the technologies that made up Google Goggles are very much alive, having been split up and redistributed among Google’s other offerings.

General object identification in images: Cloud Vision API

The Google Cloud Vision API recognizes things like landmarks, artwork, and products. The Goggles app collected data to train the Cloud Vision API’s models in much the same way that the 1-800-GOOG-411 telephone directory assistance service collected voice data to train Google’s speech recognition models. For example, some of that data was gathered through Goggles’ “Search from camera” mode, which vacuumed up the photos you took with your phone camera. The resulting object recognition capability is now available for a fee through the Cloud Vision API, where Google continues to gather data and improve its models. Image-based search is also available through the Google search engine and via a shortcut in the Chrome web browser.
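
As a rough illustration of what this looks like from a developer’s point of view, here is a minimal sketch of calling the Cloud Vision REST endpoint for label and landmark detection. The API key and image filename are placeholders, and a real client would parse the JSON response instead of just printing it.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class VisionLabelDemo {
    // Placeholder: supply your own Cloud Vision API key.
    private static final String API_KEY = "YOUR_API_KEY";

    public static void main(String[] args) throws IOException {
        // Read and base64-encode the image to send in the request body.
        byte[] imageBytes = Files.readAllBytes(Paths.get("landmark.jpg"));
        String content = Base64.getEncoder().encodeToString(imageBytes);

        // Ask for label and landmark annotations in a single request.
        String body = "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + content + "\"},"
                + "\"features\":["
                + "{\"type\":\"LABEL_DETECTION\",\"maxResults\":5},"
                + "{\"type\":\"LANDMARK_DETECTION\",\"maxResults\":5}]}]}";

        URL url = new URL("https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // Print the raw JSON response; real code would pull labelAnnotations
        // and landmarkAnnotations out of it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```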

On-device frame-based processing, text detection and OCR, and barcode scanning: Mobile Vision API

Google’s Mobile Vision API is a separate system from the Cloud Vision API that works offline: because it runs as part of Google Play Services, it can operate on a phone even when no network connection is available. It provides barcode scanning, face detection, and optical character recognition that’s based on Tesseract and currently works for languages with Latin-based alphabets. The Mobile Vision API also provides the same optical flow-based object tracking that the Goggles app used. It’s extensible and developer-friendly, too: a developer who wants to implement custom image processing, such as overlaying graphics onto faces in the style of Snapchat’s dog face filter, can build it on top of the Mobile Vision API.
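
As a small illustration, here is a sketch of running the Mobile Vision barcode detector over a single bitmap. The class and method names come from the com.google.android.gms.vision packages; the surrounding scaffolding is just for illustration, and a live app would more likely attach the detector to a CameraSource so it processes camera frames continuously.

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.util.Log;
import android.util.SparseArray;

import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.barcode.Barcode;
import com.google.android.gms.vision.barcode.BarcodeDetector;

public class BarcodeScanSketch {

    // Scan one bitmap for QR codes and EAN-13 product barcodes using the
    // on-device BarcodeDetector from the Mobile Vision API.
    public static void scan(Context context, Bitmap bitmap) {
        BarcodeDetector detector = new BarcodeDetector.Builder(context)
                .setBarcodeFormats(Barcode.QR_CODE | Barcode.EAN_13)
                .build();
        try {
            // The detector's native libraries are delivered by Google Play
            // Services; isOperational() stays false until they're ready.
            if (!detector.isOperational()) {
                Log.w("BarcodeScan", "Barcode detector not yet available");
                return;
            }
            Frame frame = new Frame.Builder().setBitmap(bitmap).build();
            SparseArray<Barcode> barcodes = detector.detect(frame);
            for (int i = 0; i < barcodes.size(); i++) {
                Log.d("BarcodeScan", "Found: " + barcodes.valueAt(i).displayValue);
            }
        } finally {
            detector.release();
        }
    }
}
```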

Translation of text visible from the camera: Google Translate

The text translation features originally available in Goggles have been superseded by those now available in the Google Translate app. Google Translate provides a better interface than Goggles because the user selects the languages up front, eliminating the need to identify the language of printed text in a given image and thereby removing a potential source of error. Further, Google Translate’s on-device image processing allows for fast OCR and quick translation at the per-word (though not per-sentence) level.
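
Google Translate’s internal pipeline isn’t public, but the per-word idea can be sketched by feeding the Mobile Vision text recognizer described above into a translation step one word at a time. The translateWord() helper below is purely hypothetical, a stand-in for whatever dictionary lookup or translation service an app might use.

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.Rect;
import android.util.Log;
import android.util.SparseArray;

import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.text.Text;
import com.google.android.gms.vision.text.TextBlock;
import com.google.android.gms.vision.text.TextRecognizer;

public class WordTranslationSketch {

    // Hypothetical stand-in for a real translation step (an on-device
    // dictionary or a translation service); not Google Translate's own code.
    private static String translateWord(String word, String targetLanguage) {
        return "[" + targetLanguage + ":" + word + "]";
    }

    public static void translateVisibleWords(Context context, Bitmap cameraFrame) {
        TextRecognizer recognizer = new TextRecognizer.Builder(context).build();
        try {
            if (!recognizer.isOperational()) {
                Log.w("WordTranslation", "Text recognizer not yet available");
                return;
            }
            Frame frame = new Frame.Builder().setBitmap(cameraFrame).build();
            SparseArray<TextBlock> blocks = recognizer.detect(frame);

            // Walk blocks -> lines -> elements (words) and translate each word
            // independently, mirroring the per-word (not per-sentence)
            // behavior described above.
            for (int i = 0; i < blocks.size(); i++) {
                for (Text line : blocks.valueAt(i).getComponents()) {
                    for (Text word : line.getComponents()) {
                        Rect box = word.getBoundingBox();
                        String translated = translateWord(word.getValue(), "es");
                        Log.d("WordTranslation",
                                translated + " at " + box.flattenToString());
                    }
                }
            }
        } finally {
            recognizer.release();
        }
    }
}
```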

Exploring your world with a camera: Google Cardboard VR headset

The idea of parsing and augmenting what you see with image processing carries over into Google Cardboard. The headset is designed so that the phone’s camera can still receive input while it’s being worn. We may start to see Street View and dashcam-gathered data integrated into this type of augmented reality system.

New hardware-software integration: Pixel handset

Google Goggles used an image blur detection algorithm to determine when the device camera was out of focus and trigger the camera’s autofocus cycle in response, setting up the camera input for optimal scanning. A similar integration of software and hardware shows up in the accelerometer-based camera stabilization built into the Pixel’s high-end camera, which provides smooth, fast camera input even when the user’s hands are shaky.
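
Goggles’ exact blur metric isn’t public, but a common stand-in for this kind of check is the variance of a Laplacian filter over a grayscale frame: sharp images have strong edges and therefore high variance, while blurry ones don’t. A minimal sketch, with the threshold left as an app-specific tuning parameter:

```java
public final class BlurDetector {

    // Variance of the 3x3 Laplacian over a grayscale image. Low variance
    // (few strong edges) is a common heuristic for "this frame is blurry".
    // Goggles' actual metric is not public; this is only an illustration.
    public static double laplacianVariance(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        double sum = 0, sumSq = 0;
        int n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // 4-neighbor Laplacian: 4 * center minus the neighbors.
                int lap = 4 * gray[y][x]
                        - gray[y - 1][x] - gray[y + 1][x]
                        - gray[y][x - 1] - gray[y][x + 1];
                sum += lap;
                sumSq += (double) lap * lap;
                n++;
            }
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }

    public static boolean isBlurry(int[][] gray, double threshold) {
        return laplacianVariance(gray) < threshold;
    }
}
```

An app could run a check like this on preview frames and trigger an autofocus cycle whenever the score falls below its threshold.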

Future capabilities

When the Goggles app was split apart, its on-device capabilities ended up in the Mobile Vision API, and its cloud-based capabilities ended up in the Cloud Vision API. Given current trends, the Mobile Vision API is best positioned to gain important new capabilities over the coming years. As device CPU speeds increase and multicore handsets proliferate, more and more powerful image processing will be able to run on the device itself, without the latency of transmitting images to a cloud-based API. Video input, as an alternative to one-frame-at-a-time image processing, will become more achievable. Developers will have the flexibility to build a variety of apps around Google’s models through its APIs. Users will be able to make sense of image-based data with more speed and clarity. We’ll see more apps that act as a generalized scanner for all camera input, recognizing objects of every type. The incremental improvements spearheaded by Google will continue to power new apps as the company turns its old experiments into new APIs and products.