While many reviews now point to the new Huawei P30 Pro camera as the king of the hill, the Pixel 3 held that title just months ago (01, 02, 03 & 04), and it did so with a single-lens camera while competitors such as the iPhone XS Max relied on dual-lens cameras. The magic behind the Pixel 3's camera is machine learning. Earlier this month, a Google patent application published by the European Patent Office dove into the technology behind Google's camera.
Google notes that most competitors use a stereo camera (like the iPhone XS Max) made up of two or more image capture components (cameras) that simultaneously capture multiple images, which can be combined in some fashion to create or simulate a 3D stereoscopic image.
Although the stereo camera can determine 3D information about the scene, the use of multiple image capture components (cameras) increases the overall cost and complexity involved with producing the stereo camera.
Google's patent application, discovered in Europe recently by Patently Mobile, covers the Pixel smartphone camera's depth estimation technique, which estimates the depth of elements in a scene captured as an image by a single camera.
Google's patent Figure 5 below depicts a simplified representation of an image capture component (camera) capturing an image of an object; Figure 6 depicts determining the distance between an object and a camera; and Figure 10 is a flow chart showing that the Pixel smartphone uses machine learning to create depth instead of using dual cameras.
It's Block 1002 of Google's patent FIG. 10 that brings machine learning into focus. Google notes that block 1002 may involve performing machine learning-based foreground-background segmentation.
Particularly, foreground-background segmentation can be used to determine which pixels of the image belong to the foreground (e.g., a person that is the subject of focus) and which pixels belong to the background.
In some examples, a neural network is used to perform foreground-background segmentation on a captured image. The neural network may analyze the image in order to estimate which pixels represent a primary focus in the foreground and which pixels do not. In further instances, the neural network can be trained to detect pixels that correspond to an object positioned in the foreground of the image.
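As a concrete illustration, here is a minimal, untrained PyTorch sketch of that idea (the `TinySegmenter` model and its layer sizes are our own invention, not Google's network): it maps an RGB image to a per-pixel probability that the pixel belongs to the foreground subject.

```python
# Toy sketch of machine-learning-based foreground-background segmentation.
# Not Google's actual network; an untrained model for illustration only.
import torch
import torch.nn as nn

class TinySegmenter(nn.Module):
    """Per-pixel classifier; hypothetical names and sizes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Single output channel: foreground probability per pixel.
        self.head = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, image):
        return torch.sigmoid(self.head(self.features(image)))

model = TinySegmenter()
image = torch.rand(1, 3, 128, 128)   # stand-in for a captured photo
mask = model(image)                  # (1, 1, 128, 128) probabilities
foreground = mask > 0.5              # boolean foreground mask
```

In a real system the model would be trained on images with ground-truth masks; the point here is simply the input/output shape of the task: image in, per-pixel foreground probability out.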
Convolutional Neural Network (CNN)
In some implementations, the neural network can be a convolutional neural network (CNN) with skip connections. The term "convolutional" indicates that the learned components of the network take the form of filters (a weighted sum of the neighboring pixels around each pixel). As such, the CNN may filter the image and then further filter the filtered image. This filtering process using the CNN may be repeated in an iterative manner.
In addition, the skip connections associated with the CNN may allow information to flow easily from the early stages of the CNN, where the network reasons about low-level features (e.g., colors and edges), up to later stages, where it reasons about high-level features (e.g., faces and body parts). Combining these stages of processing lets the CNN identify and focus upon the desired subject of the image (e.g., the person), including identifying which pixels correspond to that subject.
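To make the skip-connection idea concrete, here is a minimal PyTorch sketch (again our own illustration; `SkipCNN` and all layer sizes are hypothetical). The final layer sees both the early low-level feature maps and the upsampled high-level ones, which is the "information flows from early stages to later stages" behavior described above.

```python
# Toy CNN with a skip connection, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipCNN(nn.Module):
    """Two-stage CNN whose output layer reuses early (low-level) features."""
    def __init__(self):
        super().__init__()
        self.early = nn.Conv2d(3, 16, 3, padding=1)   # low-level: colors, edges
        self.late = nn.Conv2d(16, 32, 3, padding=1)   # higher-level features
        # The fusion layer sees late features *and*, via the skip, early ones.
        self.fuse = nn.Conv2d(16 + 32, 1, 1)

    def forward(self, x):
        low = F.relu(self.early(x))
        high = F.relu(self.late(F.max_pool2d(low, 2)))
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        # Skip connection: concatenate early features with upsampled late ones.
        return torch.sigmoid(self.fuse(torch.cat([low, high], dim=1)))

out = SkipCNN()(torch.rand(1, 3, 64, 64))  # -> (1, 1, 64, 64)
```

Without the `torch.cat` skip, fine detail (exact edge locations) would be lost in the downsampling; the skip lets the network combine crisp low-level detail with high-level subject understanding.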
Block 1004 may also involve estimating a depth map using the image. Particularly, to enhance a final rendered image that blurs portions of the original image, depth at each point in the scene may be determined and used. In some examples, computing depth may involve using a stereo process. As discussed above, the camera may include pixels that can be divided into subpixels.
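As a rough illustration of that stereo idea, the sketch below (our own toy example, not the patent's method) treats a pixel's two subpixels as giving a "left" and "right" view with a tiny baseline, and recovers the disparity of an image patch by brute-force block matching; depth is then inversely proportional to disparity, up to calibration constants.

```python
# Toy disparity estimation between two subpixel views, for illustration only.
import numpy as np

def patch_disparity(left, right, y, x, size=8, max_shift=4):
    """Brute-force block matching: the shift minimizing patch SSD.
    Assumes x >= max_shift so all candidate patches stay in bounds."""
    ref = left[y:y+size, x:x+size]
    errors = []
    for d in range(max_shift + 1):
        cand = right[y:y+size, x-d:x-d+size]
        errors.append(np.sum((ref - cand) ** 2))
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
left = rng.random((64, 64))
right = np.roll(left, shift=-2, axis=1)   # simulate a 2-pixel disparity
print(patch_disparity(left, right, y=20, x=20))  # -> 2
# depth is proportional to focal_length * baseline / disparity
```

The real pipeline is far more sophisticated (the subpixel baseline is tiny, so disparities are fractions of a pixel), but the principle is the same: larger disparity means a closer object.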
Google's patent application is titled "Estimating Depth using a Single Camera." To dig further into Google's invention, you could review Google's European patent application under number WO2019070299. It was originally filed in November 2017 and published by the European Patent Office on April 11, 2019.
Note to readers: caution, don't confuse the Convolutional Neural Network (CNN) with Fake News CNN. 😎
A few of the Key Inventors listed on Google's Patent
Yael Pritch Knaan: worked at Google X and in Google AI/Perception, developing computational photography and machine learning technologies for the Google Pixel cameras and other Google products. She has more than 50 published papers and patents in her field.
Yael Pritch Knaan received her PhD in Computer Science from the Hebrew University of Jerusalem and completed her postdoc at Disney Research Zurich. Her research is in the area of computational photography for video and still images.
Rahul Garg: Senior Research Scientist at Google on the Computational Photography team in Google Research. He works on improving image quality on the Google Pixel camera and building magical features. He previously worked on the Daydream (VR) team.
Marc Levoy: Machine Perception at Google. At Stanford he taught computer graphics, digital photography, and the science of art. At Google he launched Street View, co-designed the library book scanner, and currently leads a team whose projects include HDR+ mode, Portrait mode, and Night Sight mode on Pixel cameras, as well as the Jump light field camera.