When Samsung introduced the AR Emoji feature on the new Galaxy S9, fair-minded techies knew it was Samsung's attempt to copy Apple's iPhone X feature called Animoji. Samsung's feature is currently based on 2D imagery, while Apple's iPhone X relies on its advanced TrueDepth camera system, which captures 3D facial data for Animoji and Face ID.
But a Samsung invention that began in 2013 and was granted last week by the U.S. Patent and Trademark Office shows that Samsung has been working on a future application designed to challenge Skype, Google Hangouts and Apple's FaceTime. The system is based on a future Galaxy smartphone with a 3D face camera, with communication carried out between advanced 3D avatars. So what Samsung introduced with AR Emoji is really just the appetizer before their video communications app is ready.
While Samsung believes users will find an advanced 3D avatar system superior to FaceTime, for instance, it won't be for everyone, to be sure. The two FaceTime videos below can't be replaced with live avatars no matter how hard they try.
While teens may find it more fun to communicate as 3D avatars for everyday use, I highly doubt that business users will want to talk to cartoon characters. But who knows. It may start with avatars and eventually gravitate to real 3D faces once 5G networks are common.
For now, a short overview of Samsung's granted patent is presented below.
In the patent granted to Samsung last week by the U.S. Patent Office, the company notes that existing video communication systems and services, such as Skype and Google Hangouts, transmit 2D video streams between devices running player applications.
Such video communication systems typically transmit video streams of compressed sequential images paired with audio streams between the devices. Most video communication systems for use by an individual user require a player application running on a computer device that includes a camera and a display. Examples of the computer device include a desktop or laptop computer with a camera mounted at the top of the screen, or a mobile phone with a front-facing camera built into the bezel at the top.
While advantageously providing users with video capability, existing video communication systems have several drawbacks. For example, they typically require high bandwidth and are inherently high latency, as entire image sequences need to be generated and compressed before the signal is transmitted to the other device. In addition, for low-latency and high-quality applications, existing video communication systems require the communicating devices to transmit over Wi-Fi, 3G, or 4G mobile communication technologies.
Another problem with most video communication setups, whether on a desktop, laptop, or mobile phone, is that the user appears to be looking down at the person with whom they are communicating, because the user's gaze is directed at the display of the device, which is typically below where the camera is mounted.
This camera/display geometry disparity prevents users from having a conversation while looking each other in the eye. A related problem is that transmitting a video comprising 2D image sequences of a person also loses 3D depth information about their face.
There are also systems that may transmit a graphical representation of the user's alter ego or character, commonly referred to as an avatar, but avatars typically fail to convey the user's actual likeness, facial expressions, and body motion during the communication.
Accordingly, a need exists for a visual communication system capable of displaying the user's actual likeness, facial expressions, and motion in real time, while reducing bandwidth.
Samsung's granted invention relates to methods and systems for visual communication between a first device and a second device. Aspects of an exemplary embodiment include: creating a 3D mesh model of the first device user; receiving image data from a sensor array during the visual communication session between the first device and the second device, wherein the image data includes motion of the first device user; determining 3D mesh model updates using the image data; and transmitting the 3D mesh model updates to the second device so that it can update its display of the 3D mesh model of the first device user, wherein each update is represented as one or more of a blend shape and a relative vertex position change of the 3D mesh model.
According to the method and system disclosed, sending 3D mesh model updates requires significantly less bandwidth than sending image sequences, allowing for smooth communication in bandwidth-constrained environments.
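To make the bandwidth claim concrete, here is a minimal back-of-the-envelope sketch (my illustration, not figures from the patent) comparing a hypothetical per-frame mesh update of 48 blend shape coefficients against a typical compressed 2D video call stream:

```python
# Rough bandwidth comparison (illustrative assumptions, not patent figures).

FPS = 30  # assumed update rate for both modes

# Hypothetical 3D mesh update: 48 blend shape coefficients
# (matching the patent's example key-pose count), each a 4-byte float,
# plus a small assumed header.
NUM_COEFFICIENTS = 48
update_bytes = NUM_COEFFICIENTS * 4 + 16       # ~208 bytes per update
mesh_kbps = update_bytes * FPS * 8 / 1000      # ~50 kbit/s

# Ballpark bitrate for a compressed 720p 2D video call stream.
video_kbps = 1500

print(f"mesh updates: ~{mesh_kbps:.0f} kbit/s")
print(f"2D video:     ~{video_kbps} kbit/s")
print(f"savings:      ~{video_kbps / mesh_kbps:.0f}x")
```

Even with generous headroom in these assumptions, the mesh updates come in well under the bitrate of a compressed video stream.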
In addition, on the first device there is lower latency in interpreting changes to the 3D mesh model and sending updates than in capturing new images and compressing them into an image sequence.
On the second device, a single node of the 3D mesh model or a single blend shape can be updated at a time, as opposed to having to wait for an entire image encode/compress/transmit/decompress cycle. And even if the second device does not support 3D video communication, it can still display the 3D mesh model of the first device user while communicating with the first device through conventional 2D video.
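As a rough sketch of what that incremental receive path might look like, here is a minimal apply step; the function name and wire format are my assumptions, with numpy standing in for a real mesh representation:

```python
import numpy as np

def apply_mesh_update(vertices: np.ndarray, vertex_deltas: dict) -> None:
    """Apply one incremental mesh update in place.

    `vertex_deltas` maps a vertex index to a relative position change,
    mirroring the patent's "relative vertex position change" option.
    Only the vertices that actually moved are touched, so nothing waits
    on a full image encode/compress/transmit/decompress cycle.
    """
    for idx, delta in vertex_deltas.items():
        vertices[idx] += delta

# Toy mesh with 3 vertices; a single update nudges vertex 1 upward.
mesh = np.zeros((3, 3))
apply_mesh_update(mesh, {1: np.array([0.0, 0.01, 0.0])})
print(mesh)
```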
Samsung's patent FIG. 1 below is a block diagram illustrating an exemplary embodiment of a hybrid visual communication system; FIG. 3 is a block diagram illustrating representations of a 3D mesh model created of the user's face and head by the 3D model component; and FIG. 4 is a diagram illustrating a series of stored blend shapes representing facial expressions.
Samsung's patent states that the new visual communications application will be able to work with various operating systems such as Android, BlackBerry, Windows and iOS.
On another level, besides receiving image data from the sensor array, the hybrid visual communicator may also receive other sensor data relevant to the context of the visual communication session, including activity data of the first device user and ambient conditions.
In one embodiment, the activity data of the first device user may be collected from activity sensors, including one or more of an accelerometer, a gyroscope, and a magnetometer, which may be used to determine movement of the first device and/or the first device user; and from biometric sensors, including a heart rate sensor, a galvanic skin sensor, a pupil dilation sensor, and an EKG sensor, any of which may be used to determine biometric data and the perceived emotional state of the first device user.
The ambient condition data may be collected from ambient condition sensors including one or more of a thermometer, an altimeter, a light sensor, a humidity sensor, a microphone, and the like.
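Pulling those readings together, a context record along these lines is one plausible shape for the data; the field names and the crude heart-rate heuristic below are my assumptions, not anything specified in the patent:

```python
from dataclasses import dataclass

@dataclass
class SessionContext:
    """Context for a visual communication session (illustrative fields)."""
    accel_magnitude: float   # from an accelerometer, rough motion level
    heart_rate_bpm: int      # from a biometric heart rate sensor
    ambient_lux: float       # from a light sensor
    temperature_c: float     # from a thermometer

def perceived_state(ctx: SessionContext) -> str:
    # Naive stand-in for the patent's "perceived emotional state":
    # an elevated heart rate with little movement may suggest agitation.
    if ctx.heart_rate_bpm > 100 and ctx.accel_magnitude < 0.5:
        return "agitated"
    return "calm"

print(perceived_state(SessionContext(0.2, 110, 300.0, 21.5)))  # -> "agitated"
```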
In another embodiment, the visual mode may be selected manually by the user or automatically by the hybrid visual communicator. For example, the hybrid visual communicator may determine that the first device includes a 3D camera system and then allow the user to choose 3D visual mode or 2D video mode (e.g., via a GUI or a menu). If the hybrid visual communicator discovers that the device only includes a 2D camera system, it may default to 2D video mode.
According to a further embodiment, the hybrid visual communicator may automatically suggest 2D video mode or 3D visual mode to the user based upon available bandwidth, and/or dynamically change the video mode based on changes to the bandwidth during the visual communication session.
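A plausible reading of that selection logic, sketched with made-up thresholds and function names:

```python
def choose_visual_mode(has_3d_camera: bool, bandwidth_kbps: float,
                       user_choice: str = "") -> str:
    """Return "2D" or "3D" (the threshold and priority order are assumed)."""
    if not has_3d_camera:
        return "2D"                     # 2D-only camera: default to video
    if user_choice in ("2D", "3D"):
        return user_choice              # manual selection wins
    # Automatic suggestion: under constrained bandwidth, the low-bitrate
    # 3D mesh updates beat a compressed 2D stream; re-running this check
    # mid-session would let the mode change as bandwidth changes.
    return "3D" if bandwidth_kbps < 500 else "2D"

print(choose_visual_mode(True, 200))    # -> "3D"
print(choose_visual_mode(False, 2000))  # -> "2D"
```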
If the current selection or default visual mode setting is 3D visual mode, the hybrid visual communicator may poll its counterpart on the second device to determine whether the 3D mesh model is already present there. Alternatively, the second device may perform a lookup based on the caller's ID to see if the 3D mesh model is present and, if not, request that it be sent from the first device. If the second device indicates that the 3D mesh model is already present, the hybrid visual communicator need not resend it, saving bandwidth.
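That handshake might reduce to something like the following sketch, where the caching scheme and message strings are assumptions for illustration:

```python
# Hypothetical session-setup handshake: only send the full mesh model
# if the callee doesn't already have it cached for this caller ID.

mesh_cache: dict = {}                  # callee-side cache, keyed by caller ID

def callee_has_model(caller_id: str) -> bool:
    return caller_id in mesh_cache     # lookup by caller ID, per the patent

def start_3d_session(caller_id: str, mesh_model: bytes) -> str:
    if callee_has_model(caller_id):
        return "skip transfer, reuse cached model"   # saves bandwidth
    mesh_cache[caller_id] = mesh_model               # one-time transfer
    return "sent full 3D mesh model"

print(start_3d_session("alice", b"...mesh bytes..."))  # first call: sent
print(start_3d_session("alice", b"...mesh bytes..."))  # repeat call: skipped
```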
Storing the user's 3D Avatar Likeness
Turning to Samsung's patent Figures 3 and 4: Samsung notes that FIG. 3 is a diagram illustrating example representations of a 3D mesh model created of the user's face and head by the 3D model component. In one embodiment, the 3D mesh model of the first device user may be stored in the 3D model database in a neutral position.
The 3D model component may also store different facial expressions, and optionally different body positions, as blend shapes, with each expression or position represented as a linear combination of blend shape coefficients.
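Blend shapes are a standard facial-animation technique, and the usual formulation (my gloss, not the patent's notation) evaluates the animated mesh as the neutral pose plus a weighted sum of per-expression vertex offsets:

```python
import numpy as np

def blend(neutral: np.ndarray, shapes: np.ndarray,
          weights: np.ndarray) -> np.ndarray:
    """Standard blend-shape evaluation: neutral pose plus a weighted
    sum of per-expression vertex offsets (deltas from neutral).

    neutral: (V, 3) vertex positions of the neutral face
    shapes:  (K, V, 3) offsets, one per stored key pose
    weights: (K,) blend shape coefficients, typically in [0, 1]
    """
    return neutral + np.tensordot(weights, shapes, axes=1)

# Toy example: 1 vertex, 2 key poses ("smile" and "brow raise").
neutral = np.array([[0.0, 0.0, 0.0]])
shapes = np.array([[[0.0, 0.1, 0.0]],     # smile pulls the vertex up
                   [[0.0, 0.0, 0.2]]])    # brow raise pushes it forward
print(blend(neutral, shapes, np.array([0.5, 1.0])))  # -> [[0. 0.05 0.2]]
```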
In one embodiment, a color image of the user's face and/or one or more texture maps may also be associated with the 3D mesh model. The 3D model component may then use the resulting data to create a flexible, polygonal mesh representation of at least the person's face and head by fitting images to depth maps of the user's face and head.
Samsung's patent FIG. 4 is a diagram illustrating a series of stored blend shapes representing facial expressions. In one embodiment, the blend shapes may be stored in the emotional state database as a predetermined number (e.g., 48) of key poses.
Samsung's granted patent was issued on April 17, 2018, and was originally filed in the U.S. in Q1 2016, with work on the project noted as going back to mid-2013.