Microsoft has revealed in a newly minted patent application that they're bringing real-time hand-gesturing to tablets, tabletops and beyond. A second aspect of this project relates to a pico-like projector system that is so secret that Microsoft openly confesses that they'll only talk about it as a concept and reveal the details about it at a later date. Whether this technology is being designed for Intel's 2013 Haswell processor based Ultrabooks is unknown at this time – but it sure would add a little excitement to their next generation hybrid notebook-tablets. And if they want to steal a little of Apple's magical thunder, then it's going to take these types of features to do it.
Microsoft's Patent Background
Microsoft makes it clear that object detection and recognition are difficult problems in the field of computer vision. Object detection involves determining the presence of one or more objects in an image of a scene. Image segmentation comprises identifying all image elements that are part of the same object in an image. Object recognition comprises assigning semantic labels to the detected objects, for example determining which class an object belongs to, such as cell phones, pens, erasers, or staplers.
In a similar manner automatic recognition of hand poses in images is a difficult problem. Recognition of hand poses might be required for many different applications, such as interpretation of sign language, user interface control, and interpretation of hand poses and gestures in video conferencing.
There is a need to provide simple, accurate, fast and computationally inexpensive methods of object and hand pose recognition for many applications. For example, to enable a user to make use of his or her hands to drive an application either displayed on a tablet screen or projected onto a table top. There is also a need to be able to discriminate accurately between events when a user's hand or digit touches such a display from events when a user's hand or digit hovers just above that display.
Microsoft's Proposed Solution
Microsoft states that their patent application summary doesn't provide for an extensive overview of their invention – nor does it identify key/critical elements of the invention or delineate the scope of the invention. Microsoft actually goes on to state that "its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later."
Microsoft concludes their patent application summary by stating that there's "a need to provide simple, accurate, fast and computationally inexpensive methods of object and hand pose recognition for many applications.
For example, to enable a user to make use of his or her hands to drive an application either displayed on a tablet screen or projected onto a table top. There is also a need to be able to discriminate accurately between events when a user's hand or digit touches such a display from events when a user's hand or digit hovers just above that display.
A random decision forest is trained to enable recognition of hand poses and objects and optionally also whether those hand poses are touching or not touching a display surface. The random decision forest uses image features such as appearance, shape and optionally stereo image features. In some cases, the training process is cost aware. The resulting recognition system is operable in real-time."
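To make the quoted idea more concrete, here is a minimal sketch of a random decision forest in plain Python. It is not Microsoft's implementation: it assumes each image has already been reduced to a toy three-number feature vector (appearance, shape, stereo gap), the class names and all function names are hypothetical, and the cost-aware training mentioned in the patent is not modeled. Each tree is grown on a bootstrap sample, each node considers only a random subset of features, and the forest labels a new sample by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Best (feature, threshold) among feature_ids by weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, depth=0, max_depth=5):
    """A leaf is a label string; an internal node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Each node only sees a random subset of the features (2 of the 3 here).
    candidates = rng.sample(range(len(rows[0])), 2)
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority
    f, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in lo], [y for _, y in lo], rng, depth + 1, max_depth),
            grow_tree([r for r, _ in hi], [y for _, y in hi], rng, depth + 1, max_depth))

def train_forest(rows, labels, n_trees=25, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def classify(forest, row):
    """Walk every tree to a leaf and return the majority-vote label."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes[node] += 1
    return votes.most_common(1)[0][0]

def make_toy_data(per_class=15, seed=1):
    """Synthetic stand-in for image features: [appearance, shape, stereo gap]."""
    rng = random.Random(seed)
    centres = {"spread/touch": (0.9, 0.0), "spread/hover": (0.9, 1.0),
               "fist/touch": (0.1, 0.0), "fist/hover": (0.1, 1.0)}
    rows, labels = [], []
    for label, (shape, gap) in centres.items():
        for _ in range(per_class):
            rows.append([rng.random(),                      # appearance: pure noise here
                         shape + rng.uniform(-0.05, 0.05),  # shape separates the poses
                         gap + rng.uniform(-0.05, 0.05)])   # stereo gap separates touch/hover
            labels.append(label)
    return rows, labels

rows, labels = make_toy_data()
forest = train_forest(rows, labels)
print(classify(forest, [0.5, 0.9, 1.0]))
```

Folding the touch state into the class label ("fist/touch" versus "fist/hover") is one simple way to mirror the patent's optional touch/no-touch output; a real system would derive the features from camera images rather than from synthetic numbers.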
Microsoft's Camera System is designed to Classify Objects
In Microsoft's patent FIG. 1 shown below we see a high level schematic diagram of an image processing system for classifying images of items. An image processing system works in conjunction with an image capture device, which could be any of the following: a mono camera, a stereo camera, a video camera, a web camera, a Z-camera (also known as a time-of-flight camera) or a laser range finder.
The image processing system noted in FIG. 1 may be used for recognizing hand poses and/or objects. Images taken by the camera are input to the image processing system to be classified or labeled as being of one of a plurality of specified classes. For example, these classes comprise a plurality of classes of hand pose, such as hand poses with fingers spread out, hand poses with fingers clenched into a fist, hand poses with a pointing index finger etc. The classes may also comprise a plurality of classes of object such as might be found in an office environment. For example, coffee mugs, staplers, rulers, erasers, mobile communications devices, scissors and the like. In addition, the classes may be divided into those in which an item is touching a display surface and those in which an item is not touching a display surface.
The image processing system comprises a learnt multi-class classifier and outputs labeled images. For example, if the image is of a particular hand pose, the labeled image may indicate which of a specified group of classes of hand pose this image belongs to.
Tablets Incorporating the Specialized Camera System Will be able to Control Application Functionality
In Microsoft's patent FIG. 2 above we see a schematic diagram of a tablet personal computer with a work surface (#202) and a camera (#202). The camera is arranged to capture images of a user's hands or other objects positioned between the camera and display (#201).
Microsoft states that their image processing system is incorporated into the tablet and could be set up to classify images captured by the camera. This classification information may then be used by a user interface in order to control the display and drive a particular application on the user's tablet.
For example, by recognizing hand poses and/or objects, a user interface will be able to control a particular application in a particular manner. Information about whether the hand-poses are touching a display surface is used by the system to control an application. For example, this "touch/no-touch" information may be obtained by the image processing system. Alternatively or in addition, it may be obtained using resistive touch overlays at the display surface or by any other touch-sensitive layer or other suitable means.
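As a rough illustration of how a user interface might consume these labels, the sketch below maps a recognized hand pose plus a touch/no-touch flag to an application command. The pose names, the command names and the rule that a touch overlay reading overrides the classifier's own estimate are all assumptions made for the example; the patent application only says the touch information may come from either source, or both.

```python
from typing import Optional

# Hypothetical mapping from (hand pose, touching?) to an application command.
ACTIONS = {
    ("point", True): "select",
    ("point", False): "move_cursor",
    ("fist", True): "drag",
    ("spread", False): "show_menu",
}

def resolve_touch(classifier_touch: Optional[bool],
                  overlay_touch: Optional[bool]) -> Optional[bool]:
    """Combine the two touch sources; the overlay wins when it reports a value."""
    return overlay_touch if overlay_touch is not None else classifier_touch

def action_for(pose: str,
               classifier_touch: Optional[bool] = None,
               overlay_touch: Optional[bool] = None) -> str:
    """Return the command for a pose, or "ignore" when nothing matches."""
    touching = resolve_touch(classifier_touch, overlay_touch)
    if touching is None:
        return "ignore"  # no touch information from either source
    return ACTIONS.get((pose, touching), "ignore")

# The classifier alone says the pointing finger is hovering.
print(action_for("point", classifier_touch=False))                      # move_cursor
# A touch overlay reports contact and overrides the classifier.
print(action_for("point", classifier_touch=False, overlay_touch=True))  # select
```

Keeping the mapping in a plain dictionary means an application could swap in its own commands without touching the recognition side.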
Hand Gesturing Controls will also Work with Displays Generated by a Projector
In another scenario, Microsoft states that it's also possible to use the image processing system of FIG. 1 to control a display projected onto a surface as illustrated in FIG. 3 below. A projector (#301) projects a display (#300) onto a surface such as a table (#202) or other work surface. The projector is controlled by a personal computer (#302), which could be in communication with a camera (#200) arranged to capture images of items against the background of the display.
Microsoft's Mysterious Projector System Design
In many ways Microsoft's patent Figure 3 is a mysterious schematic purposely designed to convey a concept without really identifying the parts in a comprehensive way. In one way, the schematic would appear to mirror some of the thinking that's behind MIT's SixthSense project – which we covered back in 2009. Meaning that the projector and personal computer could actually be designed as an all-in-one form factor like a smartphone and not a separate computer as the schematic is illustrating. Remember, this is a concept that Microsoft is conveying, not a form factor patent. So they're playing with the pieces of a concept so as to hide the true identity of their final design.
Other influences regarding future hand gesturing systems were covered in our June 2010 report titled "The Next OS Revolution Countdown Begins." In that report we covered a presentation by John Underkoffler, the man behind the gestural interface seen in the movie Minority Report. More importantly, Underkoffler is the man behind the revolutionary interface that he developed at MIT called the g-speak Spatial Operating Environment. In that presentation he stated that the technology would eventually be built into the bezel of tablets to capture hand gesturing, which is along the lines of what Microsoft's concept is all about.
Underkoffler's presentation about his new hand gesturing system was first presented to the public at a TED conference. In fact, at the end of the presentation (which can be seen at the 13:30 mark of the video presented in our linked report above), the curator of TED, Chris Anderson, begins a brief Q&A session with a question he says was from Bill Gates, who was in attendance. The question was about when the project would be ready to bring to market. Whether Gates asked that specific question knowing full well that Microsoft was working on a similar project, or whether he was considering an acquisition of Underkoffler's project, is unknown at this time.
At the End of the Day
At the end of the day, it was always Microsoft's intent to keep the full scope of this invention a bit of a mystery. If you remember, Microsoft was honest about that from the outset in their opening summary statement. They made it clear that their patent application was only a prelude to a more detailed description that would be presented later.
For now, we can safely gather that Microsoft will be bringing hand gesturing to a future iteration of Windows/Metro for tablets, "Surface" tables and beyond. The mystery aspect of this patent application surrounds their pico-like projector system and/or device. Whether the latter aspect of this patent will ever see the light of day or remain typical Microsoft vaporware remains to be seen.
Yet if Windows is to remain relevant in a world that has clearly voted for iOS and Android based mobile devices, the next wave of Windows 8 devices had better deliver a new wave of cooler, higher-end features that consumers will drool over, because a bunch of Me-Too products won't be enough to turn the market around for them. It's a tall order – but in many ways, it's definitely a do-or-die moment for mobile devices based on Windows.
Microsoft's patent application was originally filed in December 2011, four months ago, and was made public by the US Patent and Trademark Office in April.
Notice: Patent Bolt presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. Revelations found in patent applications shouldn't be interpreted as rumor or fast-tracked according to rumor timetables. About Comments: Patent Bolt reserves the right to post, dismiss or edit comments.
Here are a Few Sites covering our Original Report: MacSurfer, Twitter, Facebook, Google Reader, The Fly on the Wall (Wall Street), BGR, Tom's Hardware, The TechEYE, and more.
The projector could be good for some presentations. I like the idea.
Posted by: MonkeyMo | April 16, 2012 at 06:29 PM
Sounds neat, but with a limited market. I have a hard time imagining why I would prefer to use non-touch gestures to control a tablet with my hands. Why not just touch it? The only thing I can see a use for it would be a way for a deaf person to sign in front of the tablet and have the tablet translate to spoken english (or other language).
Posted by: CatatonicBug | April 16, 2012 at 03:19 PM