Phantom Maelstrom Blog is no longer active. The new address is at

Wednesday, November 9, 2011

Mask-bot - A talking video humanoid robot

Dr. Takaaki Kuratate in conversation with his Mask-bot self.
A team at the Institute of Cognitive Systems (ICS) at TU München and the National Institute of Advanced Industrial Science and Technology (AIST) in Japan has developed a talking video robot face called Mask-bot.

Talking robot faces are not new, and some predecessors also come with impressive Artificial Intelligence (AI). What sets Mask-bot apart is that it can instantly construct an image of anyone's face from a photo and project it onto a 3D surface. It moves its virtual head slightly and raises its eyebrows as you speak, creating the impression that it understands, although it doesn't yet.

Mask-bot projects the image onto the mask from behind, which makes it look more realistic than, for example, Disney animatronic characters, whose faces are projected from the front. It also works in daylight. It is more flexible than existing humanoid robots as well, since those rely on a complex set of mechanical parts and must be custom-designed.

Avatars for video conferencing

According to Dr. Takaaki Kuratate, Mask-bot could soon be deployed in video conferences. "You can create a realistic replica of a person that actually sits and speaks with you at the conference table. You can use a generic mask for male and female, or you can provide a custom-made mask for each person."

A more advanced version of Mask-bot doesn't even require a video image of the person speaking. A program can convert an ordinary two-dimensional photograph into a correctly proportioned projection for a three-dimensional mask, complete with facial expressions and voice. To do this, a talking-head animation engine filters a large set of face motion data, collected from a variety of people with a motion capture system, and selects the facial expressions that best match each phoneme being spoken.

The computer extracts a set of facial coordinates from each of these expressions, which it can then assign to any new face, bringing it to life. Emotion synthesis software then delivers the visible emotional nuances that indicate, for instance, when someone is happy, sad, or angry.
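
The two steps described above, selecting a captured expression for a phoneme and assigning its facial coordinates to a new face, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers' software: the landmark names, the sample data, the "closest to the average" selection rule, and the functions `select_expression` and `retarget` are all hypothetical.

```python
from math import dist

# Hypothetical motion-capture samples: for each phoneme, landmark offsets
# (dx, dy) from a speaker's neutral face, in units of face height.
CAPTURED = {
    "AA": [
        {"jaw": (0.0, -0.5), "lip_corner": (0.0, 0.0)},
        {"jaw": (0.0, -0.25), "lip_corner": (0.0, 0.0)},
        {"jaw": (0.0, -0.125), "lip_corner": (0.0, 0.0)},
    ],
}


def select_expression(samples):
    """Return the sample closest to the per-landmark mean of all samples.

    Assumption: "best match" is modelled here as "most typical sample".
    """
    names = list(samples[0])
    mean = {
        n: tuple(sum(s[n][i] for s in samples) / len(samples) for i in range(2))
        for n in names
    }
    return min(samples, key=lambda s: sum(dist(s[n], mean[n]) for n in names))


def retarget(neutral_face, offsets, face_height):
    """Apply normalized offsets to a new face's neutral landmark positions.

    Landmarks without a captured offset stay where they are.
    """
    moved = {}
    for name, (x, y) in neutral_face.items():
        dx, dy = offsets.get(name, (0.0, 0.0))
        moved[name] = (x + dx * face_height, y + dy * face_height)
    return moved


# Neutral landmark positions (in pixels) taken from a new person's photo.
new_face = {"jaw": (50.0, 40.0), "lip_corner": (30.0, 60.0), "brow": (35.0, 120.0)}
frame = retarget(new_face, select_expression(CAPTURED["AA"]), face_height=100.0)
```

Storing offsets relative to a neutral face, scaled by face size, is what lets one person's captured expression be reused on someone else's photo in this sketch.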

Synthesized voice

An advanced version of Mask-bot is also said to be able to reproduce content typed on a keyboard. A text-to-speech system converts text in English, Japanese, and soon German into audio, spoken in a female or male voice that can be quiet or loud, happy or sad. Mask-bot doesn't actually understand anything; it simply listens and gives pretend responses as part of a fixed programming sequence.

Meanwhile, the Munich researchers are working on Mask-bot 2, a mobile version. The mask, projector, and computer control system will all be contained inside a robot costing around EUR 400, compared with EUR 3,000 for the first Mask-bot.

"Mask-bot will influence the way in which we humans communicate with robots in the future," predicts Prof. Gordon Cheng, head of the ICS team. "These systems could soon be used as companions for older people who spend a lot of time on their own," says Kuratate.

