In his poem Metamorphoses, Ovid tells the story of Pygmalion, a sculptor who carves a woman so beautiful that he falls in love with her. As Ovid describes the statue, whom Pygmalion named Galatea, he says that she is so exquisitely rendered that, “you would think it was alive.”
Ars adeo latet arte sua. This translates roughly to, “so thoroughly the artifice is hidden by the artistry.” This idea of artful artifice, as first outlined by Ovid, is often in my mind as I think about the work we’ve done with Pepper at SoftBank Robotics. While we have come a long way, there are still steps to take in our pursuit of a real-life Galatea.
To Communicate With Robots, You Need To Think Like a Robot
One of the biggest reasons even the most successful robots are still seen as “robotic” is their communication style. No matter how much variety you program into a robot, its conversation still relies on the successful execution of a set of programmed responses to a set of predetermined questions and topics. If you want to talk with a robot, you must speak slowly, with specific patterns of speech and minimal ambiguity.
In other words, to talk with a robot, you must think like a machine.
If a robot has been coded to direct customers to find specific items in the grocery store, you could imagine an interaction where someone asks, “Where is the chicken?” If the robot replies, “the chicken is in the back of the store,” it has technically succeeded. But what if the customer asks, “is the chicken next to where you pick up prescriptions?” Well, now we’ve gone off-script, and the robot is stumped.
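The grocery-store exchange above can be sketched in a few lines. This is an illustrative toy, not Pepper's actual code: a scripted responder only succeeds when the question exactly matches a phrasing its author anticipated.

```python
# Toy scripted responder: one canned answer per anticipated question.
# (Hypothetical example, not Pepper's real dialogue system.)
RESPONSES = {
    "where is the chicken": "The chicken is in the back of the store.",
    "where is the pharmacy": "The pharmacy is at the front of the store.",
}

def respond(question: str) -> str:
    key = question.lower().strip(" ?")
    # Succeeds only on an exact, pre-programmed phrasing.
    if key in RESPONSES:
        return RESPONSES[key]
    return "I'm sorry, I don't understand the question."

print(respond("Where is the chicken?"))
# An off-script follow-up stumps the script:
print(respond("Is the chicken next to where you pick up prescriptions?"))
```

The first call technically succeeds; the second falls straight through to the fallback, because no one wrote a rule for that phrasing.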
Today, communication with robots requires coders to anticipate every interaction, then devise paths for appropriate responses. This works well in situations where there is a set of planned interactions, which is why Pepper works so well as a greeter, a responder to frequently asked questions, a game-player, or an attention-getting host.
As we advance in our quest to make Pepper true-to-life, we have to consider how to build on the progress we’ve made.
The Progression of Communication Capabilities: From Texting To Real Life
Understanding the complications of bringing a humanoid to life requires an examination of existing bots; namely, chatbots, and voice-controlled assistants.
After decades of use, chatbots are finally becoming mainstream. (Recently, Ernst & Young said that 80% of all companies will be invested in a chatbot strategy by 2020.)
There are a few reasons why chatbots have been the first to reach high levels of maturity and adoption. Text is a simple mode of communication, and one that robots are used to. Not only is text a fairly controlled and standardized input, but the medium also provides cover for processing lags. When you are communicating over text you expect delayed responses, so gaps in the conversation can still fly under the radar and feel natural.
Introducing the variable of voice exposes the challenges of maintaining the stagecraft of humanity. Consider what goes into making Siri or Alexa work: sophisticated audio signal processing (so the bot ignores inputs like dogs barking or background speech), natural language processing (to transcribe the words within that signal and decide what the speaker intended), pattern-to-action mappings, and response formulation based on an understanding of the initial input.
And then on top of that, you need the ability to turn the text back into speech, and deliver that speech with a natural intonation to the audience. That’s quite a feat, and we aren’t even talking about robots with a physical body.
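The pipeline described above can be sketched as a chain of stages. Every function here is a stub standing in for a real subsystem (signal processing, speech-to-text, intent detection, response generation, text-to-speech); the names and behavior are illustrative assumptions, not any vendor's API.

```python
# Hedged sketch of a voice-assistant pipeline; each stage is a stub.

def filter_audio(raw_audio: str) -> str:
    """Audio signal processing: drop non-speech noise (stubbed)."""
    return raw_audio.replace("[dog barking]", "").strip()

def transcribe(speech: str) -> str:
    """Speech-to-text (stubbed: the 'audio' is already text here)."""
    return speech

def extract_intent(text: str) -> str:
    """Natural language processing: map the transcript to an intent."""
    return "book_appointment" if "appointment" in text.lower() else "unknown"

def formulate_response(intent: str) -> str:
    """Pattern-to-action mapping and response formulation."""
    if intent == "book_appointment":
        return "Sure, what day works for you?"
    return "Sorry, could you repeat that?"

def speak(text: str) -> str:
    """Text-to-speech with natural intonation (stubbed as a tagged string)."""
    return f"[TTS] {text}"

# End-to-end: noisy audio in, spoken reply out.
reply = speak(formulate_response(extract_intent(
    transcribe(filter_audio("[dog barking] I'd like to make an appointment")))))
print(reply)
```

A delay anywhere in this chain is immediately audible in a voice interface, which is exactly the latency cover that text messaging provides and voice does not.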
Google’s Duplex demo, which showcased a robot assistant speaking in a completely natural way when booking an appointment for a client, is a good indicator that this medium is making rapid progress, but there is still a ways to go before voice bots deliver a truly lifelike experience.
Pepper and The Quest for Life
Because Pepper is a physical robot, creating language for her requires a completely different approach than you would take with a purely voice-driven bot like Siri or Alexa. While this presents significant challenges, it is also very powerful, because it allows us--and forces us--to replicate what true communication looks like in the real world.
Real world communication is particularly tricky to replicate in a robot because a huge portion of our communication is non-verbal--you express meaning with facial expressions, posture, physical proximity, and the tone of your voice, not just the words that come out of your mouth. When we think about conversations for Pepper, not only do we think about the words, but we also think about the accompanying gestures, the sentiment, the facial expressions, and the ability to respond to ambiguity.
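One way to picture this is that, for a physical robot, a "response" is a bundle rather than a string. The sketch below is purely illustrative; the field names and values are hypothetical, not Pepper's actual API.

```python
# Illustrative sketch: a robot's reply pairs words with non-verbal channels.
# Field names and values are hypothetical, not SoftBank's real interface.
from dataclasses import dataclass

@dataclass
class MultimodalResponse:
    words: str
    gesture: str = "neutral"      # e.g. an arm animation to play
    expression: str = "neutral"   # e.g. a facial/tablet expression
    sentiment: str = "neutral"    # tone to use in speech synthesis

def greet(customer_name: str) -> MultimodalResponse:
    # Words, gesture, expression, and sentiment are composed together,
    # mirroring how humans pair speech with non-verbal cues.
    return MultimodalResponse(
        words=f"Hello, {customer_name}! How can I help you today?",
        gesture="wave",
        expression="smile",
        sentiment="warm",
    )

r = greet("Alex")
print(r.words, "|", r.gesture, "|", r.expression)
```

Designing responses as a single composed unit, rather than bolting gestures on after the fact, is what keeps the channels synchronized and the interaction feeling natural.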
While the amount of coordination that has already come together is incredible, we are continually looking for ways to ensure Pepper interacts with users in a more natural way.
We believe that taking Pepper to the next level will be dependent on our ability to make her conversations “real.” In other words, as we help Pepper nimbly detect and adapt to shifting contexts, to represent the world and reason about the things and people in it, we will come closer to delivering a humanoid robot that is truly an artful intelligence.