Pepper Thu, Oct 26, '17
Far From Child’s Play: How Pepper Mastered the Ball-In-Cup


We wanted to test state-of-the-art robot control learning algorithms on Pepper. We already knew that nice-looking movements could be created with Choregraphe. Our AI lab began by asking whether we could also create motor skills (movements that get something done, basically). We knew that we probably wouldn’t be able to have Pepper wash the dishes, but at least something involving light objects and a goal would be nice.

To test Pepper’s ability, we represented the movement as a “dynamic movement primitive”, or “DMP”. DMPs have several nice properties (guaranteed stability, scalability, online adaptation to perturbations, etc.) and have been quite popular in the robotics research community. Most importantly, it has been shown that they can be optimized very efficiently.
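For readers who like to see things in code, here is a minimal sketch of a one-dimensional DMP in a common textbook form: a spring-damper that always pulls towards the goal, plus a learned “forcing” term that shapes the path and fades out over time. This is an illustration, not the code that ran on Pepper. The function name `dmp_rollout`, the gain values, the basis-function heuristic, and the example weights are all assumptions.

```python
import math

def dmp_rollout(weights, y0=0.0, goal=1.0, tau=1.0, duration=3.0, dt=0.001,
                alpha=25.0, beta=6.25, alpha_x=4.0):
    """Integrate a 1-D DMP with Euler steps and return the list of positions."""
    n = len(weights)
    # Basis-function centres, spaced evenly in time and mapped onto the phase x.
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centers]  # a common heuristic (assumption)

    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(int(round(duration / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        # Learned forcing term; it is gated by x, so it fades out over time.
        forcing = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        forcing *= x * (goal - y0)
        # Spring-damper pulling y towards the goal, plus the forcing term.
        dz = (alpha * (beta * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory

# Zero weights give a plain reach to the goal; non-zero weights bend the path.
plain = dmp_rollout([0.0] * 5)
shaped = dmp_rollout([200.0, -100.0, 50.0, 0.0, 0.0])
```

Both trajectories end at the goal, whatever the weights are. That is the “guaranteed stability” property in practice, and it is why the weights can be modified freely during learning without the movement running away.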

The process of optimizing is what you see in our video. Our team first (roughly) demonstrates to Pepper how the movement is supposed to go. Note that it’s not necessary to show Pepper the optimal trajectory — something which is kind of close will do the trick. Next, the actual learning begins. We repeat the following steps until we’re happy with what we see:

  • Ten random modifications of the current “best guess” are executed on Pepper (we call each execution a “rollout”), and an error (basically a number) is calculated for each one of them. To get the error, we measure the distance between ball and cup at the point in time when the ball passes the rim of the cup in a downward motion. The intuition is simple: if the distance between ball and cup is zero, then the ball lands in the cup. So we want this distance, or error, to be minimal. The measurement is done using two external cameras: one filming Pepper from the side to check when the ball passes the rim of the cup, and one filming from above to measure the distance.
  • Next, the best rollouts are selected, and our “best guess” is updated to be more similar to these.
  • Repeat!
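
The loop above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact update rule used on Pepper: the new best guess is simply the average of the best rollouts, `measure_error` is a toy stand-in for the two-camera measurement, and the noise level, the number of rollouts kept, and the hidden `target` are made up for the example.

```python
import math
import random

def measure_error(params, target):
    """Toy stand-in for the cameras: distance to a hidden 'perfect' movement."""
    return math.dist(params, target)

def improve_best_guess(best_guess, error_fn, rng, n_rollouts=10, n_best=3, noise=0.1):
    """One update: perturb, execute, rank by error, average the best rollouts."""
    # Random modifications of the current best guess (one list per rollout).
    rollouts = [[p + rng.gauss(0.0, noise) for p in best_guess]
                for _ in range(n_rollouts)]
    # Lowest error first, then keep only the best few.
    best = sorted(rollouts, key=error_fn)[:n_best]
    # New best guess: the mean of the best rollouts, parameter by parameter.
    return [sum(values) / n_best for values in zip(*best)]

rng = random.Random(0)
target = [0.5, -0.3, 0.8]     # hypothetical; unknown to the learner
best_guess = [0.0, 0.0, 0.0]  # stands in for the rough demonstration
initial_error = measure_error(best_guess, target)
for update in range(20):
    best_guess = improve_best_guess(
        best_guess, lambda p: measure_error(p, target), rng)
final_error = measure_error(best_guess, target)
```

Note that the learner never sees `target` directly. It only ever gets one number per rollout, which is exactly the situation with the ball and the cup.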

That’s how we got the “after 10 trials”, “after 20 trials”, “after 30 trials”, and so on. What we show in the video is always the “best guess” after trying out 10 random modifications of the movement and updating our parameter vector.

We also tried this optimization of movement without the external cameras. Instead, the team simply judged how good each one of Pepper’s trials was, on a scale from 1 to 10. A success would be a 10, the ball hitting the cup but jumping out would be a 9, the ball just missing the cup would be an 8, and so on. This also worked very well! So we could easily imagine this kind of optimization happening in people’s homes: anyone could take Pepper by the hand and demonstrate a movement, and then optimize it by giving Pepper feedback (for example by touching buttons for a 1–10 scale on the tablet). No coding involved!
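
As a rough sketch of how such scores could replace the camera measurement: a 1-to-10 rating only needs to be turned into an error where lower is better, and the selection step stays the same. The helper names `rating_to_error` and `select_best`, and the choice of `10 - rating` as the error, are assumptions made for illustration.

```python
def rating_to_error(rating):
    """Map a 1-10 score (10 = ball lands in the cup) to an error where 0 is best."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    return 10 - rating

def select_best(rollouts, ratings, n_best=3):
    """Return the n_best rollouts with the lowest error (highest rating)."""
    scored = sorted(zip(rollouts, ratings), key=lambda pair: rating_to_error(pair[1]))
    return [rollout for rollout, _ in scored[:n_best]]

# Four rollouts (labelled for brevity) and the scores a person gave them.
best = select_best(["a", "b", "c", "d"], [3, 10, 8, 9], n_best=2)
```

Only the ordering of the ratings matters for this kind of selection, so a person does not have to be perfectly consistent about what an 8 versus a 9 means.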

What motor skill would you most like to see Pepper have in your home?