In just a few weeks, soccer-playing robots from around the world will converge on Eindhoven in the Netherlands to compete for the prestigious RoboCup 2013. With around 2,500 particpants, competition is sure to be tough - just have a look at the final from last year’s RoboCup in Mexico City.
As you can see, the humanoid robots there are pretty advanced. So any technological advantage that gives a team the upper hand is a worthy goal indeed.
In the months since my last Conversation article, regarding soccer playing robots, the field of humanoid robotics has taken some major steps forward. These steps take us ever closer to the future portrayed by Hollywood, in which robots are as common as fridges or laptops.
Every day, teams across the world develop humanoid robots capable of performing disaster-rescue missions. Far above Earth, NASA’s “Robonaut 2” performs mundane and dangerous tasks on the International Space Station, ensuring the safety of its human comrades.
One of the biggest challenges in the field of robotics is to take behaviours that humans perform subconsciously, and to translate them into machinery and code.
Self-localisation - from humans to robots
Imagine that you awake in a pitch-black room, with no memory of how you arrived (much like the plot of many low-budget horror flicks).
As soon as the lights come on you would look around, process your surroundings and draw some conclusions as to where you actually are. Perhaps you are in the middle of the room, or in the corner, and perhaps you are directly opposite the door leading to your freedom.
Through these observations of features in your environment, you have determined your position relative to the door, which is information that you require in order to leave. This is the process of “self-localisation”, and is something that a human would do instantly and subconsciously.
To separate the concept of self-localisation from “mapping”, consider a simpler scenario in which you are given a map of the room in advance.
You are able to inspect the map to identify the key features (such as the door or telephone), but have no idea where in the room you will awake. Upon awakening, you can immediately search for these features (ignoring all else), again assisting with your escape.
Solving this problem without a map leads to an area of active research called “simultaneous localisation and mapping” (SLAM). But the simpler scenario (in which a map is provided) is still incredibly complex to implement on an autonomous humanoid robot, and involves the following major steps:
- Capturing images of the environment using a camera (often a simple webcam), and processing the pixels to identify key features. In the scenario of robot soccer, these include static features like goal posts and field lines, as well as moving features like the ball
- Estimating the distance and orientation of these key features, relative to the robot
- Combining information from the current observation from the previous “positional belief”, which includes information based on previous observations, as well as estimation by the robot of how far it has moved in that time period. The combination of evidence and prior belief is performed by a “probabilistic filter” and is illustrated in the image below.
Each of these major steps has been the focus of much research. But we recently realised that a major question had been overlooked: where should the robot be looking? More concretely, if you wake in a pitch-black room and the lights are switched on, what is the quickest and most efficient way to look around the room to determine where you are?
Given the state of and excellent results of modern machine learning research, the solution was clear – let the robot work it out for itself!
Motivated reinforcement learning
So what’s the best way for a robot to learn an inherently human trait? This is a complex and unsolved question, but there is intuition in trying to best replicate the process by which a human would learn.
For this work, so-called “motivated reinforcement learning” was implemented, which combines traditional reinforcement learning with motivation theory.
Simply put, reinforcement learning is a framework by which an agent (such as the robot) can determine an optimal policy (or sequence of actions) to maximise expected reward within some environment (the soccer field). In this scenario, the problem is structured so the robot determines a policy of head motions that minimise self-localisation error.
Motivated reinforcement learning extends this framework by having the robot seek out “similar-but-different” experiences, when it cannot otherwise determine an optimal action. This behaviour is based on the idea that animals seek “novelty of sensation”, and has been previously applied to studying the progression of architectural designs.
Taking a step back, 2.0
As mentioned in my previous article, science favours simplicity, and there is often merit in taking a step back from the textbook methodology for solving a well-known problem.
The problem of self-localisation is no exception, with an optimised head actuation policy shown to yield an 11% improvement in self-localisation performance.
Although this may not sound like much, recent research on the improvement of self-localisation by improving probabilistic filters (mentioned earlier) has demonstrated that improving self-localisation accuracy by just 5cm translates to a 14% improvement in goals scored (over 750 games).
And that might just be the difference needed to win us the cup.