In recent months, researchers at OpenAI have been focusing on developing artificial intelligence (AI) that learns better. Their machine learning agents can now, in a sense, train themselves, thanks to the reinforcement learning algorithms collected in OpenAI Baselines. Now, a new algorithm lets their AI learn from its own mistakes, much as human beings do.
The development comes from a new open-source algorithm called Hindsight Experience Replay (HER), which OpenAI researchers released earlier this week. As its name suggests, HER lets an AI agent “look back” in hindsight at what it did while attempting a task. Specifically, the agent reframes failures as successes, according to OpenAI’s blog.
“The key insight that HER formalizes is what humans do intuitively: Even though we have not succeeded at a specific goal, we have at least achieved a different one,” the researchers wrote. “So why not just pretend that we wanted to achieve this goal to begin with, instead of the one that we set out to achieve originally?”
Simply put, whenever an AI fails to reach the goal it was working towards, the outcome it did reach is treated as an unintended “virtual” goal that it achieved.
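To make the relabeling idea concrete, here is a minimal Python sketch. It assumes a simple 2D goal (say, where an object should end up), a sparse reward of 0 for success and -1 otherwise, and the "final" strategy described in the HER paper, in which a failed episode is replayed as if its last achieved outcome had been the goal all along. The names `Transition`, `sparse_reward` and `relabel_with_final_goal` are hypothetical and are not part of OpenAI Baselines.

```python
import math
from dataclasses import dataclass, replace

Goal = tuple[float, float]

@dataclass(frozen=True)
class Transition:
    state: tuple[float, ...]
    action: tuple[float, ...]
    achieved_goal: Goal   # where the object actually ended up after this step
    desired_goal: Goal    # the goal the agent was asked to reach
    reward: float

def sparse_reward(achieved: Goal, desired: Goal, tol: float = 0.05) -> float:
    """Assumed sparse convention: 0.0 on success, -1.0 otherwise."""
    return 0.0 if math.dist(achieved, desired) <= tol else -1.0

def relabel_with_final_goal(episode: list[Transition]) -> list[Transition]:
    """Pretend the goal all along was whatever the agent achieved at the end."""
    virtual_goal = episode[-1].achieved_goal
    return [
        replace(t, desired_goal=virtual_goal,
                reward=sparse_reward(t.achieved_goal, virtual_goal))
        for t in episode
    ]

# Usage: the agent was asked to reach (1.0, 1.0) but only nudged the object along x.
goal = (1.0, 1.0)
path = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
episode = [Transition(state=(), action=(), achieved_goal=p, desired_goal=goal,
                      reward=sparse_reward(p, goal)) for p in path]
print([t.reward for t in episode])                           # [-1.0, -1.0, -1.0]
print([t.reward for t in relabel_with_final_goal(episode)])  # [-1.0, -1.0, 0.0]
```

The original episode contains no reward at all to learn from, while the relabeled copy contains one success, which is the "learning signal" OpenAI's researchers describe.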
Think back to when you learned how to ride a bike. On your first few tries, you probably failed to keep your balance. Even so, those attempts taught you what not to do when balancing on a bike. Every failure brought you closer to your goal, because that’s how human beings learn.
With HER, OpenAI wants its AI agents to learn the same way. The method also offers an alternative to the reward schemes usually involved in reinforcement learning. To learn on its own, an AI has to work with a reward system. In one model, known as a sparse reward, the AI either reaches its goal and gets an algorithmic “cookie” or it doesn’t. Another model, known as a shaped reward, hands out cookies depending on how close the AI is to achieving its goal.
Neither method is perfect. The first can stall learning: because the AI either succeeds or it doesn’t, an agent that rarely succeeds by chance almost never receives a reward to learn from. The second, on the other hand, can be quite tricky to implement, according to IEEE Spectrum. By treating the outcome of every attempt as a goal in hindsight, HER gives an AI agent a reward even when it failed to accomplish the specified task. This helps the AI learn faster and more effectively.
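The two reward schemes, and the hindsight trick, can be contrasted in a few lines of Python. This is an illustrative sketch rather than OpenAI's code: `sparse_reward` and `shaped_reward` are hypothetical helpers, and the 0.05 success tolerance and the use of negative distance as the shaped reward are assumptions chosen for simplicity.

```python
import math

Point = tuple[float, float]

def sparse_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """All-or-nothing 'cookie': 0.0 if the goal is reached, -1.0 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def shaped_reward(achieved: Point, goal: Point) -> float:
    """Graded 'cookie': the closer to the goal, the higher (less negative) the reward."""
    return -math.dist(achieved, goal)

goal = (1.0, 0.0)
attempts = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0)]  # three failed attempts, each closer

print([sparse_reward(a, goal) for a in attempts])  # [-1.0, -1.0, -1.0]: no signal
print([shaped_reward(a, goal) for a in attempts])  # roughly [-1.0, -0.5, -0.1]: progress visible

# HER keeps the simple sparse reward but swaps the goal: judged against
# what each attempt actually achieved, every attempt counts as a success.
print([sparse_reward(a, a) for a in attempts])     # [0.0, 0.0, 0.0]
```

The shaped version only works here because straight-line distance happens to be a good measure of progress; HER sidesteps the need to design such a measure by reusing the sparse reward with different goals.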
“By doing this substitution, the reinforcement learning algorithm can obtain a learning signal since it has achieved some goal; even if it wasn’t the one that you meant to achieve originally. If you repeat this process, you will eventually learn how to achieve arbitrary goals, including the goals that you really want to achieve,” according to OpenAI’s blog.
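Repeating the substitution across many steps and episodes is typically done by storing several relabeled copies of each step alongside the original. The sketch below assumes the "future" strategy from the HER paper, which draws virtual goals from outcomes achieved at the same step or later in the same episode; `her_transitions`, the value `k=4`, and the `(step, goal, reward)` tuple layout are hypothetical choices for illustration, not OpenAI's implementation.

```python
import math
import random

Point = tuple[float, float]

def sparse_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """0.0 if the achieved outcome is within tol of the goal, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def her_transitions(achieved: list[Point], goal: Point, k: int = 4,
                    seed: int = 0) -> list[tuple[int, Point, float]]:
    """Return (step, goal, reward) tuples for one episode.

    Each step is stored once with the original goal and k more times with
    hindsight goals chosen by the 'future' strategy: an outcome achieved at
    this step or a later one in the same episode.
    """
    rng = random.Random(seed)
    out: list[tuple[int, Point, float]] = []
    for t, ach in enumerate(achieved):
        # Original goal: usually a failure (-1.0) early in training.
        out.append((t, goal, sparse_reward(ach, goal)))
        for _ in range(k):
            # Virtual goal drawn from steps t..T-1 of the same episode.
            virtual = achieved[rng.randrange(t, len(achieved))]
            out.append((t, virtual, sparse_reward(ach, virtual)))
    return out

path = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]  # object positions after each step
batch = her_transitions(path, goal=(1.0, 1.0))
print(len(batch))                              # 15: 3 steps x (1 original + 4 virtual)
print(sum(r == 0.0 for _, _, r in batch) > 0)  # True: some relabeled goals succeed
```

In a full agent these tuples would be sampled by an off-policy learner; OpenAI's HER experiments paired the technique with DDPG.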
OpenAI has demonstrated HER in its Fetch simulation, in which a simulated robot arm learns to reach, push, slide, and pick up objects.
This doesn’t mean HER makes learning specific tasks easy for AI agents. “Learning with HER on real robots is still hard since it still requires a significant amount of samples,” OpenAI’s Matthias Plappert told IEEE Spectrum.
In any case, as OpenAI’s simulations demonstrated, HER can be quite helpful at “encouraging” AI agents to learn even from their mistakes, pretty much as we all do. The major difference is that AIs don’t get frustrated like the rest of us feeble folks.