Artificial Intelligence (AI) has the potential to advance humanity and civilization more than any technology that came before it. However, AI carries risks and heavy responsibilities with it. DeepMind, owned by Alphabet (Google’s parent company), and OpenAI, a non-profit AI research company, are working to alleviate some of these concerns. They are collaborating with people (who don’t necessarily have any special technical skills themselves) to use human feedback to teach AI, not only because this feedback helps AI learn more effectively, but also because the method provides improved technical safety and control.
Among the collaboration’s first conclusions: AI can learn by trial and error, and doesn’t need humans to give it an end goal. This is good, because we already know that setting a goal that’s even a little off can have disastrous results. In practice, the system used human feedback to learn how to make a simulated robot do backflips.
The system is unusual because it learns by training a “reward predictor,” a model built from a neural network, instead of collecting rewards directly as it explores an environment. A reinforcement learning agent still explores the environment, but clips of its behavior are periodically sent to a human. That human then chooses the better behavior based on whatever the ultimate goal is. Those human selections train the reward predictor, which in turn trains the learning agent. Over time, the learning agent improves its behavior enough to maximize its rewards, which it can only do by pleasing the human.
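The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers’ implementation: the reward predictor here is a simple linear model rather than a neural network, each clip is a list of made-up feature vectors, the human is replaced by a hypothetical `simulated_human` function, and choices are modeled with a logistic (Bradley-Terry style) preference rule that the article itself does not specify.

```python
import math
import random

def predicted_reward(weights, clip):
    """Total predicted reward of a clip (a list of per-step feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in clip)

def preference_probability(weights, clip_a, clip_b):
    """Modeled probability that the human picks clip_a over clip_b (assumed logistic rule)."""
    diff = predicted_reward(weights, clip_a) - predicted_reward(weights, clip_b)
    diff = max(-30.0, min(30.0, diff))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward_predictor(comparisons, n_features, lr=0.1, epochs=200):
    """Fit the reward predictor to human choices by stochastic gradient ascent.

    comparisons: list of (clip_a, clip_b, human_prefers_a) tuples.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for clip_a, clip_b, prefers_a in comparisons:
            p = preference_probability(weights, clip_a, clip_b)
            error = (1.0 if prefers_a else 0.0) - p
            for i in range(n_features):
                feat_diff = sum(s[i] for s in clip_a) - sum(s[i] for s in clip_b)
                weights[i] += lr * error * feat_diff
    return weights

def simulated_human(clip_a, clip_b):
    """Hypothetical stand-in for a person: prefers the clip with more of feature 0."""
    return sum(s[0] for s in clip_a) > sum(s[0] for s in clip_b)

def random_clip(rng, steps=5, n_features=2):
    """Made-up clip data: a few steps of random feature values."""
    return [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(steps)]

rng = random.Random(0)
comparisons = []
for _ in range(50):
    a, b = random_clip(rng), random_clip(rng)
    comparisons.append((a, b, simulated_human(a, b)))

weights = train_reward_predictor(comparisons, n_features=2)
print("learned reward weights:", weights)
```

In a full system, the learning agent would then be trained to maximize `predicted_reward` instead of a hand-written goal, and new clips of its behavior would be sent back for comparison.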
This approach allows humans to detect and correct any undesirable behaviors, which ensures safety without being too burdensome for human stewards. That’s a good thing, because they only need to review about 0.1% of the agent’s behavior to teach it. That may not seem like much at first, but it could well mean thousands of clips to review, a burden the researchers are working to reduce.
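To see how a small fraction can still add up to thousands of clips, here is a rough back-of-the-envelope calculation. Only the 0.1% figure comes from the article; the total clip count and the seconds spent per clip are hypothetical numbers chosen purely for illustration.

```python
def clips_needing_review(total_clips, review_fraction=0.001):
    """Number of clips a human sees if only review_fraction of them are reviewed."""
    return round(total_clips * review_fraction)

def review_hours(clips, seconds_per_clip):
    """Human time needed to review the clips, in hours."""
    return clips * seconds_per_clip / 3600

# Hypothetical workload: 5 million clips of agent behavior, 3 seconds per judgment.
clips = clips_needing_review(5_000_000)
hours = review_hours(clips, seconds_per_clip=3)
print(clips, "clips to review, about", round(hours, 1), "hours of human time")
```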
Human feedback can also help AI achieve superhuman results, at least in some video games. Researchers are now working out why the human feedback system achieves wildly successful results with some tasks and average or even ineffective results with others. For example, no amount of human feedback could help the system master Breakout or Qbert. They are also working to fix the problem of reward hacking, in which early discontinuation of human feedback causes the system to game its reward function, with bad results.
Understanding these problems is essential to building AI systems that behave as we intend them to: safely and effectively. Other future goals may include reducing the amount of human feedback required or changing the way it’s provided, perhaps eventually facilitating “face to face” exchanges that offer the AI more opportunities to learn from actual human behavior.
Editor’s Note: This article has been updated to note the contributions made by OpenAI.