To teach an AI agent a new task, such as how to open a kitchen cabinet, researchers often use reinforcement learning — a trial-and-error process in which the agent is rewarded for taking actions that bring it closer to the goal. In many cases, a human expert must carefully design a reward function, an incentive mechanism that motivates the agent to explore, and then iteratively update that function as the agent tries different actions. This can be time-consuming, inefficient, and difficult to scale up,…
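To make the idea concrete, here is a minimal sketch in Python of what such a hand-designed reward function might look like for the cabinet example. The state keys (`gripper_pos`, `handle_pos`, `door_angle`), the weights, and the thresholds are all illustrative assumptions, not code from any particular system; in practice, these are exactly the kinds of terms a human expert would have to choose and then re-tune as the agent's behavior evolves.

```python
import numpy as np

def cabinet_reward(state: dict) -> float:
    """Hypothetical hand-designed reward for a cabinet-opening task.

    `state` is assumed to hold the gripper position, the handle
    position, and the cabinet door's opening angle in radians.
    """
    gripper = np.asarray(state["gripper_pos"])
    handle = np.asarray(state["handle_pos"])
    door_angle = state["door_angle"]

    # Shaping term 1: encourage the gripper to move toward the handle.
    reach_bonus = -np.linalg.norm(gripper - handle)

    # Shaping term 2: reward progress toward an open door.
    open_bonus = 10.0 * door_angle

    # Sparse bonus once the door swings past an (assumed) success angle.
    success_bonus = 100.0 if door_angle > 1.2 else 0.0

    return reach_bonus + open_bonus + success_bonus

# Example: score one observed state during a trial-and-error rollout.
example_state = {
    "gripper_pos": [0.4, 0.1, 0.9],
    "handle_pos": [0.5, 0.0, 1.0],
    "door_angle": 0.3,
}
print(cabinet_reward(example_state))
```

Every coefficient and cutoff above is a design decision; if any term is mis-weighted, the agent can learn to exploit the shaping bonuses instead of actually opening the cabinet, which is why the expert must keep revising the function as training proceeds.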