Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not. Yet a machine that needs to interact with objects in the world — like a robot designed to harvest crops or assist with surgery — must be able to infer properties about a 3D scene from observations of the 2D images it’s trained on. While scientists have had success using neural networks to infer representations of 3D scenes from images, these machine learning methods…