Modern machine-learning models, such as neural networks, are often referred to as “black boxes” because they are so complex that even the researchers who design them can’t fully understand how they make predictions. To provide some insight, researchers use explanation methods that seek to describe individual model decisions. For example, a method may highlight the words in a movie review that influenced the model’s decision to classify the review as positive. But these explanation methods don’t do any good if humans can’t easily understand them, or if they misunderstand them. So, MIT researchers created…
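To make the movie-review example concrete, here is a minimal sketch of one simple kind of word-level explanation. It is not the researchers’ method; it just trains a bag-of-words logistic regression (using scikit-learn) on a tiny invented dataset and scores each word in a review by its learned coefficient, so the highest-scoring words are the ones that pushed the prediction toward “positive.”

```python
# A minimal sketch of a word-level "explanation" for a sentiment model.
# The dataset and model choice are illustrative assumptions only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "a wonderful, heartfelt film",         # positive
    "brilliant acting and a great plot",   # positive
    "a dull, boring mess",                 # negative
    "terrible pacing and a weak script",   # negative
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = LogisticRegression().fit(X, labels)

def explain(review):
    """Return each known word's contribution to the 'positive' score."""
    words = vectorizer.build_analyzer()(review)
    vocab = vectorizer.vocabulary_
    coefs = model.coef_[0]
    return {w: coefs[vocab[w]] for w in words if w in vocab}

# Words with the largest positive contributions are the ones such an
# explanation method would highlight for a "positive" prediction.
print(sorted(explain("a wonderful film with brilliant acting").items(),
             key=lambda kv: -kv[1]))
```

Real explanation methods for neural networks (such as saliency or attribution techniques) are far more involved, but the output has the same flavor: a per-word score that a human then has to interpret correctly.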