The researchers found that models which aced science quizzes struggled in real laboratory situations that required quick adaptation.