FORTUNE

AI agents are getting more capable, but reliability is lagging—and that’s a problem

Most AI vendors don't benchmark for reliability. A new benchmark from Princeton researchers does.