Researchers unveil ‘Humanity’s Last Exam,’ a test so difficult that today’s AI systems consistently fail it

Humanity’s Last Exam (HLE) is a groundbreaking 2,500-question assessment created to reveal the limits of advanced AI systems. Developed by almost 1,000 researchers worldwide, including Texas A&M professor Dr. Tung Nguyen, it covers mathematics, natural sciences, humanities, ancient languages, and specialized subfields. Unlike traditional benchmarks, HLE demands deep human expertise and context, and even leading AI models score below 50% accuracy on it.