LLM Benchmarks - Search News

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A team of Abacus.AI, New York University, ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

IEEE Spectrum on MSN

Why are large language models so terrible at video games?

AI models code simple games, but struggle to play them ...

Geeky Gadgets

AI Benchmarks Are Broken : The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

Forbes

IBM’s New Granite 3.0 AI Models Show Strong Performance On Benchmarks

Forbes contributors publish independent expert analyses and insights. Paul-Smith Goodson is an analyst covering quantum computing and AI. IBM just announced a new collection of AI models, its third ...

Forbes

Beyond The Llama Drama: 4 New Benchmarks For Large Language Models

Forbes contributors publish independent expert analyses and insights. AI researcher working with the UN and others to drive social change. Apr 13, 2025, 07:56pm EDT The April 2025 drama around Llama's ...

MacStories

The AI App Experience Matters More Than Benchmarks Now

I was catching up on different articles after the release of Claude Opus 4.5 earlier this week, and this part from Simon Willison’s blog post about it stood out to me: I’m not saying the new model isn ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results