Claude tops AI chatbot rankings - GPT-4 finally trails in second place

Anthropic's latest artificial intelligence model, Claude 3 Opus, has taken the top spot on the Chatbot Arena leaderboard, pushing OpenAI's GPT-4 into second place for the first time since the start of last year. Unlike other AI model benchmarks, the LMSYS Chatbot Arena relies on human votes.

OpenAI's various GPT-4 versions have held the top spot for so long that models approaching their benchmark scores are commonly described as GPT-4-class models. Future rankings may need a new "Claude 3 class" label instead.

It is worth noting that the Claude 3 Opus and GPT-4 scores are very close, and a "significantly different" GPT-5 is expected at some point this year, a year after GPT-4 was introduced.

The Chatbot Arena is run by LMSYS, the Large Model Systems Organization, and pits a wide variety of large language models against each other in anonymous, randomized battles.

It first launched last May, and since then models from Anthropic, OpenAI, and Google have dominated most of the top 10 while the arena has gathered over 400,000 user votes.

More recently, open-source models have become increasingly present, with entries from French AI startup Mistral and Chinese companies such as Alibaba also placing high on the list.

Rankings are calculated with the Elo rating system, widely used in games such as chess to estimate a player's relative skill. Here, however, the ratings apply to the chatbots themselves rather than to the humans using them.
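
As a rough illustration of how each human vote can move the rankings, the sketch below applies a textbook Elo update to a single head-to-head battle. The K-factor of 32, the starting ratings, and the example outcome are assumptions chosen for demonstration only, not LMSYS's actual parameters or computation.

```python
# Minimal sketch of an Elo-style update for one pairwise chatbot battle.
# K-factor, starting ratings, and the outcome below are illustrative assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: two hypothetical models start at 1000; the first wins a user vote.
a, b = update_elo(1000.0, 1000.0, score_a=1.0)
print(round(a), round(b))  # 1016 984
```

Over hundreds of thousands of such votes, stronger models accumulate higher ratings and climb the leaderboard.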

The arena has limitations: not all models or model versions are included, and some GPT-4 versions may not be available to test.

The arena also does not include well-known models such as Google's Gemini 1.5 Pro, with its huge context window, and Gemini Ultra.

More than 70,000 new votes went into the latest update. Claude 3 Opus tops the leaderboard, but even the smallest of the Claude 3 models fared well.

LMSYS explains: "The Claude-3 Haiku impressed all, reaching the GPT-4 level according to user preference! Its speed, capability, and length of context are currently unmatched in the market."

What makes this even more impressive is that Claude 3 Haiku is a "local-size" model comparable to Google's Gemini Nano. It achieves strong results without the trillion-plus parameter scale of Opus and other GPT-4-class models.

While not as intelligent as Opus or Sonnet, Anthropic's Haiku is considerably cheaper, much faster, and, as the arena results suggest, comparable to much larger models in blind tests.

All three Claude 3 models are in the top 10, with Opus in the top spot, Sonnet in joint 4th with Gemini Pro, and Haiku in joint 6th with an early version of GPT-4.

Of the top 20 large language models on the Arena leaderboard, all but three are proprietary, suggesting that open-source models still have some way to go to catch the big players.

Meta, which is focusing on open-source AI, will release Llama 3 in the coming months; it is expected to offer capabilities similar to Claude 3 and will likely make the top 10.

There have also been other developments in open-source and distributed AI, such as Emad Mostaque, founder of Stability AI, stepping down from his CEO position to focus on more decentralized and accessible artificial intelligence, stating that centralized AI cannot be beaten with more centralized AI.
