TechingToday

Anthropic Announces Claude 3 - New AI Model Leaves OpenAI's GPT-4 Behind

General

Claude 3 is the latest large-scale language model from AI lab Anthropic, and promises to give both OpenAI's GPT-4 and Google's Gemini a shot in the arm.

The latest generation of popular chatbot and AI models includes three versions of Opus, Sonnet, and Haiku, each serving a different market and purpose.

In benchmarking, Anthropic said the Claude 3 Opus, the largest of the models, beat the GPT-4 and Gemini 1.0 Ultra in undergraduate-level knowledge, graduate-level reasoning by a considerable margin, and elementary math.

Opus is a paid version of the Claude chatbot, Sonnet is a free version, and Haiku is an inexpensive model offered to third-party developers; Sonnet is also available on AWS Bedrock and can be used by companies and developers in their applications.

Claude was one of the early AI chatbots, launched shortly after OpenAI's ChatGPT, and offered a similar experience; while Anthropic's primary focus was on the enterprise market, Claude was always a useful alternative.

In addition to being a chatbot, Claude 3 is available as an API for third-party developers to create their own applications; Opus and Sonnet are available in 159 countries, and Haiku is coming soon.

I have been using Claude since the first version and find it more creative than ChatGPT or Gemini. There are some problems with hallucinations and rejection, but Anthropic says they are resolved in version 3.

This is the first time Claude has been able to see more than text and code. It has become a visual model that allows one to see pictures, charts, graphics, and technical diagrams.

It outperformed Claude 3 in mathematics and reasoning when analyzing images. Interestingly, Sonnet's medium model outperformed Opus, GPT-4V, and Gemini Ultra in understanding scientific diagrams.

However, in almost all popular benchmarks, Claude 3 comfortably beat both GPT-4 and Gemini Ultra.

Opus, the most popular model, is "leading the frontiers of general intelligence by demonstrating human-like levels of comprehension and fluency in complex tasks," the company declared, implying that it is approaching artificial general intelligence.

It should be noted that this is in comparison to OpenAI's year-old model, which GPT-5 expects to drop at some point this year.

"We do not believe that model intelligence is approaching its limits, and we plan to release frequent updates to the Claude 3 model family in the coming months," Anthropic said in a statement.

While Google garnered a lot of attention with the potential 10 million token context window (memory of one chat) of Gemini Pro 1.5, Claude was the first to pass the 100,000 token barrier.

The new model will have a 200,000 token context window at launch, but Anthropic is testing up to 1 million tokens in-house and with testers.

However, having a large context only makes sense if it is accompanied by the ability to recall information from any point in that window. This is what Google worked on with Gemini Pro 1.5.

Anthropic, for its 'Needle In A Haystack' (NIAH) evaluation, which Google also used, stated that Claude 3 Opus achieved near perfect recall with 99% accuracy and identified limitations in the evolutionary process by finding the "needle" sentence itself He states.

Since its launch, Anthropic has paid close attention to the safety of its AI. This includes tracking the safety level of models, highlighting potential risks, and creating a novel "constitution" to keep AI in line with human values.

"As we push the limits of AI capabilities, we are equally committed to ensuring that our safety guardrails keep pace with these leaps in performance," said an Anthropic spokesperson.

"Our hypothesis is that being at the forefront of AI development is the most effective way to steer a trajectory that has positive social consequences."