Anthropic Named ‘Best Performing’ LLM as AI Arms Race Intensifies

Generative artificial intelligence (AI) firm Galileo has released a new ranking of top large language models (LLMs).

The company on Monday (July 29) announced its latest “Hallucination Index,” which ranks the performance of AI LLMs from the likes of OpenAI, Anthropic, Google and Meta.

“This year’s Index added 11 models to the framework, representing the rapid growth in both open- and closed-source LLMs in just the past 8 months,” the company said in a news release. “As brands race to create bigger, faster and more accurate models, hallucinations remain the main hurdle to deploying production-ready Gen AI products.”

The best overall performing model, the index found, was Anthropic’s Claude 3.5 Sonnet, which Galileo said surpassed its rivals in short, medium, and long context scenarios, beating out last year’s winners, OpenAI’s GPT-4o and GPT-3.5.

The ranking chose Google’s Gemini 1.5 Flash as the best performing on cost, and Alibaba’s Qwen2-72B-Instruct as the best performing open-source model.

“In today’s rapidly evolving AI landscape, developers and enterprises face a critical challenge: how to harness the power of generative AI while balancing cost, accuracy and reliability. Current benchmarks are often based on academic use-cases, rather than real-world applications,” said Vikram Chatterji, CEO and co-founder of Galileo.

“Our new Index seeks to address this by testing models in real-world use cases that require the LLMs to retrieve data, a common practice in enterprise AI implementations,” he added. “As hallucinations continue to be a major hurdle, our goal wasn’t to just rank models, but rather give AI teams and leaders the real-world data they need to adopt the right model, for the right task, at the right price.”

In other news from the AI arms race, PYMNTS wrote Monday about the recent launch of OpenAI’s new “SearchGPT” tool, which aims to disrupt the eCommerce industry with AI-powered product discovery and comparison.

The prototype claims to offer more concise and relevant results than conventional search engines and, unlike Google’s pages of links, offers summarized answers with source links, which could transform how consumers discover and purchase products online.

“While the fundamentals of the first search are not hugely different, it is the ability to follow up on your first query that really extends the capabilities of SearchGPT,” Alex Moran, SEO lead at the marketing agency Space & Time, told PYMNTS.

“This is giving OpenAI’s new product a much more direct approach towards what has been a big part of Google’s aim for years: the ability to complete the whole user journey directly on their platform. Google has been subtly pushing this for years, including featured snippets and the ability to book or reserve hotels or restaurants directly on their platform,” Moran added.