Apple’s upgraded AI models underwhelm on performance

🗓️ 2025-06-10 17:52


Apple has revealed updates to the AI models that power its suite of Apple Intelligence features across iOS, macOS, and more. But according to the company’s own benchmarks, the models underperform older models from rival tech firms, including OpenAI.

Apple said in a blog post Monday that human testers rated the quality of text generated by its newest “Apple On-Device” model — which runs offline on devices such as the iPhone — as “comparable” to, but not better than, text from similarly sized Google and Alibaba models. Meanwhile, those same testers rated Apple’s more capable new model, called “Apple Server” and designed to run in the company’s data centers, behind OpenAI’s year-old GPT-4o.

In a separate test evaluating the ability of Apple’s models to analyze images, human raters preferred Meta’s Llama 4 Scout model over Apple Server, according to Apple. That result is somewhat surprising, given that Llama 4 Scout performs worse than leading models from AI labs like Google, Anthropic, and OpenAI on a number of benchmarks.

The benchmark results lend credence to reports suggesting Apple’s AI research division has struggled to catch up to competitors in the cutthroat AI race. Apple’s AI capabilities in recent years have underwhelmed, and a promised Siri upgrade has been delayed indefinitely. Some customers have sued Apple, accusing the company of marketing AI features for its products that it hasn’t yet delivered.

In addition to generating text, Apple On-Device, which is roughly 3 billion parameters in size, drives features like summarization and text analysis. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.) As of Monday, third-party developers can tap into it via Apple’s Foundation Models framework.

Apple says both Apple On-Device and Apple Server boast improved tool use and efficiency compared to their predecessors, and can understand around 15 languages. That’s thanks in part to an expanded training dataset that includes image data, PDFs, documents, manuscripts, infographics, tables, and charts.


© 2025 TechCrunch Media LLC.
