Apple has revealed updates to the AI models that power its suite of Apple Intelligence features across iOS, macOS, and more. But according to the company’s own benchmarks, the models underperform older models from rival tech firms, including OpenAI.
Apple said in a blog post Monday that human testers rated text generated by its newest “Apple On-Device” model — which runs entirely offline on devices such as the iPhone — “comparably” to, but not better than, text from similarly sized Google and Alibaba models. Meanwhile, those same testers rated Apple’s more capable new model, called “Apple Server” and designed to run in the company’s data centers, behind OpenAI’s year-old GPT-4o.
In a separate test evaluating the ability of Apple’s models to analyze images, human raters preferred Meta’s Llama 4 Scout model over Apple Server, according to Apple. That’s a bit surprising, given that Llama 4 Scout performs worse on a number of benchmarks than leading models from AI labs like Google, Anthropic, and OpenAI.
The benchmark results lend credence to reports suggesting Apple’s AI research division has struggled to catch up to competitors in the cutthroat AI race. Apple’s AI capabilities in recent years have underwhelmed, and a promised Siri upgrade has been delayed indefinitely. Some customers have sued Apple, accusing the firm of marketing AI features for its products that it hasn’t yet delivered.
In addition to generating text, Apple On-Device, which is roughly 3 billion parameters in size, drives features like summarization and text analysis. (A model’s parameter count roughly tracks its problem-solving ability, and models with more parameters generally outperform those with fewer.) As of Monday, third-party developers can tap into it via Apple’s Foundation Models framework.
Apple says both Apple On-Device and Apple Server boast improved tool use and efficiency compared to their predecessors, and can understand around 15 languages. That’s thanks in part to an expanded training dataset that includes image data, PDFs, documents, manuscripts, infographics, tables, and charts.
© 2025 TechCrunch Media LLC.