The illusion of thinking: Apple’s new AI reasoning study
Apple’s latest research, “The Illusion of Thinking”, challenges the idea that AI models like GPT, Claude, and Gemini can actually “reason.” According to Apple researchers, instead of true thinking, these models are simply very good at pattern matching.
The experiment:
- Apple’s experiment compared a standard language model (DeepSeek-V3) to a thinking model (DeepSeek-R1) on puzzles like Tower of Hanoi.
- Both models handle easy tasks well, and the reasoning model does better on medium-difficulty problems. But once the difficulty spikes, both fail completely, even when the reasoning model is allowed to “think” longer.
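Tower of Hanoi makes a good benchmark precisely because the algorithm is trivial to state while the optimal solution length grows exponentially with the number of disks. As a minimal sketch (the classic recursive solution, not Apple’s actual evaluation harness):

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then re-stack.
    return (hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi(n - 1, aux, src, dst))

# The optimal solution length is 2**n - 1, so each extra disk
# roughly doubles the number of moves a model must produce correctly.
for n in (3, 7, 10):
    print(n, len(hanoi(n)))
```

This exponential blow-up is what lets researchers dial up complexity smoothly: a model that genuinely executes the algorithm should degrade gracefully, whereas the study reports an abrupt collapse past a threshold.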
Key findings:
- AI struggles with hard problems: The models perform well on easier puzzles but collapse entirely once complexity passes a threshold, suggesting they lack general problem-solving skills.
- They think less when things get harder: As problems approach the failure point, the models actually spend less reasoning effort, effectively giving up early, which points to a fundamental limit in scaling reasoning.
- They struggle to follow explicit instructions: Even when handed the exact steps to solve a problem, the models hit the same complexity wall, suggesting they don’t truly execute algorithms.
- Their reasoning is inconsistent: The models perform erratically, solving a long, difficult puzzle correctly yet failing a much shorter one, which suggests they rely on memorized patterns rather than consistent logic.
- They’re just mimicking patterns, not genuinely reasoning: The models may appear to reason, but they’re pattern-matching against past data. When faced with unfamiliar problems, their so-called reasoning falls apart.
Conclusion
Apple is pointing out the limits of today’s AI, especially when it comes to tackling complex tasks, and is warning against buying into the hype around their reasoning abilities. Basically, they’re saying we need to reset our expectations and realise that true AI reasoning, or even AGI, will need much more than just bigger models.