We Don’t Know What Intelligence Is — So What Exactly Are We Scaling?
I spent nearly two years in medical school before I ended up in IT. That detour taught me one thing above everything else: the human body is extraordinarily complex and we understand far less of it than medicine’s confidence suggests. Anesthesiologists have been reliably putting people to sleep for over a century and we still do not have a complete mechanistic explanation for how general anesthesia works. We know it does. We largely know how to use it safely. The exact why remains genuinely unsettled.
I think about that a lot when I read AI coverage.
The dominant narrative right now is that artificial intelligence is advancing rapidly toward something like humanlike cognition, and that the main variable separating us from that outcome is scale (more parameters, more compute, more data). The implication is that intelligence is a threshold we can engineer our way across if we just build a bigger machine.
But here is the foundational problem with that argument: we have never produced a working definition of intelligence that holds up under serious scientific scrutiny. We have proxies. IQ tests measure certain cognitive tasks. Turing tests measure conversational plausibility. Benchmarks measure benchmark performance. None of these are intelligence. They are observable correlates of whatever intelligence actually is, and in the case of large language models (LLMs), the systems are increasingly optimized to score well on those correlates precisely because they are what we measure.
Neuroscience, after decades of increasingly sophisticated brain imaging, still cannot fully explain how memory consolidation works during sleep. We do not have a rigorous scientific account of how a three-year-old acquires the concept of “fairness” from watching other children play. Especially when my kids were young, LOL. We debate whether consciousness is a product of computation, of specific biological architecture, or of something else entirely. These are not fringe questions being asked by skeptics with an agenda. They are open problems in mainstream cognitive science and philosophy of mind.
So when a company announces that its newest model has achieved human-level performance on a reasoning benchmark, what exactly has been demonstrated? That the model is good at the benchmark. Which is useful. Benchmarks are useful. But “good at the benchmark” and “intelligent” are not the same claim, and the financial incentives in the industry run very strongly toward conflating them.
None of this means AI is not impressive or genuinely useful. It is both. My work with clients regularly involves AI tools, and I have seen them deliver real value, not because they are intelligent, but because they are exceptionally good at pattern recognition and text generation at a speed and scale that humans cannot match. That is capability. It deserves honest framing, not mythology.
The mythology is where I get off the train. When the pitch shifts from “this tool will make your team more efficient” to “this system understands your business,” we have left the world of engineering claims and entered marketing. And the distinction matters, because companies making decisions based on the mythology rather than the capability tend to make expensive mistakes.
The next time someone tells you a model is approaching general intelligence, ask them to define intelligence first. Not as a gotcha. As a genuine prerequisite for the conversation. If we cannot agree on what the destination is, we should probably be careful about declaring we are almost there.
We have not solved intelligence. We have built very impressive calculators that speak fluent English. Both things can be true simultaneously, and only one of them requires intellectual humility to say out loud.