In preparation for teaching, I am reading a famous essay in AI research, “The Bitter Lesson,” written by Richard Sutton in 2019. I wondered what would prove prescient and where Sutton might have gone wrong. Finally, I want to discuss the economic impact.
Citing decades of AI history, Sutton argues that researchers have had to learn some “bitter” truths. They have repeatedly assumed that building human expertise into computers would be the next advance in intelligence, yet recent history shows that methods that scale with computation win out over methods that rely on human expertise. In computer chess, for example, brute-force search on specialized hardware beat knowledge-based approaches. Sutton warns that researchers are reluctant to learn this lesson because incorporating knowledge is satisfying, but that real breakthroughs come from the relentless scaling of computation. Scaling in AI means making the model bigger and using more compute[1] to train it on more data.
The “bitter lesson” is not about a single algorithm but about intellectual humility. Advances in AI come from accepting that general-purpose learning methods, scaled relentlessly, outperform our best attempts to hard-code intelligence. Whether Sutton’s assertion is right or wrong matters, because we are not at the end of the AI boom, or what Dwarkesh Patel has dubbed the “age of scaling.”
EconTalk’s guests have speculated about whether AI will save the world or kill us all.
Such extreme predictions assume that AI capabilities will keep advancing. AI has advanced rapidly since Sutton wrote in 2019, but there is no law of nature (as far as we know) that says AI must continue to improve. Some argue that AI capabilities have already plateaued, or point out that even the most advanced models continue to hallucinate.
If scaling is indeed the path to greater intelligence, then we can expect AI to keep exceeding expectations as we add more hardware to our systems. That hypothesis is now being put to the test at enormous expense: U.S. private AI investment could exceed $100 billion annually, making it one of the largest technological bets in history. Let’s consider Sutton’s essay in light of recent work.
I can point to three pieces of evidence that Sutton was right about scaling. First, game-playing AI provides a clean natural experiment. AlphaZero learned chess and Go through self-play, without human game data or hand-crafted strategies, and it outperformed earlier systems built on domain expertise. As Sutton predicted, its success was driven by search and computation at scale.
Second, natural language processing (NLP), the field of AI focused on enabling computers to understand and generate human language, exhibits the same pattern. Earlier NLP systems emphasized linguistic rules and symbolic structures. OpenAI’s GPT-3 and its successors instead rely on a general-purpose architecture trained on vast amounts of data with massive compute. Performance gains from scale have proved more reliable than gains from architectural cleverness.
A third example is computer vision. Once convolutional neural networks (a type of AI architecture loosely inspired by the visual cortex and designed to learn visual patterns automatically from data) could be trained at scale, they replaced hand-designed feature pipelines (techniques in which programmers manually wrote algorithms to detect edges and shapes). Accuracy improved as datasets and compute grew.
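To make the contrast concrete, here is a minimal sketch, my own illustration rather than anything drawn from Sutton or a specific vision system. It places a hand-designed Sobel edge detector next to a convolution whose kernel would be learned from data; the mechanics are identical, and what changed historically is where the kernel values come from.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((28, 28))                    # stand-in for a real image

# Hand-designed feature: the classic Sobel kernel, written by a programmer
# to respond to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
edge_map = convolve2d(image, sobel_x, mode="valid")

# A convolutional layer applies the same operation, but its kernel starts
# random and is adjusted by gradient descent on labeled data: the part that
# scales with data and compute.
learned_kernel = rng.normal(size=(3, 3))
learned_map = convolve2d(image, learned_kernel, mode="valid")

print(edge_map.shape, learned_map.shape)        # same mechanics, different origin of the kernel
```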
Sutton’s argument is about the scalability of methods, but in practice that scalability becomes apparent only when computational constraints are relaxed through capital investment.
The speed of advancement in AI reflects not only its technological potential but also an unprecedented mobilization of capital. The average person who uses ChatGPT to draft a grocery list may not know what “scaling” means. People may underestimate the rate of progress not only because they misunderstand the technology, but also because they misjudge how much money is being spent on it.
Let’s compare this to the Manhattan Project. People were skeptical of the Manhattan Project not because it violated physics but because it seemed prohibitively expensive. Niels Bohr was quoted as saying that it would require “turning the whole country into a factory.” But it was done, and it is being done again: we are turning the country into an AI factory. Without that investment, progress would be far slower.
But if we are nearing the limits either of what scaling can deliver or of our physical capacity to keep scaling, then neither the catastrophic nor the utopian scenario is likely to be right. Can the bitter lesson guide us as we look beyond 2026? The question matters as much for today’s unemployment worries as for tomorrow’s existential threats.
Recent economic research offers a mixed picture. Economist Joshua Gans developed a model of “artificial jagged intelligence” in a January 2026 paper. Gans observes that generative AI systems perform unevenly across tasks that appear “nearby”: a slight change in wording or context can make a model brilliant on one prompt and confidently wrong on another. Anyone who has used ChatGPT to help with work and watched it hallucinate plausible falsehoods has experienced this jaggedness firsthand.
What makes Gans’s analysis economically interesting is its treatment of scaling laws. In his model, as scale (represented by the density of known points in the knowledge landscape) increases, the average gap shrinks and average quality rises approximately linearly. This is good news for Sutton’s thesis: as compute increases, average performance increases. But the jagged edges remain, and the errors remain. Scaling improves average performance without eliminating surprising, long-tail failures.
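A toy simulation conveys the intuition. This is my own sketch, not Gans’s actual model: “known points” stand in for capabilities a system has mastered at a given scale, a task’s quality decays with its distance to the nearest known point, and the decay rate is an arbitrary choice made purely for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def simulate(n_known, n_tasks=20_000):
    # Known points = capabilities mastered at this scale; tasks = what users
    # actually bring. Quality falls off with the gap to the nearest known point.
    known = rng.random((n_known, 2))
    tasks = rng.random((n_tasks, 2))
    gaps, _ = cKDTree(known).query(tasks)        # distance to nearest known point
    quality = np.exp(-20 * gaps)                 # ad hoc decay, purely illustrative
    return quality.mean(), np.quantile(quality, 0.01)

for scale in (10, 100, 1_000, 10_000):
    avg, tail = simulate(scale)
    print(f"known points: {scale:>6}   average quality: {avg:.3f}   worst 1%: {tail:.3f}")
```

Average quality climbs steadily as the density of known points grows, but the bottom tail lags behind the mean at every scale, which is the flavor of result Gans formalizes: scaling raises the average without smoothing out the jagged edges.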
Gans frames AI adoption as an information problem. Users care about local reliability (whether the AI can help with their tasks), but they typically observe only coarse global quality signals (benchmark scores). This mismatch creates real economic friction. A legal assistant might trust an AI that performs admirably on 95% of contract reviews, only to be blindsided when it confidently gives wrong answers on a seemingly routine clause. Gans shows that the errors users experience are amplified by what statisticians call the “testing paradox”: users encounter errors precisely where they need help the most.
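The selection effect is easy to see in a small simulation. This is a hypothetical illustration rather than Gans’s setup: I assume usage is concentrated on the tasks users find hardest, and that those are also the tasks on which the model errs most often.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each task has some error probability; a uniform benchmark averages over all
# of them, but users disproportionately bring the hard ones.
error_prob = rng.beta(1, 9, size=100_000)        # mostly easy tasks, a long hard tail
benchmark_error = error_prob.mean()              # what the benchmark score reports

usage_weight = error_prob ** 2                   # assumption: help is sought where it is needed most
usage_weight /= usage_weight.sum()
experienced_error = np.average(error_prob, weights=usage_weight)

print(f"benchmark error rate:   {benchmark_error:.1%}")    # about 10%
print(f"experienced error rate: {experienced_error:.1%}")  # about 25%
```

The gap between the two numbers is the friction Gans is pointing to: the benchmark is not wrong, but it is not the number adopters actually experience.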
Although Gans’s 2026 paper does not directly cite or refute Sutton, it can be read as an exploration of the structural limits that persist even when we follow the path the bitter lesson prescribes. Scaling works, but its economic benefits can be partially offset by persistent unpredictability that scaling does not solve.
This limitation has practical implications for how companies deploy AI. Firms cannot simply trust benchmark performance; they need to invest in human monitoring and domain-specific testing. It also means that AI is unlikely to eliminate human jobs wholesale.
Sutton was on the right track, but his insight should not be taken out of context. Scaling alone is not enough, and simply adding more of it is unlikely to yield superintelligence. Models still require human insight and structure to be most useful to businesses. RLHF (reinforcement learning from human feedback) is a training technique in which human evaluators rate the model’s outputs so that it learns which responses are helpful and safe; it is one channel through which human values are infused into the model. Earlier architectures did not become GPT-4 simply by adding data.
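To show what “human evaluators rate the model’s outputs” amounts to mechanically, here is a stripped-down sketch of the reward-modeling step at the heart of RLHF. It is my own toy setup, not any lab’s implementation: a linear “reward model” is fit to simulated pairwise human preferences with the standard logistic (Bradley-Terry) preference loss.

```python
import numpy as np

rng = np.random.default_rng(2)

dim = 8
w = np.zeros(dim)                     # toy linear reward model
rater_w = rng.normal(size=dim)        # stands in for what human raters prefer

def reward(features, weights):
    return features @ weights

lr = 0.1
for step in range(500):
    a, b = rng.normal(size=(2, dim))  # features of two candidate responses
    # Simulated human label: the rater prefers the response they score higher.
    preferred, rejected = (a, b) if reward(a, rater_w) > reward(b, rater_w) else (b, a)
    # Logistic preference loss: -log sigmoid(r_preferred - r_rejected)
    margin = reward(preferred, w) - reward(rejected, w)
    grad = -(1.0 - 1.0 / (1.0 + np.exp(-margin))) * (preferred - rejected)
    w -= lr * grad

cosine = np.dot(w, rater_w) / (np.linalg.norm(w) * np.linalg.norm(rater_w))
print(f"alignment with rater preferences: {cosine:.2f}")  # near 1.0 means the model internalized the raters' preferences
```

In production systems the reward model is a neural network and the preferences come from real human labelers, but the economic point is the same: the scaled-up learner still needs structured human judgment as an input.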
Nor can we “scale up” forever. Energy costs and data limitations are real-world constraints, so making AI better will require efficiency and algorithmic ingenuity, not just brute force. Human insight has not become irrelevant; it has shifted from directly encoding intelligence to shaping, constraining, and steering scaled learning systems.
All in all, let’s give Sutton his due. Scaling works. But the efficiency of that scaling depends on human insight into how these systems are built and deployed. Economists will recognize a familiar pattern: even when capital is measured in GPUs and labor means designing a loss function, capital and labor are still complements.
Gans’s study adds an important economic footnote. Even if scaling improves AI’s average performance, the jagged, unpredictable character of that performance imposes real costs on adopters. Businesses and individuals will have to navigate a world in which AI’s capabilities keep rising while remaining unreliable in ways that are hard to predict. The economic return on AI investment will therefore depend not only on raw capability but also on complementary human expertise and on the institutions that develop to manage the jagged edges.
The bitter lesson may be that pure scaling is powerful, but the sweet corollary is that human ingenuity remains an important factor for future progress.
[1] Compute in AI research refers to the total amount of computational power, usually measured in floating-point operations (FLOPs), used to train or run a model.
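For a sense of the magnitudes, here is a back-of-the-envelope sketch using the common rule of thumb from the scaling-laws literature that training compute is roughly 6 × parameters × training tokens; the model size and token count below are made-up illustrative numbers, not any lab’s disclosed figures.

```python
# Rough training-compute estimate with the C ≈ 6 * N * D rule of thumb
# (N = parameter count, D = training tokens). Illustrative numbers only.
params = 70e9        # a hypothetical 70-billion-parameter model
tokens = 2e12        # trained on a hypothetical 2 trillion tokens
flops = 6 * params * tokens
print(f"training compute ≈ {flops:.1e} FLOPs")   # about 8.4e23 FLOPs
```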
