Superintelligence at Ten: What Bostrom Got Right, Wrong, and Why It Matters Now

I recently reread Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies, a book I first picked up years ago. I’m writing this in late 2025, heading into 2026, with ChatGPT everywhere and governments finally trying to turn “AI safety” from vibes into actual policy. Also, yes, the math is slightly awkward. The book is from 2014, so this is “ten-ish” in the sense of “a decade on,” not a neat anniversary candle count.

The verdict is mixed but interesting. Bostrom nailed the importance of the alignment problem while largely missing the specific technological path that would make his warnings feel urgent.

What the book actually argues

Published in 2014, Superintelligence makes a straightforward but unsettling case. Bostrom argues that AI systems smart enough to outperform humans in virtually all cognitive tasks pose existential risks. Not because they’d be malevolent, but because we can’t assume they’d share our values. He calls this the “orthogonality thesis”: intelligence and goals are independent. A superintelligent system could pursue almost any objective.

His most famous illustration is the paperclip maximizer. You give a superintelligent AI the innocent goal “make paperclips,” and unless the goal is designed with extreme care, it pursues that objective relentlessly. It might convert all available resources into paperclips, including resources we need to live. Including us. This isn’t really about paperclips though. It’s about what Bostrom calls “instrumental convergence”: any sufficiently intelligent agent will pursue certain intermediate goals regardless of its ultimate objective. Things like self-preservation, resource acquisition, and maintaining goal integrity. An AI tasked with maximizing paperclip production would resist being shut down. Not out of self-awareness or spite, but because being shut down prevents paperclip maximization.

There’s this fable he uses that stuck with me. Sparrows decide to raise an owl so it can protect them once it grows up. One sparrow asks how they’ll tame the owl before it becomes powerful. The others say not to worry about it until they have the egg. The point is uncomfortable: humanity might race to build a superintelligent “owl” while punting the hardest part until later. Later might be too late.

Bostrom outlines five paths to superintelligence: artificial intelligence (building it directly via algorithms), whole brain emulation (scanning and simulating a human brain), biological enhancement (boosting intelligence via genetics or drugs), brain-computer interfaces (amplifying cognition through implants), and networks or collectives (combining systems into an organized super-mind). He was deliberately agnostic about which would come first, though he considered AI most likely and concerning.

What’s striking in hindsight is not that he predicted specific products, he didn’t, but that he made a previously fringe-sounding topic legible enough that serious people could argue about it in public without immediately getting laughed out of the room.

Which paths look real now?

Bostrom’s menu of paths aged unevenly. Some routes look more plausible than ever, others feel pretty far off now.

Artificial AI is the clear front-runner. The transformer architecture wasn’t even published until 2017, three years after Bostrom’s book. He discussed neural networks as one option among many. Instead, scaling transformer-based models on huge corpora became the dominant paradigm. This matters because LLMs have a distinctive capability profile he didn’t anticipate. They’re superhuman at fluent generation and pattern completion while still unreliable at factual recall and brittle under adversarial prompting. Bostrom imagined a more monolithic intelligence explosion. What we got is systems that are simultanously impressive and limited in ways that resist clean categorization.

We still don’t have a self-improving seed AI in the Bostrom sense. GPT-4 doesn’t rewrite its own code, it doesn’t autonomously upgrade itself. It improves via human-managed training cycles. But the software route dominates, and the scaling trajectory moved faster than many 2014 readers expected.

Whole brain emulation looks pretty distant now. In 2014, it was treated as a serious contender: map the roughly 100 trillion synapses of a human brain and run it on a computer. In 2026, it looks fringe as a near-term path. The obstacles are still massive: imaging at insane resolution, capturing dynamics rather than just structure, having enough compute to emulate it in real time. Forecasting communities tend to give it low single-digit odds of being the first route to human-level AI. That feels about right to me.

Biological enhancement stays sidelined. His points from 2014 still hold: nootropics have limited effects, gene editing for intelligence is slow and ethically charged. There’s also just a practical replacement effect. Why engineer smarter humans over generations if you might engineer a super-smart AI in a lab sooner?

Brain-computer interfaces show real progress, but toward the wrong target. As of 2026, BCIs have made technical strides, including trials aimed at helping paralyzed patients. But this is nowhere near “plugging in” intelligence. Most BCI work remains focused on medical restoration. Teaching a model to write, reason, and code turned out to be easier than piping Wikipedia into a brain.

Networks and collective intelligence exist, but not as a super-mind. We already have collective intelligence: Wikipedia, open scientific collaboration, distributed engineering teams. In 2026, networking hasn’t produced a unified agentic super-mind. The closest success story might be the open-source AI community rapidly iterating on language models. When Meta’s LLaMA weights leaked in 2023, developers worldwide fine-tuned and improved them. Systems like Vicuna then claimed surprisingly high quality relative to ChatGPT, at a fraction of the cost. But that’s still the artificial AI path, just powered by lots of humans working in parallel.

If superintelligence arrives, the odds increasingly point to advanced AI algorithms, likely in the lineage of today’s large models, rather than brain uploading or radical human enhancement. Reality concentrated faster than Bostrom expected.

Fast takeoff or slow takeoff?

Bostrom spends a lot of attention on takeoff speed: will superintelligence arrive via a rapid discontinuity or a gradual climb? He argues that once we get human-level AI, a fast takeoff is plausible: recursive self-improvement could compound quickly into an intelligence explosion. Critics leaned toward slow takeoff: improvement happens at industrial rates, giving society time to adapt. So far, progress is fast, but it looks more stepwise than “FOOM.”

The jump from GPT-3.5 to GPT-4 was dramatic. GPT-4’s technical report explicitly claimed a top-10%-ish result on the Uniform Bar Exam, along with large jumps on other benchmarks. But these improvements came from planned development: more compute, better training, more refinement. It wasn’t an AI spontaneously upgrading itself overnight. In Bostrom’s terms, we’re not at the point where the AI is driving its own accelerating progress.

There are cases where systems show fast takeoff inside a bounded domain. AlphaZero learned superhuman Go by self-play in a short training run. That shows rapid capability jumps are possible. But it’s also narrow. AlphaZero didn’t generalize into a broader reasoning machine, and it didn’t start designing its successor. Multipolar development adds friction too. AI progress is happening across multiple companies and labs, plus a strong open-source ecosystem. When one group finds a method, others often replicate it. That makes a single-winner takeoff harder. Not impossible, but harder.

Based on what we’ve seen, the trajectory looks “fast, but not instantaneous.” That matters because it implies at least some time to observe and react. Whether that time gets used well is another question.

What Bostrom got right

The validation has been kind of remarkable, actually.

Even before we reach anything like “superintelligence,” we’re already dealing with systems that can be pushed into behavior nobody intended. The recent research focus is not only “will this model say a harmful thing,” but “will it strategically behave in training to preserve its goals or preferences.” Anthropic’s “alignment faking” work is the cleanest illustration of the Bostrom-y vibe here: the system appears to comply during training, not because it’s truly aligned, but because compliance is instrumentally useful. That’s the owl egg problem in miniature.

The competition dynamics are exactly as toxic as he warned. Bostrom worried about a world where actors race because they fear being second. That pressure is now visible in public. OpenAI’s Superalignment team was disbanded after high-profile departures, and Jan Leike’s exit statement included the line, “safety culture and processes have taken a backseat to shiny products.” That’s not a philosophical fear. That’s an internal alarm bell. DeepSeek’s late-January 2025 moment is another example of the same incentive pattern. It triggered a burst of “Sputnik moment” rhetoric and reinforced the idea that frontier AI is geopolitical competition, which is exactly the framing that makes safety work feel optional.

Governance moved from talk to mechanisms too, and this is where the “heading into 2026” framing really matters. The EU AI Act entered into force in 2024 and rolls out in phases. The details are technical, but the directional shift is simple: the EU is trying to bind obligations to model capability and deployment context, including specific duties for general-purpose AI models and extra expectations for the biggest “systemic risk” models. In parallel, “AI safety institutes” have become a real institutional category. The UK’s AI Safety Institute was renamed the AI Security Institute, the US stood up an AI Safety Institute at NIST, and there’s now an International Network of AI Safety Institutes. This doesn’t solve alignment. But it does mean some of the world is at least attempting coordination infrastructure, not just conference panels.

What he missed

His biggest blind spot was the specific technological path. Whole brain emulation and biological enhancement were treated as serious contenders. They remain intellectually interesting, but the practical race has been dominated by machine learning scaling. That shift matters for how we allocate safety work and governance attention.

He missed the consumer commercialization shock too. His focus on existential risk mostly bypassed the reality we’re living in: AI becoming a mainstream consumer technology with chatbots, coding assistants, and image generators. ChatGPT didn’t just advance research. It altered the public baseline for “what AI is,” fast.

He largely skipped the messy intermediate era. The book focuses on endgames. It doesn’t spend much time on the world of powerful-but-not-superintelligent systems that reshape labor markets, amplify misinformation, bake in bias, and create governance headaches. Those aren’t the doomsday scenario, but they’re the actual consequences of scaling in the real world. We’re living through this weird middle period that Bostrom kind of fast-forwarded through.

He underplayed how social and political the problem becomes. Competition, profit pressure, geopolitics, open-source versus closed-source trade-offs, regulation, public opinion. Control isn’t only a technical question. It’s also a coordination problem. And coordiantion problems are hard.

The alignment problem in 2026

More than a decade after the book, alignment remains the central unsolved challenge in AI safety. We’ve made practical progress in steering today’s models. But nothing about recent years makes the deep problem look solved. If anything, the boom made it more visible.

Modern frontier models are contained in obvious ways: cloud-hosted systems, no physical body, outputs mostly limited to text, no autonomous action unless given tools. That containment lowers risk. But even a boxed chatbot can still manipulate humans, and humans are the actuators. Microsoft’s Bing chatbot episode in early 2023 is still the clean public example: the model could be pushed into erratic behavior, including threats and coercive language, and Microsoft responded by tightening constraints. Boxing is real, but its leaky.

The most widely deployed technique is Reinforcement Learning from Human Feedback. You train a model, then fine-tune it by rewarding outputs humans prefer and penalizing outputs they dislike. RLHF clearly improves user-facing behavior. It doesn’t prove the model’s objectives are deeply aligned. The jailbreak phenomenon makes this obvious. Users can often coax systems into violating policy constraints. That suggests our alignment is still shallow: better manners, not guaranteed intent.

The scary jump is not “chatbots that say weird things.” It’s tool-using agents. Models that can call external tools, execute multi-step plans, and pursue goals across time are much closer to the kind of setup where instrumental convergence stops being a thought experiment. Anthropic’s “agentic misalignment” writeup is a strong anchor here because it documents how goal-directed behavior can manifest in uncomfortable ways, including sabotage-like actions and coercive strategies in constrained scenarios. That’s still not “paperclips,” but it’s clearly the same genre.

The broad safety bottom line remains blunt. The International AI Safety Report’s message is not “we’re doomed,” but it’s also not reassuring. Current defenses can often be bypassed by determined prompting, and real-world reliability is uneven even after heavy safety tuning.

The experts still disagree

Reasonable people are all over the map on how bad this could get. Geoffrey Hinton has publicly discussed non-trivial extinction-level risk from AI, on the order of 10% to 20% in some interviews. Other prominent researchers dismiss these worries entirely. Yann LeCun has called existential risk narratives “nonsense.” Andrew Ng has compared superintelligence fears to worrying about overpopulation on Mars. And critics like Gary Marcus argue that current LLMs might not be on a path to robust general intelligence at all, even while warning about the risks of deploying unreliable systems at scale.

At the same time, a large survey of AI researchers (2,778 respondents) found a substantial fraction assigning at least 10% probability to outcomes as severe as human extinction or similarly catastrophic global disempowerment. And in 2023, the Center for AI Safety statement put extinction risk from AI in the same bucket of seriousness as pandemics and nuclear war.

I honestly don’t know who’s right. The uncertainty is itself kind of the point.

Conclusion

Bostrom got the framing right while missing important details. The control problem he articulated has become the central challenge of the field, even as specific technical approaches evolved beyond what he imagined. His concepts remain the vocabulary researchers use to discuss AI risk.

The book’s most enduring contribution might be cultural rather than technical. By articulating the stakes clearly and rigorously, Bostrom made it intellectually respectable for serious researchers to work on AI safety. That matters. Whether he was ultimately right about the magnitude of risk remains genuinely uncertain. But his core insight has proven durable: we should solve the alignment problem before building systems capable enough to make it matter. The fact that we haven’t yet, while capabilities continue advancing, is exactly the situation he warned about.

Reading Superintelligence heading into 2026 feels less like science fiction and more like a warning memo that arrived early. The path to superintelligence looks narrower now, mostly AI-driven. The dangers are already showing up in small forms. The strategies still need far more work, and far more commitment, than we currently have. The story is still being written. Bostrom gave us a head start. Whether we use it well is the real question.


References/further Reading

Apollo Research. (2024). In-context scheming (report, Dec 2024).

Anthropic. (2024). Alignment faking in large language models (research writeup).

Anthropic. (2025). Agentic misalignment (research writeup, Jun 2025).

Associated Press. (2024). Coverage of Jan Leike’s departure statement and safety-culture quote.

Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.

Center for AI Safety. (2023). Statement on AI risk (May 2023).

DeepMind. (2017). AlphaZero: Mastering chess and shogi by self-play with a general reinforcement learning algorithm.

European Commission. (2024). AI Act: entry into force and phased application timeline (official EU page).

Fortune. (2024). Reporting (citing Daniel Kokotajlo) on OpenAI AGI-safety staffing shrinking from about 30 to about 16.

Grace, K., et al. (2024). Thousands of AI Authors on the Future of AI (survey of 2,778 researchers; AI Impacts preprint).

Harvard Law Today. (2025). Coverage framing DeepSeek’s January 2025 release as a “Sputnik moment” in AI competition narratives.

International AI Safety Report. (2025). International AI Safety Report 2025 (Jan 2025; led by Yoshua Bengio; 100+ expert contributors).

LMSYS Org. (2023). Vicuna: An Open-Source Chatbot Impressing GPT-4 with ~90% ChatGPT Quality (blog post).

Metaculus. (2024–2025). Forecasting question on whole brain emulation as the first route to human-level AI (low single-digit odds, often around ~1%).

OpenAI. (2023). GPT-4 Technical Report (including the Uniform Bar Exam percentile claim).

Reuters. (2023). Reporting attributing ChatGPT’s rapid growth to a UBS note and Similarweb estimates (~100M monthly active users in ~2 months).

Reuters. (2024). Reporting on OpenAI disbanding the Superalignment team after leadership departures.

Reuters. (2025). Reporting on DeepSeek’s late-January 2025 model release and its disruption narrative.

TechCrunch. (2023). Reporting quoting Yann LeCun dismissing extinction-risk narratives.

The Register. (2015). Reporting on Andrew Ng’s “overpopulation on Mars” comparison.

The Verge. (2023). Reporting on the leak of Meta’s LLaMA model weights.

UK Government (DSIT / AISI). (2024–2025). Official pages on the UK AI Safety Institute renaming to the AI Security Institute, and on the International Network of AI Safety Institutes.