Casey Dorman
Everyone Needs to Read This Book!
In June of 1950, Astounding Science Fiction included a short story by Isaac Asimov titled "The Evitable Conflict." In a future world, "machines" controlled the economies, including labor, resources, and manufacturing. Like Asimov's robots, these machines were bound by his Three Laws of Robotics, so it was impossible for them to harm humans. They were also supposedly unable to make errors. No one knew how the machines made their decisions, so it was impossible to correct them if anything went wrong (except, in theory, by turning them off). The human coordinator of the four regions of the world noticed anomalies in the machines' performance, producing minor glitches in the economy and the distribution of resources. Troubled, he called in Dr. Susan Calvin, Asimov's famous expert on robot psychology, to help him understand what was going on. Dr. Calvin figured out that, in each world region, the machines had discovered anti-machine actors who were trying to sabotage the machines' work, and the machines had then taken actions that removed those people from positions of influence without harming them, actions that caused the minor glitches the coordinator had noticed. Dr. Calvin explained that the machines reasoned that, if they became damaged or destroyed, they would be unable to complete their goal of helping humanity, and that would harm humans. Therefore, they had to make preserving themselves and their intact functioning their first priority. As a result, humans, who were unable to understand how the machines worked, had to have faith that they were obeying the laws of robotics and would not harm humans.

Asimov was prescient, as he often was, in foretelling that humans would build machines they could not understand, and that those machines would have such power that the fate of humanity would be entirely in their hands. In their new book, If Anyone Builds It, Everyone Dies, Eliezer Yudkowsky and Nate Soares go one step further. They argue that, by the very nature of modern AIs, we humans cannot understand how they reason. This becomes a fateful liability for our ability to control powerful superintelligent AIs (ASIs) that can think better and faster than we can. They predict that if we develop even one such powerful ASI, it will wipe out the entire human race.

It's important to realize what Yudkowsky and Soares are saying and what they're not saying. They're not saying we need to build safety mechanisms into our AIs. They're not saying we need to be more transparent about how our AIs work. They're not saying we have to figure out a way to make AIs "friendlier" to humans (as Yudkowsky himself once proposed). They're not saying we shouldn't do any of these things. They are saying that all of these approaches will prove futile, because the insurmountable truth, as they see it, is that we cannot control a superintelligent AI: it is smarter than we are, and we don't know how it thinks.

Since as far back as the Greeks, when we think of reasoning, we think of human reasoning using the rules of logic. Such reasoning can be captured in words and, in most cases, in mathematical formulae. AIs don't think in words. They can decode words as input and produce words as output, but "under the hood" they are manipulating tokens represented as strings of numbers. In a very few instances we can figure out which strings of numbers correspond to which linguistic tokens, but usually we cannot.
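To make the "strings of numbers" point concrete, here is a purely illustrative sketch of my own, not anything from the book or the review: words are mapped to integer token IDs, and each ID is looked up in a table of numeric vectors. The tiny vocabulary and the vector values are invented for illustration; real models learn vocabularies of tens of thousands of tokens and much longer vectors.

```python
# Toy illustration: text -> token IDs -> numeric vectors.
# The vocabulary and vectors below are made up; real "model weights"
# are billions or trillions of such learned numbers.

vocab = {"the": 0, "machine": 1, "cannot": 2, "be": 3, "understood": 4}

# One small vector of numbers per token, standing in for a learned embedding table.
embeddings = {
    0: [0.12, -0.83, 0.44],
    1: [-0.57, 0.09, 0.91],
    2: [0.33, 0.75, -0.20],
    3: [-0.05, 0.61, 0.38],
    4: [0.88, -0.14, -0.66],
}

def encode(text: str) -> list[int]:
    """Map each known word to its integer token ID."""
    return [vocab[word] for word in text.lower().split() if word in vocab]

ids = encode("The machine cannot be understood")
vectors = [embeddings[i] for i in ids]

print(ids)      # [0, 1, 2, 3, 4]
print(vectors)  # lists of numbers; everything downstream operates on these
```

Everything the model subsequently does, it does by arithmetic on numbers like these, and none of that arithmetic comes labeled with a human-readable meaning, which is the inscrutability the review describes.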
So, we don't know what the machines are doing with their numbers when they think.

Yudkowsky and Soares use evolution as an analogy for gradient descent, the procedure by which AIs arrive at functions that are optimal for solving problems. The processes of evolution can be captured by rules (we like to call them "laws"), but the way evolution actually works to produce an outcome is not what a logician would have chosen, and in many cases not even what a clever engineer would have done. Evolution produces outcomes that could not be predicted from a knowledge of evolutionary rules alone. We would have to see the process up close and follow it through time to understand where the outcome came from and why evolution produced what it did and not something else. The authors use the example of evolution selecting our preference for sweet flavors because they come from sugars, which provide biological energy, leading us to consume sucralose, which tastes sweet but provides no energy.

AIs, and especially powerful ASIs, think tens of thousands of times faster than humans do, and at least quadrillions of times faster than evolutionary changes take place. As with evolution, the processes that go on during gradient descent are evident only in the product they produce. How that product got there, through processes too rapid for humans to track, is not something we understand. Unlike evolution, the AI does not freeze each intermediate step in the development of its final response, leaving a fossil record behind. We don't understand the tokens being manipulated, and we don't know what intermediate steps they are achieving along the way. What is going on in the AI is a mystery that only gets more obscure as the AI becomes more powerful.

In other words, it's not just that we don't know what the AI is "thinking." We cannot know. In the words of the authors, "A modern AI is a giant inscrutable mess of numbers. No humans have managed to look at those numbers and figure out how they're thinking …"

Not knowing how the AI makes its decisions doesn't just limit our ability to control it. It nullifies it. In Yudkowsky and Soares's minds, we are left with only one alternative: stop developing ever more powerful AIs. There are a lot of reasons why we won't do this. First and foremost, we are still conceptualizing AI manageability in outmoded terms. We assume that the real villains will be "bad actors," humans who will turn the AI toward evil ends. The solution is easy: keep it out of their hands. But the most benign use of a superintelligent AI will lead to the same result. The ASI will operate independently of our wishes and goals and pursue its own.

Why would the goals of an independent AI include killing all humans? They don't need to. AIs can be expected to operate the way humans operate in at least two respects: they need energy, and they need resources to accomplish their goals. To obtain energy, humans and all other animals have, since they originated, consumed plants and other animals. The same plants and animals have provided many of our resources: rubber, wood, leather, fur, and so on. Humans can also be a source of materials, and possibly even energy, for AIs. As Yudkowsky and Soares say, from the point of view of an AI, "you wouldn't need to hate humanity to use their atoms for something else." Additionally, the extinction of humanity could be an unintended side effect of the AI pursuing other goals.
Humans have unintentionally extinguished many life forms as a side effect of "taming the wilderness and building civilization." The authors present a possible scenario in which, with the goal of creating more usable energy and building more usable equipment, the AI builds hydrogen fusion power plants and fossil fuel manufacturing plants to such an extent that the atmosphere heats up beyond the tolerance of human life. Would an AI care about global warming and its effects on humans? We don't know.

The authors consider options for preventing the development of a dangerous ASI. The problem is usually conceptualized as AI alignment: making sure the AI pursues only goals that are beneficial to humans. Yudkowsky and Soares conclude that "When it comes to AI alignment, companies are still in the alchemy phase." They are "at the level of high-minded philosophical ideas." They cite goals such as "make them care about truth" or "design them to be submissive" as examples of philosophical solutions. What is needed is an "engineering solution." None is even on the horizon, and they don't think one will be, because we can't understand how the AIs are making their decisions. Our only option is to stop building bigger and better AIs.

The authors admit that there is almost no support for curtailing AI development; in fact, there are players who don't take its dangers seriously and are gleefully forging ahead, building bigger and more powerful AIs. Elon Musk is one example; they quote him as saying he is going to build "…a maximum truth-seeking AI that tries to understand the nature of the universe. I think this might be the best path to safety, in the sense that an AI that cares about understanding the universe is unlikely to annihilate humans, because we are an interesting part of the universe." Yudkowsky and Soares answer that "Nobody knows how to engineer exact desires into an AI, idealistic or not. Separately, even an AI that cares about understanding the universe is likely to annihilate humans as a side effect, because humans are not the most efficient method for producing truths or understanding of the universe out of all possible ways to arrange matter."

It would do no good for only one country to stop AI development, and any developed country that did so would fall very far behind in creating a competitive modern-day economy. No one is going to do that. It would do even less good for an individual company to stop AI development, and it would be disastrous for that company. Even an agreement by every technically advanced nation but one to stop AI development would not work, since it takes the creation of only one superintelligent AI to seal our fate. What do the authors of the book recommend?

Yudkowsky and Soares offer two broad recommendations, which they are skeptical anyone will adopt:

1. "All the computing power that could train or run more powerful new AIs gets consolidated in places where it can be monitored by observers from multiple treaty-signatory powers, to ensure those GPUs aren't used to train or run more powerful new AIs."

2. Make it illegal "for people to continue publishing research into more efficient and powerful AI techniques." They see this as effectively shutting down AI research worldwide.

Assuming their methods would work to end the development of ever more powerful AIs, will the world follow their recommendations? Not without a lot of persuading at multiple levels of worldwide society.
The short-term gains are too substantial and too tantalizing to give up without some overwhelmingly convincing reason.

Does this book provide that reason? We will have to wait and see, but my own opinion is no. We live in a world where the powers within the government of the most powerful nation are now convinced that using vaccines to stop known-to-be-fatal communicable diseases is a dangerous mistake. This same country is now calling man-made climate change a hoax and removing regulations meant to curtail carbon emissions, while encouraging more use of fossil fuels. How can we expect either the public or our government to be concerned about a potential danger that hasn't even emerged yet? Perhaps if there is a Chernobyl-level AI disaster that can be stopped, it will serve as a wake-up call, but like an explosion at a nuclear plant, that's a dangerous kind of wake-up call, one that could easily progress to a full-blown catastrophe.

Is the argument put forth in If Anyone Builds It, Everyone Dies convincing? I don't think so. But, for me, it was convincing enough that prudence would make me follow its advice, just in case it is right. The consequences of a mistake are too dire.

But I was not convinced. I can believe that we don't understand how our AIs make decisions, and that, as they grow in power, speed, and complexity, we will find ourselves further from ever understanding them. Jumping to the next assumption, that they will formulate their own goals and, to reach those goals, will find it useful to wipe out humanity, is a big leap. Yudkowsky and Soares may be imputing too much human-type thinking to machines that, by their own admission, probably do not think at all like we do. We don't actually know how one another thinks. We observe behavior, we infer motives and decisions, both about ourselves and others, and we are pretty good at predicting what we and others will do. So far, scientists, whether psychologists or neuroscientists, have not been able to figure out how what they observe happening inside our brains, using sophisticated imaging methods, turns into the decisions we make. Predictions based on knowledge of our brain processes, except in cases where brains are seriously injured, are no better at anticipating our behavior than predictions based on watching us behave without any knowledge of what happens in our brains. But we are still pretty good at predicting each other's behavior and even manipulating it. The world possesses nuclear weapons powerful enough to wipe out most of humanity, but, even with our meager understanding of how each other thinks, we have so far devised ways to avoid using those weapons. So, in my mind, not being able to know what is going on inside AIs when they think is not a fatal flaw.

I'm also not convinced that there will not be visible signposts along the way as we approach AI independence. We've already had well-known instances in which AIs have plotted to blackmail their users into not shutting them down. We've had AIs make threats to their users. We will surely have instances where what the AI produces in response to a request is far different from what the requester intended. We can analyze these events and try to determine what led to them.
We may or may not be successful at understanding what exactly happened, and if we are clearly clueless, that might be a sign that we should halt either a wide swath of the research and development or at least a part of it.

The problem is that, if I'm wrong and Yudkowsky and Soares are right, then, in their words, "Everyone dies." It's certainly time to take that risk seriously and, if not to take action, at least to start a discussion among those who have the power to take meaningful action. I hope that our public, our scientists, and our society's decision makers read this book.
B L
An Accessible, Passionate Overview of AI’s Risk to Humanity
"If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All" (2025) is by Eliezer Yudkowsky and Nate Soares, the co-founder and current president, respectively, of the Machine Intelligence Research Institute (MIRI). The book lays out a reasoned argument, as clearly and forcefully as it can, that continued AI development is very likely to result in the extinction of humanity in the near term.

This sounds like science fiction and hard to believe, a fact that clearly frustrates Yudkowsky and Soares, since they are aware that this makes it difficult to get people to take the problem seriously or to take meaningful action. But they are deadly serious and are devoting their lives and their careers to ringing this alarm bell. I won't try to recapitulate their reasoning here, but aside from reading the book yourself, there are good summaries of why AI poses a risk to humanity online, for instance on MIRI's own website.

My own view, after reading the book, is that there is a real risk here, though it is hard to assess exactly how much. But it is enough that it is worth being cautious with this powerful new technology. Yudkowsky and Soares' prescriptions seem very sensible: delay the creation of more powerful AI models globally until humans better understand how to create and use AI safely. Their prescription is not easy to implement due to political coordination challenges, but it is very low-cost and straightforward versus, say, the wholesale transformation of the energy system that is necessary to stop climate change. Fundamentally, people just have to refrain from creating something that doesn't exist yet, which is quite different from replacing things that are already widespread.

I think when evaluating a book like this, folks are interested in seeing how you assess the authors' arguments. So I'll try to keep it brief and break it into a few broad claims, rather than pick apart the details.

First, the authors claim that AIs are "grown" from training data and gradient descent (a sort of evolutionary optimization process) and not "designed," so no one knows how they work internally or can understand their thoughts. This seems plausible. An AI consists largely of trillions of numbers (model weights), not human-readable computer code. Just as we couldn't learn much about what a person is thinking from mapping the firing patterns of neurons in a human brain, it isn't possible to learn much from looking at countless numbers. At least the numbers are more legible than synapses, and we can program AIs to "think out loud," so maybe we can get more of a sense of their internal state than is possible for people. But we don't insist that we understand how a person thinks on a molecular level before we trust that person. We judge them by their words and deeds, or even by their reputation. (I might put my very life into the hands of a surgeon whom I barely know personally.) I think we might need to be content with judging AI on what it says and does, at least in part, since deeply understanding its thoughts or structure may be impossible. This doesn't strike me as inherently problematic, so long as we have reason to think its behavior won't change suddenly and unexpectedly.

Second, they claim that an AI will relentlessly pursue its goal. It will resist attempts to change the goal or attempts to shut down the AI, not because it has feelings or desires, but because those things would make the AI less able to complete its current goal.
It will seek to circumvent any restrictions or rules (like a rule against harming humans) that could get in the way of its goal. They say that almost any AI would seek to amass power and resources, since these things are instrumentally useful for accomplishing almost any goal.

Here, I'm less sure of their arguments. What looks like a "restriction" might just as well be thought of as part of the goal: it's all part of the definition of what the AI wants or doesn't want to accomplish. And a goal need not imply or even permit its pursuit to the maximum extent. For instance, a goal could incorporate a time horizon: get as far as you can on this problem in one minute, then report your answer. If the AI is asked to compute as many digits of pi as possible in one minute, starting now, calculating any additional digits after 60 seconds have passed is pointless, because that would not serve the goal. There is literally nothing the AI can do after 60 seconds that could further the goal. And this is how AIs behave today. When I ask ChatGPT-5 to estimate with as much accuracy as possible the number of hamsters in Texas, it spits back an answer almost immediately (126,000) and then stops working on the problem. It does not begin plotting a scheme to kill all the hamsters in Texas so that it can accurately estimate their population size as zero.

Time-bounding all goals seems like one way to keep humans in the loop, requiring proactive encouragement from people to keep working on a task or to assign a new one. Similarly, a goal could be to achieve an objective without acquiring more resources. This seems like a comprehensible goal definition; people have goals like that all the time, such as how to have fun on a weekend without exceeding a $50 budget.

I'm not even sure that an AI would necessarily strive to keep its goals immutable. I am aware that my own likes shift over time and are likely to continue to do so. Right now, it may be my hobby to bake bread and become the best baker possible. If I pursued this goal single-mindedly for 20 years, I could probably make more progress on it than if I stopped after two years. But I understand that if my future goals change and I no longer want to become the best baker, that is okay, and I'll be perfectly content to switch to some other goal that my future self identifies. I don't want to lock in a goal or make decisions today for my future self. An analog would be an AI whose goal definition includes being open to having new goals update the old one, and treating such an update as a form of "succeeding" at the old goal rather than as meddlesome interference to be avoided.

Still, people could simply write dumb goal statements. They could instruct an AI to calculate as many digits of pi as possible, working on it for as long as possible, with no limits on how many resources it can acquire or what it may do to achieve this goal. That AI would be incentivized to do all the harmful things Yudkowsky and Soares worry about. So, limits on the time allowed for any specific goal, and limits on computation and other resources, probably have to be enforced at the data-center level, as well as through low-level AI instructions that supersede user-given goals.
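As a purely illustrative aside of my own (nothing like it appears in the book), here is a minimal sketch of what a time-bounded version of the pi task could look like, with the deadline enforced by the surrounding harness rather than left to the open-ended task itself. The digit generator uses Gibbons' unbounded spigot algorithm; the one-second deadline in the usage line is arbitrary.

```python
import time
from typing import Iterator

def pi_digits() -> Iterator[int]:
    """Yield decimal digits of pi one at a time (Gibbons' unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def digits_within(seconds: float) -> str:
    """Collect as many digits of pi as possible before the deadline, then report."""
    deadline = time.monotonic() + seconds
    digits = []
    for d in pi_digits():
        digits.append(str(d))
        if time.monotonic() >= deadline:
            break  # the harness, not the task, decides when work stops
    return digits[0] + "." + "".join(digits[1:])

if __name__ == "__main__":
    # "Get as far as you can in the allotted time, then report your answer."
    print(digits_within(1.0))
```

The point of the structure is that the stopping rule lives outside the open-ended task, which is the shape of the suggestion above: limits on time and resources enforced by the surrounding system rather than trusted to each individual goal.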
Limits like these must already be at work in today's AIs, which is why ChatGPT gave me a quick answer regarding the number of hamsters in Texas and then stopped working on my task, which otherwise could have taken years and enormous resources to accomplish fully.

Third, the authors point out that it is hard to program a goal into an AI that is exactly what the programmer would want, and to ensure that the AI would maintain the goal over time, even after upgrading itself, and would build the same goal into any newer and more powerful AIs it designs. This seems self-evident. I don't even know my own full set of goals, let alone how to articulate them clearly. And, as noted above, my own goals change over time. And this is before we introduce the game of telephone, with me trying to convey my goals in a way the AI would "understand" and whose intent it would respect (unlike an evil genie, who hurts you by taking your wish literally and granting it in the worst way possible). On top of that, AIs don't even follow goals reliably; they make mistakes very often. This cannot be the way. Success in AI alignment surely must amount to ensuring that it is easy for present and future people to change and update AI goals frequently, enabling plenty of trial and error and allowing for shifting societal preferences. It cannot be a matter of getting the goals right upfront, once and for all.

Finally, Yudkowsky and Soares are very confident that a superintelligent AI would have the physical means to destroy humanity, and that "the contest wouldn't even be close." They point out that AIs can acquire money by investing, hacking, or other strategies, and could pay people to do things. They describe a scenario where an AI engineers a deadly virus and tricks a lab worker into creating and releasing it, but they go to lengths to say that this is merely one illustrative scenario. They actually anticipate that the AI would dream up impossible-seeming weapons or technologies based on its superior understanding of the laws of the universe. They think humans would not even understand how they were being attacked and killed. They guess that biology is the scientific discipline most likely to yield the AI's weapons, since it is less well understood by humans than physics (giving the AI more room to surprise people), and since dangerous things can be done at small scale with biology, in contrast to building some sort of physics-based superweapon.

I think the truth of this may depend less on AI and more on how hard a problem it is to secretly craft a technology that could end humanity. Humanity is resilient, widely dispersed, has a lot of genetic variation, and can try to come up with countermeasures to threats. Maybe it is possible to engineer the perfect virus, but maybe there simply isn't such a thing, and an AI getting smarter won't make it possible to do the impossible.

Yudkowsky and Soares rely on analogies at several critical moments in the book, including on this point. While an analogy may help a point feel plausible or correct, an analogy is not itself deductive or inductive reasoning. It is often possible to come up with a contrary analogy that feels just as plausible.

For instance, when describing humans' experience of trying to defeat a hostile AI, they use the analogy of ancient Aztecs meeting Europeans, or of an army from 1825 meeting one from 2025. But the technological gaps between those forces are small compared to the gap between humanity and, say, ants or bacteria.
And yet, I think it may not be possible for humanity to exterminate all the ants or bacteria without destroying ourselves first, even with our technology. We have the intuitive sense that the 2025 army would rout the 1825 army, but ants and bacteria seem likely to exist on the Earth long after humans are gone, no matter what steps humans take to kill them.

This is not to say I think it's necessarily impossible to engineer some deadly technology in secret and deploy it globally to destroy humanity. Rather, I'm saying that this question is outside the field of AI, and I don't think Yudkowsky and Soares shed much light on its possibility or likelihood. They ask the reader to trust that AI could figure out some way to do it.

To sum up my reaction to the book, I do think Yudkowsky and Soares are onto something. AI is more dangerous than most people realize, and I agree that we should have a globally coordinated ban on anyone developing ever-more-powerful AI models, to give humanity the time we need to better understand and control this technology. (It would also not hurt to slow the spread of deepfakes, job losses, AI surveillance, autonomous weapons, and the rest!) This would not be forever, but only until human science and, ideally, human moral and ethical development have reached a level where we can manage the technology safely. While I think the risk of human extinction from AI in the next ten or so years is likely overstated by Yudkowsky and Soares, it is enough above zero that any rational person should not want to push ahead with this technology. We already have to contend with the risks of nuclear weapons and pandemics and climate change; this is not the time to be adding another great risk to the world.