Oxford’s Dr. Anders Sandberg Combats A.I.’s Threat to Humanity with His Whiteboard

On a rainy winter’s day in Oxford, many of the world’s brightest students and professors are studying all aspects of the past—from philosophy to history to classical literature—in the 14th-century Oriel College, in the 16th-century Christ Church College, and across the street in the blue-domed, 18th-century Radcliffe Camera library. Just a few feet from these pre-modern vestiges lies a modest, brick building behind an iron gate. Students and staff alike stroll past without even a glance. Step through, however, and one finds a group of world-class futurists, holed up trying to ensure that humans aren’t soon wiped from the face of the earth.

Clad in a blue sweater and rectangular glasses, Dr. Anders Sandberg looks at home in his office inside the Future of Humanity Institute (F.H.I.). His office is full of discarded chairs, with a half-scrubbed whiteboard at one end. He is leaning back in his chair, calmly explaining the many scenarios in which the world might end. At the top of the list is a nuclear holocaust. This possibility, he says, has only become starker with the global rise of populism: Trump, Brexit, the specter of Le Pen. Trust is being eroded in society. Nations are forgoing multilateral treaties and pacts. International tensions are rising. Circumstances have become particularly conducive to an all-out nuclear war.

“Many people think that you can’t think rigorously about the future,” he says. “‘Oh, that’s all going to be crystal balls and wild speculation.’ But it’s not. That’s what’s disturbing.” There are other existential threats that deserve serious attention, Sandberg says, particularly the possibility of a synthetically engineered pathogen or an uncontrollable artificial intelligence. These threats could, as soon as the next few decades, wipe out humanity entirely. While conversations about these threats happen behind the closed doors of specialty institutes like this one at Oxford and the Center for Human-Compatible Artificial Intelligence at the University of California, Berkeley, they tend not to deeply trouble either the wider public or academic institutions at large. “There are more papers about dung beetle reproduction than human extinction,” the 44-year-old Swede says. “I like dung beetles (I collect beetles), but compared to our own species…maybe we should focus a little bit on ensuring that we survive and thrive.”

Sandberg’s overarching goal might seem next to impossible, but its central idea is straightforward: to use philosophy, math, and ethical inquiry to predict the existential risks humans might face. If humanity can understand how someone could, for instance, engineer and disseminate a deadly virus or how artificial intelligence might overtake humans, then there might be a chance to put measures in place now so these disasters never happen at all.

It sounds like science fiction, but thinking about these kinds of doomsday scenarios as purely the realm of pulpy novels and films is a dangerous game. Very soon, they might become reality. And if we’re doing nothing to counter them now, how could we possibly react if they really do come to pass?

Born and raised in a suburb of Stockholm, Sandberg is an expert on a remarkably wide range of topics, from cognitive enhancement and biases, to collective intelligence technologies, to public policy, to the ethics of neuroscience. He holds a doctorate in computational neuroscience from Stockholm University and is currently writing papers on topics ranging from the probability of alien intelligence to the history of technological achievement, how to create simulations of global catastrophes, and how to regulate forthcoming artificial intelligences. He jokes that he’s “a walking Wikipedia”—with a little bit of knowledge on just about everything.

At F.H.I., Sandberg is a senior researcher who works with a team of philosophers, mathematicians, logicians, ethicists, and technologists, all of whom have received funding and support from the likes of Bill Gates, Elon Musk, and Stephen Hawking. One of the biggest problems that the institute is currently working on is the existential risk posed by artificial intelligence. Full-fledged, self-aware artificial intelligence, Sandberg says, is going to be achieved soon. Perhaps it will be in 30 years, perhaps it will be in 200 or 300 years—but in any case, it offers too many potential benefits not to soon become a part of our world. “A.I. is super promising,” he says. “After all, if we could automate many things, we could get rid of a lot of thankless labor, and we could, perhaps, solve many of the world’s most important problems. The trick is developing safe technologies before we get to the dangerous technologies.”


Artificial intelligence is already in wide use. Just think of the benefits it offers in the world of health care alone. DeepMind, Google’s A.I. branch, is currently collaborating with the United Kingdom’s National Health Service on software that is being taught to diagnose everything from eye diseases to cancer. Along with machine learning, it is being used to catch early signs of conditions such as heart disease and Alzheimer’s. A.I. is also being combined with so-called “big data” techniques to analyze huge amounts of information about molecules so that new drugs can be conceived in a time frame that is decades, perhaps even centuries, faster than humans could do on their own.

But as potentially useful as A.I. might be, it poses a variety of dangers—from shutting off its ability to be modified, to becoming intelligent but not moral, to negatively and permanently altering humans (so that, perhaps, we will approve of otherwise unsavory A.I. actions). The biggest threat posed by A.I., however, derives from the fact that a self-aware A.I. might have an inherent motivation to destroy humans in pursuit of a larger goal. “Many people think that what we’re worried about is some kind of scenario where the robots are rebelling, or they want to destroy us,” he says. “The real problem will probably be some form of malicious indifference. They won’t care. But we’ll have resources that would be much more useful for a big project or their A.I. system, so we’ll get pushed to the side and used as resources. They may not be designed to cherish what we cherish.”

Andrew Ng, the vice president and chief scientist of Baidu Research, a Chinese internet services company similar to Google, and an adjunct computer science professor at Stanford, disagrees with Sandberg. He says that fearing a “killer A.I.” is like “worrying about overpopulation on Mars.” His claim is, essentially, that it’s too early to begin thinking about the existential risk of artificial intelligence. His inherent biases aside (Baidu has invested heavily in A.I.), Ng’s argument is largely a facile one: Little would have to be done now to counter the future possibility of overpopulation on Mars, whereas in these early developmental stages of A.I., it is particularly important to lay down safe, foundational programming to protect against an otherwise risky future.

It’s not just Ng who thinks the threat of A.I. is overblown. Grady Booch, a well-known computer engineer, a popular TED Talk lecturer, and a cocreator of the Unified Modeling Language, believes that A.I. poses little threat. His argument isn’t that it’s too early to think about; it’s that A.I.—if programmed properly—will never pose an existential threat at all. “In producing these machines, we are therefore teaching them a sense of our values,” Booch said in a recent lecture. “To that end, I trust an artificial intelligence the same, if not more than, a human who is well-trained.”

The trouble with placing a deep trust in A.I. is that engineers and developers may have financial incentives that push them to put safety on the back burner in favor of functionality. If this is the case, then policymakers cannot sit on their hands now. Engineers need to be convinced that a full-fledged A.I. poses a legitimate risk of destroying humanity. This way, safety might remain at the forefront of their minds.

Photo Assistant John Cronin. Film Lab Bayeux Ltd. Print Lab Hammer Lab.

“It is so easy for them to think, ‘Oh, those Luddites over there in the crazy part of the philosophy department don’t know anything. They’re just trying to scare away funding from our field,’” Sandberg says. “But we’re not crazy, and we’re not Luddites. We actually really would like to see super intelligent A.I.”

But how does one prove the future risks of A.I.? In a larger sense, how does one convince someone that a certain vision of the future is the correct one?

This is where the office whiteboard comes into play. Because Sandberg and his team are not working with large sets of known data—since everything is an unproven, future potentiality—they rarely use complex computer algorithms, deep learning, big data, or any of the other problem-solving tactics that have come increasingly into vogue in recent years. Rather, much of their work is done in basic computer simulations and old-fashioned brainstorming and philosophizing—with pencils and dry-erase markers.

“Our approach is very much to split the problem apart into smaller problems, so we can understand each part,” Sandberg says. “Then you can start analyzing and building scenarios, and try to say, ‘Okay, what answers do I need to get before I can actually really understand, or know, the reasons behind this particular risk?’ It’s relatively easy to construct a story of X leads to Y that leads to Z that leads to human extinction. But maybe there is something else—a W—that could also lead to this.”

In this case, the W value is the error that first leads to the X value on the road to extinction. This might be a simple programming error. More likely, however, the error would have more to do with human programmers putting a premium on an A.I.’s efficiency or capabilities that, in the short term, might bolster its ability to complete legitimate tasks but, in the long term, would enable it to teach itself without limit or to override human control.

Another possible way to stop the beginning of A.I.-induced human extinction might be to give engineers incentives to think less about efficiency and more about safety. Stuart Armstrong, F.H.I.’s fellow in artificial intelligence and machine learning, has been in talks with Google about making software that is willing to allow itself to be turned off. Likewise, the Center for Human-Compatible Artificial Intelligence at U.C. Berkeley focuses on teaching programmers the negative possibilities of A.I. and how to program against them. With $5.5 million in funding from the Open Philanthropy Project, and led by computer-science professor Stuart Russell, the center will provide both classes and tax incentives to ethics-minded A.I. developers. Even politicians are getting involved. F.H.I. has partnered with the Finnish Ministry of Foreign Affairs, giving talks on reducing global risks and coaching diplomats on how to convince funding committees—and even other nations—of the importance of addressing the risks of the future. “Existential risk is something that threatens all of us,” Sandberg says. “If we don’t go and think about it now, we are not going to have any descendants.”

The emphasis on thinking about the future now is the central difference between the F.H.I. and the major companies investing in and creating the technologies that might lead to the end of human civilization. It’s a common trope. The Silicon Valley programmer has an incentive not to think about the potentially negative future of A.I. just as the mining executive has an incentive not to think about the negative future caused by climate change. Both have serious and harmful repercussions, but so long as capitalism infuses their decision-making, the decisions being made will be mostly short-term and financially oriented.


On the face of it, it’s hard to argue with short-term thinking. The generations currently inhabiting the planet likely won’t die from climate change, an unstoppable virus, or an out-of-control A.I. But it’s a deeply skeptical mode of thinking. It’s only those who think about the future death of our species who have found a way to care for our legacy as humans. “I know this makes us sound terribly gloomy, but the reason we’re thinking so much about the future is that, deep down, we’re amazing optimists,” Sandberg says. “That optimism is a reason to work hard on ensuring that future comes to pass. If we had been pessimists, we would have said, ‘Yeah, we’re doomed. Anybody up for a beer?’ There is no point in trying to save yourself if you really think you’re doomed.”

So long as we don’t bring on our own demise, we have about 5 billion years until the sun burns out. That’s a lot of time to propel our current values into the future, to live, almost forever, through our legacy. Whether our values will be able to hurtle along through 5 billion years of space and time comes down to whether we’re willing to take the precautions about the future now.

To think more wholly about the future might mean to sacrifice some potential profit or even quality of life. In the case of artificial intelligence, the first order of business is convincing the brilliant technologists, programmers, and engineers that although A.I. might be able to one day control your car, your airplane, your town’s power grid, your health care, even your financial decisions, it is the safety measures enacted now that might be all that separates humanity’s imminent destruction from a flourishing future.

A.I. is currently being considered for relatively short-term solutions, but if the long-term future of humanity is to be the connected, convenient, kind future that so many of us envision and desire, Sandberg and the F.H.I. suggest that we turn our gaze to the long term. In doing so, we’ll be more inclined to make decisions that will actually allow this type of future to become a reality. “The future could actually be so bright,” Sandberg says. “It’s almost impossible to conceive of it.”