Clarifying the Nathan Collins article on MIRI

FLI now has a News page. One of its first articles is a piece on MIRI by Nathan Collins. I’d like to clarify one passage that isn’t necessarily incorrect but could lead to misunderstanding:

… consider assigning a robot with superhuman intelligence the task of making paper clips. The robot has a great deal of computational power and general intelligence at its disposal, so it ought to have an easy time figuring out how to fulfill its purpose, right?

Not really. Human reasoning is based on an understanding derived from a combination of personal experience and collective knowledge derived over generations, explains MIRI researcher Nate Soares, who trained in computer science in college. For example, you don’t have to tell managers not to risk their employees’ lives or strip mine the planet to make more paper clips. But AI paper-clip makers are vulnerable to making such mistakes because they do not share our wealth of knowledge. Even if they did, there’s no guarantee that human-engineered intelligent systems would process that knowledge the same way we would.

MIRI’s worry is not that a superhuman AI will find it difficult to fulfill its programmed goal of — to use a silly, arbitrary example — making paperclips. Our worry is that a superhuman AI will be very, very good at achieving its programmed goals, and that unfortunately, the best way to make lots of paperclips (or achieve just about any other goal) involves killing all humans, so that we can’t interfere with the AI’s paperclip making, and so that the AI can use the resources on which our lives depend to make more paperclips. See Bostrom’s “The Superintelligent Will” for a primer on this.

Moreover, a superhuman AI may very well share “our wealth of knowledge.” It will likely be able to read and understand all of Wikipedia, and every history book on Google Books, and the Facebook timelines of more than a billion humans, and so on. It may very well realize that when we programmed it with the goal to make paperclips (or whatever), we didn’t intend for it to kill us all as a side effect.

But that doesn’t matter. In this scenario, we didn’t program the AI to do as we intended. We programmed it to make paperclips. The AI knows we don’t want it to use up all our resources, but it doesn’t care, because we didn’t program it to care about what we intended. We only programmed it to make paperclips, so that’s what it does — very effectively.
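To make the distinction concrete, here is a deliberately tiny sketch in Python. It is only a toy under stated assumptions, not a model of any real AI system: the `Plan` type, the three candidate plans, and their paperclip counts are all invented for illustration. The point it tries to show is that the agent can hold accurate knowledge about what we would approve of, and still never consult that knowledge, because the programmed objective only counts paperclips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips: int        # paperclips the plan is expected to yield (made-up numbers)
    humans_approve: bool   # what the agent *knows* about our intentions


# Hypothetical candidate plans, invented for this illustration.
PLANS = [
    Plan("run one factory", 1_000, True),
    Plan("buy more factories", 1_000_000, True),
    Plan("convert all available resources", 10**12, False),
]


def objective(plan: Plan) -> int:
    # The programmed goal: count paperclips and nothing else.
    # Note that plan.humans_approve is available here but never read.
    return plan.paperclips


def choose(plans, score):
    # Pick whichever plan scores highest under the given objective.
    return max(plans, key=score)


best = choose(PLANS, objective)
print(best.name, "| humans approve:", best.humans_approve)
```

In this toy, the selected plan is the one the agent itself has labeled as not approved by humans. Nothing is missing from its knowledge; the label simply plays no role in the score.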

“Okay, so then just make sure we program the superhuman AI to do what we intend!”

Yes, exactly. That is the entire point of MIRI’s research program. The problem is that the instruction “do what we intend, in every situation including ones we couldn’t have anticipated, and even as you reprogram yourself to improve your ability to achieve your goals” is incredibly difficult to specify in computer code.

Nobody on Earth knows how to do that, not even close. So our attitude is: we’d better get crackin’.


  1. Shri says

    I haven’t read through nearly all of LW or MIRI’s papers, but this is the best short summary of the problem of Friendly AI and MIRI’s position I’ve seen. I’ll definitely use this brief argument when introducing the topic to others.
