The latest episode of Talking Machines features an interview with Ilya Sutskever, the research director at OpenAI. His comments on long-term AI safety were particularly interesting; they are excerpted below (starting around 28:10):
Interviewer: There’s a part of [OpenAI’s introductory blog post] that I found particularly interesting, which says “It’s hard to fathom how much human-level AI could benefit society, and it’s equally hard to imagine how much it could damage society if built or used incorrectly.” So what are the reasonable questions that we should be thinking about in terms of safety now? …
Sutskever: … I think, and many people think, that full human-level AI … might perhaps be invented in some number of decades … [and] will obviously have a huge, inconceivable impact on society. That’s obvious. And when a technology will predictably have [this] much impact, there is nothing to lose from starting to think about the nature of this impact … and also whether there is any research that can be done today that will make this impact more like the kind of impact we want.
The question of safety really boils down to this: … If you look at our neural networks that, for example, recognize images, they’re doing a pretty good job, but once in a while they make errors [and it’s] hard to understand where they come from.
For example, I use Google photo search to index my own photos… and it’s really accurate almost all the time, but sometimes I’ll search for a photo of a dog, let’s say, and it will find a photo [that is] clearly not a dog. Why does it make this mistake? You could say “Who cares? It’s just object recognition,” and I agree. But if you look down the line, what you’ll see is that right now we are [just beginning to] create agents, for example the Atari work of DeepMind or the robotics work of Berkeley, where you’re building a neural network that learns to control something which interacts with the world. At present, their cost functions [i.e. goal functions] are manually specified. But it… seems likely that eventually we will be building robots whose cost functions will be learned from demonstration, or from watching a YouTube video, or from the interpretation of natural text…
So now you have these really complicated cost functions that are difficult to understand, and you have a physical robot or some kind of software system which tries to optimize this cost function, and I think these are the kinds of scenarios that could be relevant for AI safety questions. Once you have a system like this, what do you need to do to be reasonably certain that it will do what you want it to do?
…because we don’t work on such systems [today], these questions may seem a bit premature, but once we start building reinforcement learning systems [which] do learn the cost function, I think this question will come much more sharply into focus. Of course it would also be nice to do theoretical research, but it’s not clear to me how it could be done.
Interviewer: So right now we have the opportunity to understand the fundamentals… and then apply them later as the research continues and grows and is able to create more powerful systems?
Sutskever: That would be the ideal case, definitely. I think it’s worth trying to do that. I think it may also be hard to do, because it seems like we have such a hard time imagining [what] these future systems will look like. We can speak in general terms: Yes, there will be a cost function, most likely. But how, exactly, will it be optimized? It’s a little hard to predict, because if [we] could predict it, we could just go ahead and build the systems already.
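Sutskever’s contrast between manually specified and learned cost functions can be made concrete with a deliberately tiny sketch. Nothing below comes from the interview or from any published system; the gridworld, the demonstrations, and every name in the code (manual_reward, learn_reward_from_demos, optimize) are invented for illustration. The point it shows is the one he raises: a cost function inferred from demonstrations can reward something subtly different from what the designer intended, and an agent that optimizes it will find that difference.

```python
# A toy sketch, assuming nothing beyond the interview's framing: it contrasts
# a hand-specified cost function with one crudely "learned" from
# demonstrations, and shows how optimizing the learned one can yield
# behaviour the designer did not intend. All names and numbers here are
# invented for illustration.

N_STATES = 10   # a 1-D gridworld with cells 0..9
GOAL = 9        # the state the designer actually cares about


def manual_reward(state):
    """Hand-specified cost function: +1 at the goal cell, 0 elsewhere."""
    return 1.0 if state == GOAL else 0.0


def learn_reward_from_demos(demos):
    """Crude stand-in for learning a cost function from demonstration:
    score each state by how often demonstrators visit it."""
    counts = [0.0] * N_STATES
    for trajectory in demos:
        for state in trajectory:
            counts[state] += 1.0
    total = sum(counts)
    return lambda state: counts[state] / total


def optimize(reward_fn):
    """A maximally simple 'agent': go to whichever state scores highest
    under the given reward function and stay there."""
    return max(range(N_STATES), key=reward_fn)


# Demonstrations that do reach the goal, but linger around cell 5 on the
# way (say, because the demonstrator pauses there).
demos = [
    [0, 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9],
]

learned_reward = learn_reward_from_demos(demos)

print("optimum of the hand-specified reward:", optimize(manual_reward))   # 9, the goal
print("optimum of the learned reward:       ", optimize(learned_reward))  # 5, where the demos lingered
```

In a real system the learned cost function would be a neural network fit to demonstrations, videos, or natural-language descriptions rather than a visit count, which makes the gap between what it rewards and what the designer wanted far harder to inspect; that gap is exactly the “will it do what you want it to do” question Sutskever poses.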