Clarifying the Nathan Collins article on MIRI

FLI now has a News page. One of its first pieces is an article on MIRI by Nathan Collins. I’d like to clarify one passage that’s not necessarily incorrect, but which could lead to misunderstanding:

… consider assigning a robot with superhuman intelligence the task of making paper clips. The robot has a great deal of computational power and general intelligence at its disposal, so it ought to have an easy time figuring out how to fulfill its purpose, right?

Not really. Human reasoning is based on an understanding derived from a combination of personal experience and collective knowledge derived over generations, explains MIRI researcher Nate Soares, who trained in computer science in college. For example, you don’t have to tell managers not to risk their employees’ lives or strip mine the planet to make more paper clips. But AI paper-clip makers are vulnerable to making such mistakes because they do not share our wealth of knowledge. Even if they did, there’s no guarantee that human-engineered intelligent systems would process that knowledge the same way we would.

MIRI’s worry is not that a superhuman AI will find it difficult to fulfill its programmed goal of — to use a silly, arbitrary example — making paperclips. Our worry is that a superhuman AI will be very, very good at achieving its programmed goals, and that, unfortunately, the best way to make lots of paperclips (or achieve just about any other goal) involves killing all humans, both so that we can’t interfere with the AI’s paperclip making and so that the AI can use the resources on which our lives depend to make more paperclips. See Bostrom’s “The Superintelligent Will” for a primer on this.

Moreover, a superhuman AI may very well share “our wealth of knowledge.” It will likely be able to read and understand all of Wikipedia, and every history book on Google Books, and the Facebook timeline of more than a billion humans, and so on. It may very well realize that when we programmed it with the goal to make paperclips (or whatever), we didn’t intend for it to kill us all as a side effect.

But that doesn’t matter. In this scenario, we didn’t program the AI to do as we intended. We programmed it to make paperclips. The AI knows we don’t want it to use up all our resources, but it doesn’t care, because we didn’t program it to care about what we intended. We only programmed it to make paperclips, so that’s what it does — very effectively.

“Okay, so then just make sure we program the superhuman AI to do what we intend!”

Yes, exactly. That is the entire point of MIRI’s research program. The problem is that the instruction “do what we intend, in every situation including ones we couldn’t have anticipated, and even as you reprogram yourself to improve your ability to achieve your goals” is incredibly difficult to specify in computer code.
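To make the gap concrete, here is a deliberately toy sketch in Python (my own illustration with made-up names, not anything from MIRI’s research or the Collins article). The objective we know how to write down is trivial; the objective we actually want optimized is the part nobody knows how to turn into code.

```python
# Toy illustration only: the easy-to-specify goal vs. the goal we actually want.

def paperclip_objective(world_state):
    """Trivial to specify: score a world by how many paperclips it contains."""
    return world_state.count("paperclip")

def intended_objective(world_state):
    """What we actually want optimized: 'do what we intend, in every situation,
    even ones we never anticipated.' Nobody currently knows how to fill this in."""
    raise NotImplementedError("Specifying 'do what we intend' is the open problem.")

if __name__ == "__main__":
    world = ["paperclip", "paperclip", "human", "forest"]
    print(paperclip_objective(world))  # 2 -- easy to maximize, indifferent to everything else
```

An optimizer pointed at the first function will happily trade away everything not named in it; pointing an optimizer at something like the second function is the research problem.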

Nobody on Earth knows how to do that, not even close. So our attitude is: we’d better get crackin’.

Theories of artistic evolution

Carlo Gesualdo was a 16th-century prince famous for murdering his wife and her lover when he caught them in flagrante delicto in his own bed. He left their bodies in front of the palace and moved away to Ferrara, where the most progressive composers of his era lived. After learning from them, he returned to his palace — as a nobleman he was immune to prosecution — and isolated himself there for the rest of his life, doing little else but composing music and hiring singers to perform it for him.

Tortured by guilt, his mental health deteriorated. He had his own servants beat him, and tried to obtain his uncle’s bones as magical relics that might cure his mental problems. As time passed, his music became increasingly emotional and desperate, as well as more experimental. His last book of madrigals is probably the most insane music of its era, so insane that no significant composer followed in his chromatic footsteps for about 300 years, and only then in very different styles. Gesualdo represents a fascinating dead end in musical history.

The painter Van Gogh underwent a similar descent into madness, famously cutting off his own ear and later being admitted to an asylum. Simultaneously, his paintings became increasingly wild and impressionistic.

More recently, The Beatles’ introduction to LSD corresponded with the experimental turn in their music, kicked off by Revolver tracks like “Tomorrow Never Knows,” then Sgt. Pepper, then “Revolution 9” and other tracks from the White Album.

In all three cases, my brain wants to tell a story about how the artists’ madness (either chronic or temporarily induced) was the driving force behind their increasing creativity.

But whenever your brain wants to tell a story about something, it’s a good idea to take a breath and generate some alternate hypotheses. What else could explain the above cases of artistic change, and indeed artistic change in general?

The theory I favor is that psychological and sociological pressures drive artists toward increasing novelty. No artist makes a name for herself by doing what Beethoven or Turner or Eliot have already done in music, painting, and poetry. No, the artist must do something new, like Igor Stravinsky did when he invented heavy metal in 1913, or like John Adams did when he fused Beethovenian symphonic romanticism with minimalism, or — sigh — like John Cage did when he composed a piece of music instructing the pianist to sit silently at a piano for four and a half minutes.

I do know of at least one case where drugs were directly responsible for inventing a new artistic style. In 1956, blues singer Jay Hawkins entered the studio to record a refined love ballad called “I Put a Spell on You.” But first the producer brought in ribs and chicken and beer and got everybody drunk. Jay was so drunk he couldn’t remember the recording session, but when they played back the tape it turned out he had just screamed the song to death like a demented madman, thus accidentally inventing goth rock. After that he performed the song in a long cape after rising out of an onstage coffin, and became known as “Screamin’ Jay Hawkins.”

Scaruffi’s rock criticism

Sometimes I do blatantly useless things so I can flaunt my rejection of the often unhealthy “always optimize” pressures within the effective altruism community. So today, I’m going to write about rock music criticism.

Specifically, I would like to introduce you to the wonder of the world that is Piero Scaruffi. Or, better, I’ll let Holden Karnofsky introduce him:

We can start with his writings on music, since that seems to be what he is known for. He has helpfully ranked the best 100 rock albums of all time in order…

If that’s too broad for you, he also provides his top albums year by year … every single year from 1967 to 2012. He also gives genre-specific rankings for psychedelic music, Canterbury, glam-rock, punk-rock, dream-pop, triphop, jungle … 32 genres in all. Try punching “scaruffi [band]” into Google; I defy you to find a major musician he hasn’t written a review of. These are all just part of the massive online appendix to his self-published two-volume history of rock music. But he’s not just into rock; he’s also written a history of popular music specifically prior to rock-n-roll and a history of jazz music, and he has a similarly deep set of rankings for jazz (best albums, best jazz music for each of 17 different instrument categories, best jazz from each decade). While he hasn’t written a book about classical music, he has put out a timeline of classical music from 1098 to the present day and lists his essential classical music selections in each of ~10 categories

So who is this guy, a music critic? Nope, he is some sort of mostly retired software consultant and I want you to know that his interests go far beyond music. Take literature, for example. He has given both a chronological timeline and a best-novel-ever ranking for each of 36 languages. No I’m serious. Have you been wanting this fellow’s opinion of the 37 best works of Albanian literature, in order? Here you go. Turkish? Right here. Hebrew, Arabic, Armenian, Ethiopian, ancient Egyptian, and Finnish? Got those too.

Naturally, Mr. Scaruffi has not neglected film (he’s given his top 40 films and favorites for each decade starting with the 1910s, along with a history of cinema written in Italian) or visual art (see his 3-part visual history of the visual arts, his list of the greatest paintings of all time and his own collages and photographs) but let’s move on from this fluffy stuff. Because it’s important for you to know about his:

Does this guy just like sit inside and read and write 24 hours a day? Not to hear his travel page tell it: he’s visited 159 countries and is happy to give you guides to several of them along with his “greatest places in the world” rankings. He also has an entirely separate “hiking” section of his website that I haven’t clicked on and am determined not to.

But let’s focus on his rock music criticism, which I think is alternately silly, wrong, and brilliant.

[Read more…]

Computer science writers wanted

My apologies in advance to the computer science journalists I haven’t found yet, but…

Why is there so little good long-form computer science journalism? (Tech journalism doesn’t count.)

When there’s an interesting development in biology, Ed Yong will explain it beautifully in 4,000 words, or Richard Dawkins in 80,000. Or Carl Zimmer, Jonathan Weiner, David Quammen, etc.

Several other sciences attract plenty of writing talent as well. Physics has Sean Carroll, Stephen Hawking, Brian Greene, Kip Thorne, Lawrence Krauss, Neil deGrasse Tyson, etc. Psychology has Steven Pinker, Richard Wiseman, Oliver Sacks, V.S. Ramachandran, etc. Medical science has Atul Gawande, Ben Goldacre, Siddhartha Mukherjee, etc.

Computer science has Scott Aaronson (e.g. The Limits of Quantum Computers, The Quest for Randomness), Brian Hayes (e.g. The Invention of the Genetic Code, The Easiest Hard Problem), and… who else?

Outside Aaronson and Hayes, I mostly see tech journalism, very brief CS news articles, mediocre CS writing, and occasional CS articles and books from good writers who cover a range of scientific disciplines, such as…

Maybe CS is too mathematical to attract general readers? Too abstract? Too dry? Or simply not taught in high school like the other sciences? Or maybe there are problems on the supply side?

Key Lessons from Lobbying and Policy Change

Lobbying and Policy Change by Baumgartner et al. is the best book on policy change I’ve read. Hat tip to Holden Karnofsky for recommending this and also Poor Economics, the best book on global poverty reduction I’ve read.

LaPC is perhaps the most data-intensive study of “Who wins in Washington and why?” ever conducted, and the data (and many follow-up studies) are available from the UNC project website here. One review summarized the study design like this:

To start, [the researchers] sample from a comprehensive list of House and Senate lobbying disclosure reports to identify a random universe of participants. After initial interviews with their sample population, the authors assemble a list of 98 issues on which each organizational representative had worked most recently [from 1999-2002, i.e. during two Presidents of opposite parties and two Congresses]. These range from patent extension to chiropractic coverage under Medicare, some very broad and some very specific. Interviewers endeavored to determine the relevant sides of each issue and identify its key players. Separate subsequent interviews were then arranged where possible with representatives from each side of the issue…

With this starting point, the researchers followed their sample of issues for several more years to track who got what they wanted and who didn’t.

Note that their issue sampling method favors issues in which Congress was involved, so “issues relating to the judiciary and that are solely agency-related may be undercounted.”

LaPC is a difficult book to summarize, but below is one attempt. Some findings were surprising, others were not.

  1. One of the best predictors of lobbying success is simply whether one is trying to preserve the status quo, and in fact the single most common lobbying goal is to preserve the status quo.
  2. Some issues had as many as 7 sides, but most had just two.
  3. Most lobbying is targeted at a small percentage of issues.
  4. Very few neutral decision-makers are involved. Where government officials are involved, they are almost always actively lobbying for one side or another. 40% of advocates in this study were government officials; only 60% were lobbyists.
  5. Which kinds of groups were represented by the lobbyists? 26% were citizen groups, 21% were trade/business associations, 14% were corporations, 11% were professional associations, 7% were coalitions specific to an issue, 6% were unions, and 6% were think tanks.
  6. The most common lobbying issues were, in descending order: health (21%), environment (13%), transportation (8%), science and technology (7%), finance and commerce (7%), defense (7%), foreign trade (6%), energy (5%), law, crime, and family policy (5%), and education (5%).
  7. When lobbying, it’s better to be wealthy than poor, but there’s only a weak link between resources and policy-change success.
  8. Policy change tends not to be incremental except in a few areas such as the budget. For most issues, a “building tension then sudden substantial change” model predicts best.
  9. There is substantial correlation between electoral change and policy change, and advocates have increasingly focused on electoral efforts.

If you’re interested in this area, the next book to read is probably Godwin et al.’s Lobbying and Policymaking, another decade-long study of policymaking that is largely framed as a reply to LaPC, and was recommended by Baumgartner.

How to study superintelligence strategy

(Last updated Feb. 11, 2015.)

What could an economics graduate student do to improve our strategic picture of superintelligence? What about a computer science professor? A policy analyst at RAND? A program director at IARPA?

In the last chapter of Superintelligence, Nick Bostrom writes:

We find ourselves in a thicket of strategic complexity and surrounded by a dense mist of uncertainty. Though many considerations have been discerned, their details and interrelationships remain unclear and iffy — and there might be other factors we have not thought of yet. How should we act in this predicament?

… Against a backdrop of perplexity and uncertainty, [strategic] analysis stands out as being of particularly high expected value. Illumination of our strategic situation would help us target subsequent interventions more effectively. Strategic analysis is especially needful when we are radically uncertain not just about some detail of some peripheral matter but about the cardinal qualities of the central things. For many key parameters, we are radically uncertain even about their sign…

The hunt for crucial considerations… will often require crisscrossing the boundaries between different academic disciplines and other fields of knowledge.

Bostrom does not, however, provide a list of specific research projects that could illuminate our strategic situation and thereby “help us target subsequent interventions more effectively.”

Below is my personal list of studies which could illuminate our strategic situation with regard to superintelligence. I’m hosting it on my personal site rather than MIRI’s blog to make it clear that this is not “MIRI’s official list of project ideas.” Other researchers at MIRI would, I’m sure, put together a different list. [Read more…]

Expertise vs. intelligence and rationality

When you’re not sure what to think about something, or what to do in a certain situation, do you instinctively turn to a successful domain expert, or to someone you know who seems generally very smart?

I think most people don’t respect individual differences in intelligence and rationality enough. But some people in my local community tend to exhibit the opposite failure mode. They put too much weight on a person’s signals of explicit rationality (“Are they Bayesian?”), and place too little weight on domain expertise (and the domain-specific tacit rationality that often comes with it).

This comes up pretty often during my work for MIRI. We’re considering how to communicate effectively with academics, or how to win grants, or how to build a team of researchers, and some people (not necessarily MIRI staff) will tend to lean heavily on the opinions of the most generally smart people they know, even though those smart people have no demonstrated expertise or success on the issue being considered. In contrast, I usually collect the opinions of some smart people I know, and then mostly just do what people with a long track record of success on the issue say to do. And that dumb heuristic seems to work pretty well.

Yes, there are nuanced judgment calls I have to make about who has expertise on what, exactly, and whether MIRI’s situation is sufficiently analogous for the expert’s advice to work at MIRI. And I must be careful to distinguish credentials-expertise from success-expertise (aka RSPRT-expertise). And this process doesn’t work for decisions on which there are no success-experts, like long-term AI forecasting. But I think it’s easier for smart people to overestimate their ability to model problems outside their domains of expertise, and easier to underestimate all the subtle things domain experts know, than vice-versa.

Will AGI surprise the world?

Yudkowsky writes:

In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, “After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z.”

Example 2: “As AI gets more sophisticated, everyone will realize that real AI is on the way and then they’ll start taking Friendly AI development seriously.”

Alternative projection: As AI gets more sophisticated, the rest of society can’t see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them. The same people who were talking about robot overlords earlier continue to talk about robot overlords. The same people who were talking about human irreproducibility continue to talk about human specialness. Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone’s previous ideological commitment to a basic income guarantee, inequality reduction, or whatever. The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before. If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it’s on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

My own projection goes more like this:

As AI gets more sophisticated, and as more prestigious AI scientists begin to publicly acknowledge that AI is plausibly only 2-6 decades away, policy-makers and research funders will begin to respond to the AGI safety challenge, just like they began to respond to CFC damages in the late 70s, to global warming in the late 80s, and to synbio developments in the 2010s. As for society at large, I dunno. They’ll think all kinds of random stuff for random reasons, and in some cases this will seriously impede effective policy, as it does in the USA for science education and immigration reform. Because AGI lends itself to arms races and is harder to handle adequately than global warming or nuclear security are, policy-makers and industry leaders will generally know AGI is coming but be unable to fund the needed efforts and coordinate effectively enough to ensure good outcomes.

At least one clear difference between my projection and Yudkowsky’s is that I expect AI-expert performance on the problem to improve substantially as a greater fraction of elite AI scientists begin to think about the issue in Near mode rather than Far mode.

As a friend of mine suggested recently, current elite awareness of the AGI safety challenge is roughly where elite awareness of the global warming challenge was in the early 80s. Except, I expect elite acknowledgement of the AGI safety challenge to spread more slowly than it did for global warming or nuclear security, because AGI is tougher to forecast in general, and involves trickier philosophical nuances. (Nobody was ever tempted to say, “But as the nuclear chain reaction grows in power, it will necessarily become more moral!”)

Still, there is a worryingly non-negligible chance that AGI explodes “out of nowhere.” Sometimes important theorems are proved suddenly after decades of failed attempts by other mathematicians, and sometimes a computational procedure is sped up by 20 orders of magnitude with a single breakthrough.

Some alternatives to “Friendly AI”

What does MIRI’s research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: “Friendly AI.” The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they’re funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of “Oh, so you think me and everybody else in AI is working on unsafe AI?” But I’ve never actually heard that response to “AGI safety” in the wild, and AI safety researchers regularly discuss “software system safety” and “AI safety” and “agent safety” and more specific topics like “safe reinforcement learning” without provoking negative reactions from people doing regular AI research.

I’m more worried that a term like “safe AGI” could provoke a response of “So you’re trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that’s impossible. Your research program is a pipe dream.”

My reply goes something like “Yeah, it’s way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don’t think we can get the whole world to promise never to build AGI just because it’s hard to make safe, so we’re going to give AGI safety a solid try for a few decades and see what can be discovered.” But that’s probably not all that reassuring. [Read more…]

Don’t neglect the fundamentals

My sports coaches always emphasized “the fundamentals.” For example at basketball practice they spent no time whatsoever teaching us “advanced” moves like behind-the-back passes and alley-oops. They knew that even if advanced moves were memorable, and could allow the team to score 5-15 extra points per game, this effect would be dominated by whether we made our free throws, grabbed our rebounds, and kept our turnovers to a minimum.

When I began my internship at what was then called SIAI, I thought, “Wow. SIAI has implemented few business/non-profit fundamentals, and is surviving almost entirely via advanced moves.” So, Louie Helm and I spent much of our first two years at MIRI mastering the (kinda boring) fundamentals, and my impression is that doing so paid off handsomely in organizational robustness and productivity.

On Less Wrong, some kinds of “advanced moves” are sometimes called “Munchkin ideas”:

A Munchkin is the sort of person who, faced with a role-playing game, reads through the rulebooks over and over until he finds a way to combine three innocuous-seeming magical items into a cycle of infinite wish spells. Or who, in real life, composes a surprisingly effective diet out of drinking a quarter-cup of extra-light olive oil at least one hour before and after tasting anything else. Or combines liquid nitrogen and antifreeze and life-insurance policies into a ridiculously cheap method of defeating the invincible specter of unavoidable Death.

Munchkin ideas are more valuable in life than advanced moves are in a basketball game because the upsides in life are much greater. The outcome of a basketball game is binary (win/lose), and advanced moves can’t increase your odds of winning by that much. But in life in general, a good Munchkin idea might find you a life partner or make you a billion dollars or maybe even optimize literally everything.

But Munchkin ideas work best when you’ve mastered the fundamentals first. Behind-the-back passes won’t save you if you make lots of turnovers due to poor dribbling skills. Your innovative startup idea won’t do you much good if you sign unusual contracts that make your startup grossly unattractive to investors. And a Munchkin-ish nonprofit can only grow so much without bookkeeping, financial controls, and a donor database.

My guess is that when you’re launching a new startup or organization, the fundamentals can wait. “Do things that don’t scale,” as Paul Graham says. But after you’ve got some momentum then yes, get your shit together, master the fundamentals, and do things in ways that can scale.

This advice is audience-specific. To an audience of Protestant Midwesterners, I would emphasize the importance of Munchkinism. To my actual audience of high-IQ entrepreneurial world-changers, who want to signal their intelligence and Munchkinism to each other, I say “Don’t neglect the fundamentals.” Executing the fundamentals competently doesn’t particularly signal high intelligence, but it’s worth doing anyway.

The Riddle of Being or Nothingness

Jon Ronson’s The Psychopath Test (2011) opens with the strange story of Being or Nothingness:

Last July, Deborah received a strange package in the mail…  The package contained a book. It was only forty-two pages long, twenty-one of which—every other page—were completely blank, but everything about it—the paper, the illustrations, the typeface—looked very expensively produced. The cover was a delicate, eerie picture of two disembodied hands drawing each other. Deborah recognized it to be a reproduction of M. C. Escher’s Drawing Hands.

The author was a “Joe K” (a reference to Kafka’s Josef K., maybe, or an anagram of “joke”?) and the title was Being or Nothingness, which was some kind of allusion to Sartre’s 1943 essay, Being and Nothingness. Someone had carefully cut out with scissors the page that would have listed the publishing and copyright details, the ISBN, etc., so there were no clues there. A sticker read: Warning! Please study the letter to Professor Hofstadter before you read the book. Good Luck!

Deborah leafed through it. It was obviously some kind of puzzle waiting to be solved, with cryptic verse and pages where words had been cut out, and so on.

Everyone at MIRI was pretty amused when a copy of Being or Nothingness arrived at our offices last year, addressed to Eliezer.

Everyone except Eliezer, anyway. He just rolled his eyes and said, “Do what you want with it; I’ve been getting crazy stuff like that for years.”

[Read more…]

An onion strategy for AGI discussion

“The stabilization of environments” is a paper about AIs that reshape their environments to make it easier to achieve their goals. This is typically called enforcement, but the authors prefer the term stabilization because it “sounds less hostile.”

“I’ll open the pod bay doors, Dave, but then I’m going to stabilize the ship… ”

Sparrow (2013) takes the opposite approach to plain vs. dramatic language. Rather than using a modest term like iterated embryo selection, Sparrow prefers the phrase in vitro eugenics. Jeepers.

I suppose that’s more likely to provoke public discussion, but…  will much good come of that public discussion? The public had a needless freak-out about in vitro fertilization back in the 60s and 70s and then, as soon as the first IVF baby was born in 1978, decided they were in favor of it.

Someone recently suggested I use an “onion strategy” for the discussion of novel technological risks. The outermost layer of the communication onion would be aimed at the general public, and focus on benefits rather than risks, so as not to provoke an unproductive panic. A second layer for a specialist audience could include a more detailed elaboration of the risks. The most complete discussion of risks and mitigation options would be reserved for technical publications that are read only by professionals.

Eric Drexler seems to wish he had more successfully used an onion strategy when writing about nanotechnology. Engines of Creation included frank discussions of both the benefits and risks of nanotechnology, including the “grey goo” scenario that was discussed widely in the media and used as the premise for the bestselling novel Prey.

Ray Kurzweil may be using an onion strategy, or at least keeping his writing in the outermost layer. If you look carefully, chapter 8 of The Singularity Is Near takes technological risks pretty seriously, and yet it’s written in such a way that most people who read the book seem to come away with an overwhelmingly optimistic perspective on technological change.

George Church may be following an onion strategy. Regenesis also contains a chapter on the risks of advanced bioengineering, but it’s presented as an “epilogue” that many readers will skip.

Perhaps those of us writing about AGI for the general public should try to discuss:

  • astronomical stakes rather than existential risk
  • Friendly AI rather than AGI risk or the superintelligence control problem
  • the orthogonality thesis and convergent instrumental values and complexity of values rather than “doom by default”
  • etc.

MIRI doesn’t have any official recommendations on the matter, but these days I find myself leaning toward an onion strategy.