State capacity backups

Most people around the world — except for residents of a handful of competent countries such as New Zealand, Vietnam, and Rwanda — have now spent an entire year watching their government fail miserably to prepare for and respond to a very predictable (and predicted) pandemic, for example by failing to:

  • send masks to everyone, and promise to buy lots of masks
  • test tons of people regularly and analyze the results
  • do contact tracing
  • promise to buy tons of vaccine doses, very early
  • set up vaccination facilities, and make them trivially easy to find and use

My friend and colleague Daniel Dewey recently noted that it seems like private actors could have greatly mitigated the impact of the pandemic by creating in advance a variety of “state capacity backups,” i.e. organizations that are ready to do the things we’d want governments to do, if a catastrophe strikes and government response is ineffective.

A state capacity backup could do some things unilaterally (e.g. stockpile and ship masks), and in other cases it could offer its services to governments for functions it can’t perform without state sign-off (e.g. setting up vaccination facilities).

I would like to see more exploration of this idea, including analyses of past examples of privately-provided “state capacity backups” and how well they worked.

Superforecasting in a nutshell

Let’s say you want to know how likely it is that an innovative new product will succeed, or that China will invade Taiwan in the next decade, or that a global pandemic will sweep the world — basically any question for which you can’t just use “predictive analytics,” because you don’t have a giant dataset you can plug into some statistical models like (say) Amazon can when predicting when your package will arrive.

Is it possible to produce reliable, accurate forecasts for such questions?

Somewhat amazingly, the answer appears to be “yes, if you do it right.”

Prediction markets are one promising method for doing this, but they’re mostly illegal in the US, and various implementation problems hinder their accuracy for now. Fortunately, there is also the “superforecasting” method, which is completely legal and very effective.

How does it work? The basic idea is very simple. The steps are:

  1. First, bother to measure forecasting accuracy at all. Some industries care a lot about their forecasting accuracy and therefore measure it, for example hedge funds. But most forecasting-heavy industries don’t make much attempt to measure their forecasting accuracy, for example journalism, philanthropy, scientific research, or the US intelligence community.
  2. Second, identify the people who are consistently more accurate than everyone else — say, those in the top 0.1% for accuracy, for multiple years in a row (without regression to the mean). These are your “superforecasters.”
  3. Finally, pose your forecasting questions to the superforecasters, and use an aggregate of their predictions.

Technically, the usual method is a bit more complicated than that, but these three simple steps are the core of the superforecasting method.
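
For the programmatically inclined, here is a minimal sketch of those three steps in Python, using the Brier score (the scoring rule used in the tournament described below) to measure accuracy. The forecaster names, the toy data, and the "top third" cutoff are all made up for illustration; real implementations score thousands of forecasts per person across multiple years.

    # Minimal sketch of the three steps. All data below is illustrative.
    from statistics import mean, median

    def brier_score(probs, outcomes):
        """Mean squared error between probabilistic forecasts and 0/1 outcomes
        (a standard variant of the Brier score). Lower is better; always
        guessing 50% scores 0.25."""
        return mean((p - o) ** 2 for p, o in zip(probs, outcomes))

    # Step 1: bother to measure forecasting accuracy at all.
    # {forecaster: ([probabilities given], [what actually happened])}
    records = {
        "alice": ([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]),
        "bob":   ([0.6, 0.5, 0.5, 0.4], [1, 0, 1, 0]),
        "carol": ([0.8, 0.3, 0.9, 0.2], [1, 0, 0, 0]),
    }
    scores = {name: brier_score(ps, os) for name, (ps, os) in records.items()}

    # Step 2: identify the consistently most accurate forecasters
    # (here: the best third by Brier score; a real version would demand
    # sustained accuracy over many questions and multiple years).
    k = max(1, len(scores) // 3)
    superforecasters = sorted(scores, key=scores.get)[:k]

    # Step 3: pose a new question to them and aggregate their predictions,
    # e.g. with a simple median (real aggregation schemes are fancier).
    new_forecasts = {"alice": 0.15, "bob": 0.30, "carol": 0.55}
    aggregate = median(new_forecasts[name] for name in superforecasters)
    print(superforecasters, aggregate)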

So, how well does this work?

A few years ago, the US intelligence community tested this method in a massive, rigorous forecasting tournament that included multiple randomized controlled trials and produced over a million forecasts on >500 geopolitical forecasting questions such as “Will there be a violent incident in the South China Sea in 2013 that kills at least one person?” This study found that:

  1. This method produced forecasts that were very well-calibrated, in the sense that forecasts made with 20% confidence came true 20% of the time, forecasts made with 80% confidence came true 80% of the time, and so on. The method is not a crystal ball; it can’t tell you for sure whether China will invade Taiwan in the next decade, but if it tells you there’s a 10% chance, then you can be pretty confident the odds really are pretty close to 10%, and decide what policy is appropriate given that level of risk. (A toy sketch of such a calibration check appears just after this list.)
  2. This method produced forecasts that were far more accurate than those of typical forecasters or of the other approaches that were tried, and ~30% more accurate than those of intelligence community analysts who (unlike the superforecasters) had access to expensively-collected classified information and years of training in the geopolitical issues they were making forecasts about.
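
As a minimal illustration of what "well-calibrated" means (with made-up numbers, not the tournament's data), you can bucket forecasts by their stated probability and compare each bucket's stated probability to the fraction of those events that actually happened:

    # Toy calibration check: group forecasts by stated probability and see
    # how often those events actually happened. All numbers are made up.
    from collections import defaultdict

    forecasts = [0.1, 0.1, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.9]
    outcomes  = [0,   0,   0,   1,   0,   1,   1,   1,   0,   1]

    buckets = defaultdict(list)
    for p, outcome in zip(forecasts, outcomes):
        buckets[round(p, 1)].append(outcome)

    for level in sorted(buckets):
        hits = buckets[level]
        rate = sum(hits) / len(hits)
        print(f"forecasts of ~{level:.0%}: came true {rate:.0%} of the time "
              f"({len(hits)} forecasts)")

A well-calibrated forecaster's 20% bucket comes true about 20% of the time, the 80% bucket about 80% of the time, and so on (given enough forecasts per bucket).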

Those are pretty amazing results! And from an unusually careful and rigorous study, no less!

So you might think the US intelligence community has eagerly adopted the superforecasting method, especially since the study was funded by the intelligence community, specifically for the purpose of discovering ways to improve the accuracy of US intelligence estimates used by policymakers to make tough decisions. Unfortunately, in my experience, very few people in the US intelligence and national security communities have even heard of these results, or even the term “superforecasting.”

A large organization such as the CIA or the Department of Defense has enough people, and makes enough forecasts, that it could implement all steps of the superforecasting method itself, if it wanted to. Smaller organizations, fortunately, can just contract already-verified superforecasters to make well-calibrated forecasts about the questions of greatest importance to their decision-making. In particular:

  • The superforecasters who out-predicted intelligence community analysts in the forecasting tournament described above are available to be contracted through Good Judgment Inc.
  • Another company, Hypermind, offers aggregated forecasts from “champion forecasters,” i.e. those who have been most accurate across thousands of forecasting questions for corporate clients, going back (in some cases) almost two decades.
  • Several other projects, for example Metaculus, are also beginning to identify forecasters with unusually high accuracy across hundreds of questions.

These companies each have their own strengths and weaknesses, and Open Philanthropy has commissioned forecasts from all three in the past couple years. If you work for a small organization that regularly makes important decisions based on what you expect to happen in the future, including what you expect to happen if you make one decision vs. another, I suggest you try them out. (All three offer “conditional” questions, e.g. “What’s the probability of outcome X if I make decision A, and what’s the probability of that same outcome if I instead make decision B?”)

If you work for an organization that is very large and/or works with highly sensitive information, for example the CIA, you should consider implementing the entire superforecasting process internally. (Though contracting one or more of the above organizations might be a good way to test the model cheaply before going all-in.)

Different challenges faced by consumers of rock, jazz, or classical music recordings

I’ve noticed some practical differences in the challenges and conveniences faced by consumers of rock, jazz, and (“Western”) classical music recordings.

General notes:

  • Overall, I think it’s easiest and most convenient to be a consumer of rock recordings, somewhat harder to be a consumer of jazz recordings, and much harder to be a consumer of classical recordings.
  • By “classical” I mean to include contemporary classical.
  • Pop and hip-hop and (rock-descended) electronic music mostly follow the rock model. I’m less familiar with the markets for recordings of folk musics and “non-Western classical” musics.

Covers

  • Rock: Cover songs are fairly rare, especially on studio albums (as compared to live recordings).
  • Jazz: Cover tracks are common, including on studio albums. Many of the most popular and/or well-regarded jazz albums consist largely or entirely of covers.
  • Classical: Almost all tracks are cover tracks, especially if weighting by sales.

Performers/composers

  • Rock: One or more of the performers are typically also the composers, though this is less true at the big-label pop end of the spectrum.
  • Jazz: Except for the covers, one of the performers is usually the composer, though the composer plays a smaller role than in rock because improvisation is a major aspect.
  • Classical: Composers rarely perform their own work on recordings.

Labeling/attribution

  • Rock: Simple artist + album/track labeling, because the composer(s) and performer(s) are often entirely or partly the same, and tracks composed by a single member of the band are just attributed to the band (e.g. “The Beatles” instead of “John Lennon” or “Paul McCartney”).
  • Jazz: Albums and tracks are usually labeled according to the performer, even when the composer is someone else. (E.g. Closer is attributed to Paul Bley even though Carla Bley composed most of it.)
  • Classical: Album titles might be a list of all pieces on the album, or the title of just one of several pieces on the album, or something else. As for the “artist,” on the cover art and/or in online stores/services/databases, sometimes the composer(s) will be emphasized, sometimes the performer(s) will be emphasized, and sometimes the conductor will be emphasized. Any given album might be listed under the composer in one store/service/database, under the conductor in another, and under the performer(s) in a third.

Canonical recordings

  • Rock: Most pieces (identified by artist+song) have one canonical recording, usually the version from the first studio album it appeared on. So when people refer to a piece by artist+song, everyone is talking about the same thing.
  • Jazz: Many pieces (identified by performer+piece or composer+piece) lack a canonical recording, because different versions often appear on multiple recordings by the same performer, the earliest version is sometimes not the most popular one, and consumers and critics disagree on which version is best.
  • Classical: For the most part, only less-popular contemporary pieces (identified by composer+piece) have a canonical recording. Everything else typically lacks a canonical recording because the earliest recording is rarely the most popular version, and consumers and critics disagree on which recording of a piece is best.

Genre tags

  • Rock: Hundreds of narrow and informative genre tags are in wide use, e.g. not just “metal” but “death metal” and even “technical death metal.”
  • Jazz: Only a couple dozen genre tags are in wide use, so it can be very hard to know from genre tags what an album will sound like. Different albums labeled simply “avant-garde jazz” or “post-bop” or “jazz fusion” can sound extremely different from each other.
  • Classical: Only a couple dozen genre tags are in wide use, so it can be very hard to know from genre tags what a piece will sound like. Moreover, classical music after ~1910 varies far more from piece to piece than rock or jazz does, because the incentives for innovation are higher, so classical music after ~1910 “needs” more genre tags than rock or jazz.

Ratings

  • Rock: Reviewers often provide a quick-take rating, e.g. “3 out of 5 stars” or “8.5/10,” which makes it easier for you to filter for music you might like.
  • Jazz: Quick-take ratings from reviewers are less common than in rock, but not rare.
  • Classical: Quick-take ratings from reviewers are fairly rare.

Availability

  • Rock: Most tracks are recorded and released within a few years of being composed.
  • Jazz: Most tracks are recorded and released within a few years of being composed.
  • Classical: Even after the invention of cheap recording equipment and cheap release methods, very few pieces are recorded and released within 5 years of being composed.

(I’ve now re-organized this post by feature rather than by super-genre.)

Initial observations from my 2nd tour of rock history

I’m still listening through Scaruffi’s rock history, building my rock snob playlist as I go. A few observations so far:

  1. Relative to last time I listened through Scaruffi’s rock history (>8 years ago IIRC), my tastes have evolved quite a lot. I notice I’m more quickly bored by most forms of pop, punk, and heavy metal than I used to be. The genre I now seem to most reliably enjoy is the experimental end of prog-rock (e.g. avant-prog, zeuhl). I also enjoy jazz-influenced rock a lot more this time, presumably in part because I listened through Scaruffi’s jazz history (and made this guide) a couple years ago.
  2. I am more convinced than ever that tons of great musical ideas, even just within the “rock” paradigm, have never been explored. I’m constantly noticing things like “Oh, you know what’d be awesome? If somebody mixed the rhythm section of A with the suite structure of B and the production approach of C.” And because my listen through rock history has been so thorough this time (including thousands of artists not included in Scaruffi’s history), I’m more confident than ever that those ideas simply have never been attempted. It’s been a similar experience to studying a wide variety of scientific fields: the more topics and subtopics you study, the more you realize that the “surface area” between current scientific knowledge and what is currently unknown is even larger than you could have seen before.
  3. I still usually dislike “death growl” singing, traditional opera singing, and most rapping. I wish there were more “instrumental only” releases for these genres so I could have a shot at enjoying them.
  4. Spotify’s catalogue is very patchy. E.g. Spotify seems to have most of the albums from chapter 4.12 of Scaruffi’s history, and very few albums from chapter 4.13. (I assume this is also true for iTunes and other streaming providers.)

My worldview in 5 books

If you wanted to communicate as much as possible to someone about your worldview by asking them to read just five books, which five books would you choose?

My choices are below. If you post your answer to this question to Twitter, please use the hashtag #WorldviewIn5Books (like I did), so everyone posting their list can find each other.

1. Eliezer Yudkowsky, Rationality: From AI to Zombies

(2015; ebook/audiobook/podcast)

A singular introduction to critical thinking, rationality, and naturalistic philosophy. Both more advanced and more practically useful than any comparable guide I’ve encountered.

2. Sean Carroll, The Big Picture

(2016; ebook/paperback/audiobook)

If Yudkowsky’s book is “how to think 101,” then Carroll’s book is “what to think 101,” i.e. an introduction to what exists and how it works, according to standard scientific naturalism.

3. William MacAskill, Doing Good Better

(2015; ebook/paperback/audiobook)

My current favorite “how to do good 101” book, covering important practical considerations such as scale of impact, tractability, neglectedness, efficiency, cause neutrality, counterfactuals, and some strategies for thinking about expected value across diverse cause areas.

Importantly, it’s missing (a) a quick survey of the strongest arguments for and against utilitarianism, and (b) much discussion of near-term vs. animal-inclusive vs. long-term views and their implications (when paired with lots of empirical facts). But those topics are understandably beyond the book’s scope, and in any case there aren’t yet any books with good coverage of (a) and (b), in my opinion.

4. Steven Pinker, Enlightenment Now

(2018; ebook/paperback/audiobook)

Almost everything has gotten dramatically better for humans over the past few centuries, likely substantially due to the spread and application of reason, science, and humanism.

5. Toby Ord, forthcoming book about the importance of the long-term future

(forthcoming)

Yes, listing a future book is cheating, but I’m doing it anyway. The importance of the long-term future plays a big role in my current worldview, but there isn’t yet a book that captures my views on the topic well, and from my correspondence with Toby so far, I suspect his forthcoming book will finally do the topic justice. While you’re waiting for the book to be released, you can get a preview via this podcast interview with Toby.

A few notes about my choices

  • These aren’t my favorite books, nor the books that most influenced me historically. Rather, these are the books that best express key aspects of my worldview. In other words, they are the books I’d most want someone else to read first if we were about to have a long and detailed debate about something complicated, so they’d have some sense of “where I’m coming from.”
  • Obviously, there is plenty in these books that I disagree with.
  • I didn’t include any giant college textbooks or encyclopedias; that’d be cheating.
  • I wish there were a book that summarized many of my key political views, but I doubt any such book exists.
  • Economic thinking also plays a big role in my worldview, but I’ve not yet found a book that I think does a good job of integrating economic theory with careful, skeptical discussions of the most relevant empirical data (which often come from fields outside economics, and often differ from the predictions of economic models) across a decent range of the most important questions in economics.
  • These books are all quite recent. Older books suffer from their lack of access to recent scientific and philosophical progress, for example (a) the last several decades of the cognitive science of human reasoning, (b) the latest estimates of the effectiveness of various interventions to save and improve people’s lives, (c) the latest historical and regional estimates of various aspects of human well-being and their correlates, and (d) recent arguments about moral uncertainty and what to do about it.

As always, these are my views and not my employer’s.

There was only one industrial revolution

Many people these days talk about an impending “fourth industrial revolution” led by AI, the internet of things, 3D printing, quantum computing, and more. The first three revolutions are supposed to be:

  • 1st industrial revolution (~1800-1870): the world industrializes for the first time via steam, textiles, etc.
  • 2nd industrial revolution (1870-1914): continued huge growth via steel, oil, other things, and especially electricity.
  • 3rd industrial revolution (1980-today): personal computers, internet, etc.

I think this is a misleading framing for the last few centuries, though, because one of these things is not remotely like the others. As far as I can tell, the major curves of human well-being and empowerment bent exactly once in recorded history, during the “1st” industrial revolution:

all curves, with events

(And yes, there’s still a sharp jump around 1800-1870 if you chart this on a log scale.)

The “2nd” and “3rd” industrial revolutions, if they are coherent notions at all, merely continued the new civilizational trajectory created by the “1st” industrial revolution.

I think this is important for thinking about how big certain future developments might be. For example, authors of papers at some top machine learning conference seem to think there’s a decent chance that “unaided machines [will be able to] accomplish every task better and more cheaply than human workers” sometime in the next few decades. There’s plenty of reason to doubt this aggregate forecast, but if it comes true, I think the impact would likely be on the scale of the (original) industrial revolution, rather than on the scale of the “3rd” industrial revolution (whose impact is so small it’s hard to measure?). But for some other technologies (e.g. the “internet of things”), it’s hard to tell a story for how they could possibly be as big a deal as the original industrial revolution.

Three wild speculations from amateur quantitative macrohistory

Note: As usual, these are my personal guesses and opinions, not those of my employer.

In How big a deal was the Industrial Revolution?, I looked for measures (or proxy measures) of human well-being / empowerment for which we have “decent” scholarly estimates of the global average going back thousands of years. For reasons elaborated at some length in the full report, I ended up going with:

  1. Physical health, as measured by life expectancy at birth.
  2. Economic well-being, as measured by GDP per capita (PPP) and percent of people living in extreme poverty.
  3. Energy capture, in kilocalories per person per day.
  4. Technological empowerment, as measured by war-making capacity.
  5. Political freedom to live the kind of life one wants to live, as measured by percent of people living in a democracy.

(I also especially wanted measures of subjective well-being and social well-being, and also of political freedom as measured by global rates of slavery, but these data aren’t available; see the report.)

Anyway, the punchline of the report is that when you chart these six measures over the past few millennia (data; zoomable), you get a chart like this (axes removed for space reasons): [chart]

A few thoughts for religious believers struggling with doubts about their faith

In various places on my old atheism blog, I share advice for religious believers who are struggling with their faith, or who have recently deconverted, and who are feeling a bit lost, worried about nihilism without religion, and so on.

Here is my “FAQ for the sort of person who usually contacts me about how they’re struggling with their faith, or recently deconverted.”

 

Now that I’m losing my faith, I’m worried that nothing really matters, and that’s depressing.

I remember that feeling. I was pretty anxious and depressed when I started to realize I didn’t have good reasons for believing the doctrines of the religion I’d been raised in. But as time passed, things got better, and I emotionally adjusted to my “new normal,” in a way that I thought couldn’t ever happen before I got there.

I’ve collected some recommended reading on these topics here; see also the more recent The Big Picture. It’s up to you to decide what your goals and purposes are, but I think there are plenty of purposes worth getting excited about and invested in. In my case that’s effective altruism, but that’s a personal choice.

But really, my primary piece of advice is to just let more time pass, and spend time socially with non-religious people. Your conscious, deliberative brain (“system 2”) might be able to rationally recognize that of course millions of non-religious people before you have managed to live lives of immense joy and purpose and so on, and therefore you clearly don’t need religion for that. But if you were raised religiously like I was, then it might take some time for your unconscious, intuitive, emotional brain (“system 1”) to also “believe” this. The more time you spend talking with non-religious people who are living fulfilling, purposeful lives, the more you’ll train your system 1 to see that it’s obvious that meaning and purpose are possible without any gods — and getting your system 1 to “change its mind” is probably what matters more.

Where I live in the San Francisco Bay Area, it seems that most people I meet are excitedly trying to “make the world a better place” in some way (as parodied on the show Silicon Valley), and virtually none of them are religious. Depending on where you live, it might not be quite so easy to find non-religious people to hang out with. You could google for atheist or agnostic meetups in your area, or at least in the nearest large city. You could also try attending a UU church, where most people seem to be “spiritual” but not “religious” in the traditional sense.

My spouse and/or kids are religious, and my loss of faith is going to be super hard on them.

Yeah, that’s a tougher situation. I don’t know anything about that. Fortunately there’s a recent book entirely about that subject; I hope it helps!

Thanks, I’ll try those things. But I think I need more help.

I would try normal psychotherapy if you can afford it. Or maybe better, try Tom Clark, who specializes in “worldview counseling.”

15 classical music traditions, compared

Other Classical Musics argues that there are at least 15 musical traditions around the world worthy of the title “classical music”:

According to our rule-of-thumb, a classical music will have evolved… where a wealthy class of connoisseurs has stimulated its creation by a quasi-priesthood of professionals; it will have enjoyed high social esteem. It will also have had the time and space to develop rules of composition and performance, and to allow the evolution of a canon of works, or forms… our definition does imply acceptance of a ‘classical/folk-popular’ divide. That distinction is made on the assumption that these categories simply occupy opposite ends of a spectrum, because almost all classical music has vernacular roots, and periodically renews itself from them…

In one of the earliest known [Western] definitions, classique is translated as ‘classical, formall, orderlie, in due or fit ranke; also, approved, authenticall, chiefe, principall’. The implication there was: authority, formal discipline, models of excellence. A century later ‘classical’ came to stand also for a canon of works in performance. Yet almost every non-Western culture has its own concept of ‘classical’ and many employ criteria similar to the European ones, though usually with the additional function of symbolizing national culture…

By definition, the conditions required for the evolution of a classical music don’t exist in newly-formed societies: hence the absence of a representative tradition from South America.

I don’t understand the book’s criteria. E.g. jazz is included despite not having been created by “a quasi-priesthood of professionals” funded by “a wealthy class of connoisseurs,” and despite having been invented relatively recently, in the early 20th century.


Technology forecasts from The Year 2000

In The Age of Em, Robin Hanson is pretty optimistic about our ability to forecast the long-term future:

Some say that there is little point in trying to foresee the non-immediate future. But in fact there have been many successful forecasts of this sort.

In the rest of this section, Hanson cites eight examples of forecasting success.  Two of his examples of “success” are forecasts of technologies that haven’t arrived yet: atomically precise manufacturing and advanced starships. Another of his examples is The Year 2000:

A particularly accurate book in predicting the future was The Year 2000, a 1967 book by Herman Kahn and Anthony Wiener (Kahn and Wiener 1967). It accurately predicted population, was 80% correct for computer and communication technology, and 50% correct for other technology (Albright 2002).

As it happens, when I first read this paragraph I had already begun to evaluate the technology forecasts from The Year 2000 for the Open Philanthropy Project, relying on the same source Hanson did for determining which forecasts came true and which did not (Albright 2002).

However, my assessment of Kahn & Wiener’s forecasting performance is much less rosy than Hanson’s. For details, see here.

Philosophical habits of mind

In an interesting short paper from 1993, Bernard Baars and Katharine McGovern list several philosophical “habits of mind” and contrast them with typical scientific habits of mind. The philosophical habits of mind they list, somewhat paraphrased, are:

  1. A great preference for problems that have survived centuries of debate, largely intact.
  2. A tendency to set the most demanding criteria for success, rather than more achievable ones.
  3. Frequent appeal to thought experiments (rather than non-intuitional evidence) to carry the major burden of argument.
  4. More focus on rhetorical brilliance than testability.
  5. A delight in paradoxes and “impossibility proofs.”
  6. Shifting, slippery definitions.
  7. A tendency to legislate the empirical sciences.

I partially agree with this list, and would add several items of my own.

Obviously this list does not describe all of philosophy. Also, I think (English-language) philosophy as a whole has become more scientific since 1993.

Rapoport’s First Rule and Efficient Reading

Philosopher Daniel Dennett advocates following “Rapoport’s Rules” when writing critical commentary. He summarizes the first of Rapoport’s Rules this way:

You should attempt to re-express your target’s position so clearly, vividly, and fairly that your target says, “Thanks, I wish I’d thought of putting it that way.”

If you’ve read many scientific and philosophical debates, you’re aware that this rule is almost never followed. And in many cases it may be inappropriate, or not worth the cost, to follow it. But for someone like me, who spends a lot of time trying to quickly form initial impressions about the state of various scientific or philosophical debates, it can be incredibly valuable and time-saving to find a writer who follows Rapoport’s First Rule, even if I end up disagreeing with that writer’s conclusions.

One writer who, in my opinion, seems to follow Rapoport’s First Rule unusually well is Dennett’s “arch-nemesis” on the topic of consciousness, the philosopher David Chalmers. Amazingly, even Dennett seems to think that Chalmers embodies Rapoport’s First Rule. Dennett writes:

Chalmers manifestly understands the arguments [for and against type-A materialism, which is Dennett’s view]; he has put them as well and as carefully as anybody ever has… he has presented excellent versions of [the arguments for type-A materialism] himself, and failed to convince himself. I do not mind conceding that I could not have done as good a job, let alone a better job, of marshaling the grounds for type-A materialism. So why does he cling like a limpet to his property dualism?

As far as I can tell, Dennett is saying “Thanks, Chalmers, I wish I’d thought of putting the arguments for my view that way.”

And because of Chalmers’ clarity and fairness, I have found Chalmers’ writings on consciousness to be more efficiently informative than Dennett’s, even though my own current best-guesses about the nature of consciousness are much closer to Dennett’s than to Chalmers’.

Contrast this with what I find to be more typical in the consciousness literature (and in many other literatures), which is for an article’s author(s) to present as many arguments as they can think of for their own view, and downplay or mischaracterize or not-even-mention the arguments against their view.

I’ll describe one example, without naming names. I recently read two papers, each of which had a section discussing the evidence for or against the “cortex-required view,” which is the view that a cortex is required for phenomenal consciousness. (I’ll abbreviate it as “CRV.”)

The pro-CRV paper is written as though it’s a closed case that a cortex is required for consciousness, and it doesn’t cite any of the literature suggesting the opposite. Meanwhile, the anti-CRV paper is written as though it’s a closed case that a cortex isn’t required for consciousness, and it doesn’t cite any literature suggesting that it is required. Their differing passages on CRV cite literally zero of the same sources. Each paper pretends as though the entire body of literature cited by the other paper just doesn’t exist.

If you happened to read only one of these papers, you’d come away with a very skewed view of the likelihood of the cortex-required view. You might realize how skewed that view is later, but if you’re reading only a few papers on the topic, so that you can form an impression quickly, you might not.

So here’s one tip for digging through some literature quickly: try to find out which expert(s) on that topic, if any, seem to follow Rapoport’s First Rule — even if you don’t find their conclusions compelling.

Seeking case studies in scientific reduction and conceptual evolution

Tim Minchin once said “Every mystery ever solved has turned out to be not magic.” One thing I want to understand better is “How, exactly, has that happened in history? In particular, how have our naive pre-scientific concepts evolved in response to, or been eliminated by, scientific progress?”

Examples: What is the detailed story of how “water” came to be identified with H2O? How did our concept of “heat” evolve over time, including e.g. when we split it off from our concept of “temperature”? What is the detailed story of how “life” came to be identified with a large set of interacting processes, with unclear edge cases (such as viruses) decided only by convention? What is the detailed story of how “soul” was eliminated from our scientific ontology, rather than being remapped onto something “conceptually close” to our earlier conception of it, but which actually exists?

I wish there were a handbook of detailed case studies in scientific reductionism from a variety of scientific disciplines, but I haven’t found any such book yet. The documents I’ve found that are closest to what I want are perhaps:

Some semi-detailed case studies also show up in Kuhn, Feyerabend, etc. but they are typically buried in a mass of more theoretical discussion. I’d prefer to read histories that focus on the historical developments.

Got any such case studies, or collections of case studies, to recommend?

The Big Picture

Sean Carroll’s The Big Picture is a pretty decent “worldview naturalism 101” book.

In case there’s a 2nd edition in the future, and in case Carroll cares about the opinions of a professional dilettante (aka a generalist research analyst without even a bachelor’s degree), here are my requests for the 2nd edition:

  • I think Carroll is too quick to say which physicalist approach to phenomenal consciousness is correct, and doesn’t present alternate approaches as compellingly as he could (before explaining why he rejects them). (See especially chs. 41-42.)
  • In the chapter on death, I wish Carroll had acknowledged that neither physics nor naturalism requires that we live lives as short as we now do, and that there are speculative future technological capabilities that might allow future humans (or perhaps some now living) to live very long lives (albeit not infinitely long lives).
  • I wish Carroll had mentioned Tegmark levels, maybe in chs. 25 or 36.

Tetlock wants suggestions for strong AI signposts

In my 2013 article on strong AI forecasting, I made several suggestions for how to do better at forecasting strong AI, including this suggestion quoted from Phil Tetlock, arguably the leading forecasting researcher in the world:

Signposting the future: Thinking through specific scenarios can be useful if those scenarios “come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another… Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.”

Tetlock hadn’t mentioned strong AI at the time, but now it turns out he wants suggestions for strong AI signposts that could be forecast on GJOpen, the forecasting tournament platform.

Specifying crisply formulated signpost questions is not easy. If you come up with some candidates, consider posting them in the comments below. After a while, I will collect them all together and send them to Tetlock. (I figure that’s probably better than a bunch of different people sending Tetlock individual emails with overlapping suggestions.)

Tetlock’s framework for thinking about such signposts, which he calls “Bayesian question clustering,” is described in Superforecasting:

In the spring of 2013 I met with Paul Saffo, a Silicon Valley futurist and scenario consultant. Another unnerving crisis was brewing on the Korean peninsula, so when I sketched the forecasting tournament for Saffo, I mentioned a question IARPA had asked: Will North Korea “attempt to launch a multistage rocket between 7 January 2013 and 1 September 2013?” Saffo thought it was trivial. A few colonels in the Pentagon might be interested, he said, but it’s not the question most people would ask. “The more fundamental question is ‘How does this all turn out?’ ” he said. “That’s a much more challenging question.”

So we confront a dilemma. What matters is the big question, but the big question can’t be scored. The little question doesn’t matter but it can be scored, so the IARPA tournament went with it. You could say we were so hell-bent on looking scientific that we counted what doesn’t count.

That is unfair. The questions in the tournament had been screened by experts to be both difficult and relevant to active problems on the desks of intelligence analysts. But it is fair to say these questions are more narrowly focused than the big questions we would all love to answer, like “How does this all turn out?” Do we really have to choose between posing big and important questions that can’t be scored or small and less important questions that can be? That’s unsatisfying. But there is a way out of the box.

Implicit within Paul Saffo’s “How does this all turn out?” question were the recent events that had worsened the conflict on the Korean peninsula. North Korea launched a rocket, in violation of a UN Security Council resolution. It conducted a new nuclear test. It renounced the 1953 armistice with South Korea. It launched a cyber attack on South Korea, severed the hotline between the two governments, and threatened a nuclear attack on the United States. Seen that way, it’s obvious that the big question is composed of many small questions. One is “Will North Korea test a rocket?” If it does, it will escalate the conflict a little. If it doesn’t, it could cool things down a little. That one tiny question doesn’t nail down the big question, but it does contribute a little insight. And if we ask many tiny-but-pertinent questions, we can close in on an answer for the big question. Will North Korea conduct another nuclear test? Will it rebuff diplomatic talks on its nuclear program? Will it fire artillery at South Korea? Will a North Korean ship fire on a South Korean ship? The answers are cumulative. The more yeses, the likelier the answer to the big question is “This is going to end badly.”

I call this Bayesian question clustering because of its family resemblance to the Bayesian updating discussed in chapter 7. Another way to think of it is to imagine a painter using the technique called pointillism. It consists of dabbing tiny dots on the canvas, nothing more. Each dot alone adds little. But as the dots collect, patterns emerge. With enough dots, an artist can produce anything from a vivid portrait to a sweeping landscape.

There were question clusters in the IARPA tournament, but they arose more as a consequence of events than a diagnostic strategy. In future research, I want to develop the concept and see how effectively we can answer unscorable “big questions” with clusters of little ones.
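
To make the "question clustering" idea concrete, here is a toy sketch of how answers to small signpost questions could be rolled up into an estimate for a big, unscorable question via Bayesian updating in odds form. This is my illustration, not Tetlock's actual procedure; the prior, the likelihood ratios, and the assumption that signposts are independent given the outcome are all invented for the example.

    # Toy "Bayesian question clustering": update a prior on a big question
    # ("this ends badly") using answers to small signpost questions.
    # All numbers are illustrative assumptions, not estimates from any source.
    prior = 0.30  # P(the situation ends badly), before any signposts

    # likelihood ratio = P(signpost says "yes" | ends badly) / P("yes" | doesn't)
    likelihood_ratios = {
        "tests another rocket": 2.0,
        "conducts another nuclear test": 2.5,
        "rebuffs diplomatic talks": 1.8,
        "fires artillery at the South": 3.0,
    }
    answers = {  # which signposts have resolved "yes" so far
        "tests another rocket": True,
        "conducts another nuclear test": True,
        "rebuffs diplomatic talks": False,
        "fires artillery at the South": False,
    }

    odds = prior / (1 - prior)
    for question, lr in likelihood_ratios.items():
        # a "yes" multiplies the odds up; a "no" divides them down
        # (the symmetric "no" update is itself a simplifying assumption)
        odds = odds * lr if answers[question] else odds / lr

    posterior = odds / (1 + odds)
    print(f"P(this ends badly) given these signposts: {posterior:.2f}")

The more yes answers accumulate, the higher the posterior climbs, which is the "pointillism" effect Tetlock describes.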

(Note that although I work as a GiveWell research analyst, my focus at GiveWell is not AI risks, and my views on this topic are not necessarily GiveWell’s views.)

Time to proof for well-specified problems

How much time usually elapses between when a technical problem is posed and when it is solved? How much effort is usually required? Which variables most predict how much time and effort will be required to solve a technical problem?

The main paper I’ve seen on this is Hisano & Sornette (2013). Their method was to start with Wikipedia’s List of conjectures and then track down the year each conjecture was first stated and the year it was solved (or whether it remains unsolved). They were unable to determine exact-year values for 16 conjectures, leaving them with a dataset of 144 conjectures, of which 60 were solved as of January 2012, with 84 still unsolved. The time between first conjecture statement and first solution is called “time to proof.”

For the purposes of finding possible data-generating models that fit the data described above, they assume the average productivity per mathematician is constant throughout their career (they didn’t try to collect more specific data), and they assume the number of active mathematicians tracks with total human population — i.e., roughly exponential growth over the time period covered by these conjectures and proofs (because again, they didn’t try to collect more specific data).

I didn’t try to understand in detail how their model works or how reasonable it is, but as far as I understand it, here’s what they found:

  • Since 1850, the number of new conjectures (that ended up being listed on Wikipedia) has tripled every 55 years. This is close to the average growth rate of total human population over the same time period.
  • Given the incompleteness of the data and the (assumed) approximate exponential growth of the mathematician population, they can’t say anything confident about the data-generating model, and therefore basically fall back on Occam: “we could not reject the simplest model of an exponential rate of conjecture proof with a rate of 0.01/year for the dataset (translating into an average waiting time to proof of 100 years).” (A quick numerical illustration of this simple exponential model appears after this list.)
  • They expect the Wikipedia dataset severely undersamples “the many conjectures whose time-to-proof is in the range of years to a few decades.”
  • They use their model to answer the question that prompted the paper, which was about the probability that “P vs. NP” will be solved by 2024. Their model says there’s a 41.3% chance of that, which intuitively seems high to me.
  • They make some obvious caveats to all this: (1) the content of the conjecture matters for how many mathematician-hours are devoted to solving it, and how quickly they are devoted; (2) to at least a small degree, the notion of “proof” has shifted over time, e.g. the first proof of the four-color theorem still has not been checked from start to finish by humans, and is mostly just assumed to be correct; (3) some famous conjectures might be undecidable, leaving some probability mass for time-to-proof at infinity.
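
To put a number on what the quoted "exponential rate of conjecture proof with a rate of 0.01/year" implies, here is a back-of-the-envelope sketch (my simplification, not the paper's full analysis) that just evaluates the survival function of that memoryless model:

    # Back-of-the-envelope: what a constant proof rate of 0.01/year implies.
    # This is the simplest exponential model, not the paper's full analysis.
    import math

    RATE = 0.01  # proofs per open conjecture per year (Hisano & Sornette 2013)

    def p_proved_within(years, rate=RATE):
        """P(a conjecture is proved within `years` of being posed), under a
        memoryless model with a constant hazard rate."""
        return 1 - math.exp(-rate * years)

    print(f"mean waiting time: {1 / RATE:.0f} years")
    for t in (10, 25, 50, 100, 200):
        print(f"P(proved within {t:>3} years) = {p_proved_within(t):.2f}")

For what it's worth, the 41.3% figure for P vs. NP quoted above is roughly what this formula gives for the ~53 years between the problem's formal statement in 1971 and 2024 (1 - exp(-0.53) ≈ 0.41), though I haven't checked whether that is exactly how the paper computes it.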

What can we conclude from this?

Not much. Sometimes crisply-posed technical problems are solved quickly, sometimes they take many years or decades to solve, sometimes they take more than a century to solve, and sometimes they are never solved, even with substantial effort being targeted at the problem.

And unfortunately, it looks like we can’t say much more than that from this study alone. As they say, their observed distribution of time to proof must be considered with major caveats. Personally, I would emphasize the likely severe undersampling of conjectures with short times-to-proof, the fact that they didn’t try to weight data points by how important the conjectures were perceived to be or how many resources went into solving them (because doing so would be very hard!), and the fact that they didn’t have enough data points (especially given the non-stationary number of mathematicians) to confirm or reject ~any of the intuitively / a priori plausible data-generating models.

Are there other good articles on “time to proof” or “time to solution” for relatively well-specified research problems, in mathematics or other fields? If you know of any, please let me know!

Reply to LeCun on AI safety

On Facebook, AI scientist Yann LeCun recently posted the following:

<not_being_really_serious>
I have said publicly on several occasions that the purported AI Apocalypse that some people seem to be worried about is extremely unlikely to happen, and if there were any risk of it happening, it wouldn’t be for another few decades in the future. Making robots that “take over the world”, Terminator style, even if we had the technology, would require a conjunction of many stupid engineering mistakes and ridiculously bad design, combined with zero regard for safety. Sort of like building a car, not just without safety belts, but also a 1000 HP engine that you can’t turn off and no brakes.

But since some people seem to be worried about it, here is an idea to reassure them: We are, even today, pretty good at building machines that have super-human intelligence for very narrow domains. You can buy a $30 toy that will beat you at chess. We have systems that can recognize obscure species of plants or breeds of dogs, systems that can answer Jeopardy questions and play Go better than most humans, we can build systems that can recognize a face among millions, and your car will soon drive itself better than you can drive it. What we don’t know how to build is an artificial general intelligence (AGI). To take over the world, you would need an AGI that was specifically designed to be malevolent and unstoppable. In the unlikely event that someone builds such a malevolent AGI, what we merely need to do is build a “Narrow” AI (a specialized AI) whose only expertise and purpose is to destroy the nasty AGI. It will be much better at this than the AGI will be at defending itself against it, assuming they both have access to the same computational resources. The narrow AI will devote all its power to this one goal, while the evil AGI will have to spend some of its resources on taking over the world, or whatever it is that evil AGIs are supposed to do. Checkmate.
</not_being_really_serious>

Since LeCun has stated his skepticism about potential risks from advanced artificial intelligence in the past, I assume his “not being really serious” is meant to refer to his proposed narrow AI vs. AGI “solution,” not to his comments about risks from AGI. So, I’ll reply to his comments on risks from AGI and ignore his “not being really serious” comments about narrow AI vs. AGI.

First, LeCun says:

if there were any risk of [an “AI apocalypse”], it wouldn’t be for another few decades in the future

Yes, that’s probably right, and that’s what people like myself (former Executive Director of MIRI) and Nick Bostrom (author of Superintelligence, director of FHI) have been saying all along, as I explained here. But LeCun phrases this as though he’s disagreeing with someone.

Second, LeCun writes as though the thing people are concerned about is a malevolent AGI, even though I don’t know anyone who is concerned about malevolent AI. The concern expressed in Superintelligence and elsewhere isn’t about AI malevolence, it’s about convergent instrumental goals that are incidentally harmful to human society. Or as AI scientist Stuart Russell put it:

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
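
Russell's point can be made concrete with a toy optimization problem (my illustration, entirely hypothetical, not anything from Russell or LeCun): the objective below rewards only one of two variables that share a resource budget, and the optimizer predictably drives the variable we care about, but left out of the objective, to an extreme value.

    # Toy version of Russell's point: the objective depends on a subset of the
    # variables, and a shared constraint pushes the left-out variable (which
    # we care about) to an extreme value. Hypothetical numbers; needs scipy.
    from scipy.optimize import linprog

    # x[0] = "widgets produced" (rewarded by the objective)
    # x[1] = "capacity left over for things humans care about" (not rewarded)
    # Shared constraint: x[0] + x[1] <= 10 units of capacity.
    result = linprog(
        c=[-1.0, 0.0],       # maximize widgets (linprog minimizes, so negate)
        A_ub=[[1.0, 1.0]],
        b_ub=[10.0],
        bounds=[(0, 10), (0, 10)],
    )
    print(result.x)  # -> [10.  0.]: all capacity goes to widgets; the variable
                     #    we cared about but didn't reward is driven to zero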

(Note that although I work as a GiveWell research analyst, my focus at GiveWell is not AI risks, and my views on this topic are not necessarily GiveWell’s views.)

What are the best wireless earbuds?

I listen to music >10 hrs per day, and I love the convenience of wireless earbuds. They are tiny and portable, and I can do all kinds of stuff — work on something with my hands, take on/off my jacket or my messenger bag, etc. — without getting tangled up in a cord.

So which wireless earbuds are the best? For this kind of thing I always turn first to The Wirecutter, which publishes detailed investigations of consumer products, like Consumer Reports but free and often more up-to-date.

I bought their recommended wireless earbuds a while back, when their recommendation was the Jaybird Bluebuds X. After several months I lost that pair and bought the new Wirecutter recommendation, the JLab Epic Bluetooth. Those were terrible so I returned them and bought the now-available Jaybird X2, which has been awesome so far.

So long as a pair of wireless earbuds have decent sound quality and >6 hrs battery life, the most important thing to me is low frequency of audio cutting.

See, Bluetooth is a very weak kind of signal. It can’t really pass through your body, for example. That’s why it uses so little battery power, which is important for tiny things like wireless earbuds. As a result, I got fairly frequent audio cutting when trying to play music from my phone in my pants pocket to my Jaybird Bluebuds X. After some experimentation, I learned that audio cutting was less frequent if my phone was in my rear pocket, on the same side of my body as the earbuds’ Bluetooth receiver. But it still cut out maybe an average of 200 times an hour (mostly concentrated in particularly frustrating 10-minute periods with lots of cutting).

When I lost that pair and got the JLab Epic Bluetooth, I hoped that with the newer pair they’d have figured out some extra tricks to reduce audio cutting. Instead, the audio cutting was terrible. Even with my phone in the optimal pants pocket, there was usually near-constant audio cutting, maybe about 2000 times an hour on average. Moreover, when I used them while reclining in bed, I would get lots of audio cutting whenever my neck was pressed up against my pillow! So, pretty useless. I returned them to Amazon for a refund.

I replaced this pair with The Wirecutter’s 2nd choice, the Jaybird X2. So far these have been fantastic. In my first ~15 hours of using them I’ve gotten exactly two split-second audio cuts.

So if you want to make the leap to wireless earbuds, I recommend the Jaybird X2. Though if you don’t mind waiting, the Jaybird X3 and Jaybird Freedom are both coming out this spring, and they might be even better.

One final note: I got my last two pairs of wireless earbuds in white so that others can see I’m wearing them. With my original black Bluebuds X, people would sometimes talk at me for >30 seconds without realizing I couldn’t hear them because I had music in my ears.

MarginNote: the only iPhone app that lets you annotate both PDFs and epub files

As far as I can tell, MarginNote is the only iPhone app that lets you annotate & highlight both PDFs and epub files, and sync those annotations to your computer. And by “PDFs and epub files” I basically mean “all text files,” since Calibre and other apps can convert just about any text file into an epub (the main exception being PDFs with tables and images). (The Kindle iPhone app can annotate text files, but can’t sync those annotations anywhere unless you bought the text directly from Amazon.)

This is important for people who like to read nonfiction “on the go,” like me — and plausibly some of my readers, so I figured I’d share my discovery.