How much time usually elapses between when a technical problem is posed and when it is solved? How much effort is usually required? Which variables most predict how much time and effort will be required to solve a technical problem?

The main paper I’ve seen on this is Hisano & Sornette (2013). Their method was to start with Wikipedia’s List of conjectures and then track down the year each conjecture was first stated and the year it was solved (or record that it remains unsolved). They were unable to determine exact years for 16 conjectures, leaving them with a dataset of 144 conjectures, of which 60 had been solved as of January 2012, with 84 still unsolved. The time between a conjecture’s first statement and its first solution is called “time to proof.”

To find possible data-generating models that fit the data described above, they make two simplifying assumptions: the average productivity per mathematician is constant throughout a career, and the number of active mathematicians tracks total human population — i.e., roughly exponential growth over the time period covered by these conjectures and proofs. (In both cases, they didn’t try to collect more specific data.)

I didn’t try to understand in detail how their model works or how reasonable it is, but as far as I understand it, here’s what they found:

- Since 1850, the number of new conjectures (that ended up being listed on Wikipedia) has tripled every 55 years. This is close to the average growth rate of total human population over the same time period.
- Given the incompleteness of the data and the (assumed) approximate exponential growth of the mathematician population, they can’t say anything confident about the data-generating model, and therefore basically fall back on Occam: “we could not reject the simplest model of an exponential rate of conjecture proof with a rate of 0.01/year for the dataset (translating into an average waiting time to proof of 100 years).”
- They expect the Wikipedia dataset severely undersamples “the many conjectures whose time-to-proof is in the range of years to a few decades.”
- They use their model to answer the question that prompted the paper: the probability that “P vs. NP” will be solved by 2024. Their model puts this at 41.3%, which intuitively seems high to me.
- They make some obvious caveats to all this: (1) the content of the conjecture matters for how many mathematician-hours are devoted to solving it, and how quickly they are devoted; (2) to at least a small degree, the notion of “proof” has shifted over time, e.g. the first proof of the four-color theorem still has not been checked from start to finish by humans, and is mostly just assumed to be correct; (3) some famous conjectures might be undecidable, leaving some probability mass for time-to-proof at infinity.
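To make the two headline numbers concrete, here is a minimal sketch of the arithmetic they imply. This is my own illustrative calculation, not the authors’ code: tripling every 55 years corresponds to a continuous annual growth rate of ln(3)/55 ≈ 2%, and an exponential proof rate of 0.01/year implies a mean waiting time of 1/0.01 = 100 years, with survival function S(t) = exp(−0.01·t):

```python
import math

# Tripling every 55 years -> continuous annual growth rate ln(3)/55
growth_rate = math.log(3) / 55
print(f"annual growth rate: {growth_rate:.4f}")  # ≈ 0.0200, i.e. ~2%/year

# Exponential proof rate of 0.01/year -> mean waiting time of 1/rate years
rate = 0.01
mean_wait = 1 / rate
print(f"mean time to proof: {mean_wait:.0f} years")  # 100 years

# Probability a conjecture remains unsolved after t years under this
# simple memoryless model (NOT the paper's full model, which also
# accounts for the growing mathematician population):
for t in (10, 50, 100, 200):
    print(f"P(unsolved after {t:>3} years) = {math.exp(-rate * t):.2f}")
```

Note that a naive memoryless calculation like this does not by itself reproduce their 41.3% figure for P vs. NP; their estimate comes from the fitted model, which incorporates the growth of the mathematician population.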

What can we conclude from this?

Not much. Sometimes crisply-posed technical problems are solved quickly, sometimes they take many years or decades to solve, sometimes they take more than a century to solve, and sometimes they are never solved, even with substantial effort being targeted at the problem.

And unfortunately, it looks like we can’t say much more than that from this study alone. As the authors themselves say, their observed distribution of time to proof must be considered with major caveats. Personally, I would emphasize three: the likely severe undersampling of conjectures with short times-to-proof; the fact that they didn’t try to weight data points by how important the conjectures were perceived to be or how many resources went into solving them (because doing so would be very hard!); and the fact that they didn’t have enough data points (especially given the non-stationary number of mathematicians) to confirm or reject ~any of the intuitively / *a priori* plausible data-generating models.

Are there other good articles on “time to proof” or “time to solution” for relatively well-specified research problems, in mathematics or other fields? If you know of any, please let me know!