Unknown Knowns

Previously (Marginal Revolution): Gambling Can Save Science

A study was done to attempt to replicate 21 studies published in Science and Nature. 

Beforehand, prediction markets were used to see which studies would be predicted to replicate with what probability. The results were as follows (from the original paper):

Fig. 4: Prediction market and survey beliefs.

Fig. 4

The prediction market beliefs and the survey beliefs of replicating (from treatment 2 for measuring beliefs; see the Supplementary Methods for details and Supplementary Fig. 6 for the results from treatment 1) are shown. The replication studies are ranked in terms of prediction market beliefs on the y axis, with replication studies more likely to replicate than not to the right of the dashed line. The mean prediction market belief of replication is 63.4% (range: 23.1–95.5%, 95% CI = 53.7–73.0%) and the mean survey belief is 60.6% (range: 27.8–81.5%, 95% CI = 53.0–68.2%). This is similar to the actual replication rate of 61.9%. The prediction market beliefs and survey beliefs are highly correlated, but imprecisely estimated (Spearman correlation coefficient: 0.845, 95% CI = 0.652–0.936, P < 0.001, n = 21). Both the prediction market beliefs (Spearman correlation coefficient: 0.842, 95% CI = 0.645–0.934, P < 0.001, n = 21) and the survey beliefs (Spearman correlation coefficient: 0.761, 95% CI = 0.491–0.898, P < 0.001, n = 21) are also highly correlated with a successful replication.

 

That is not only a super impressive result. That result is suspiciously amazingly great.

The mean prediction market belief of replication is 63.4%, the survey mean was 60.6% and the final result was 61.9%. That’s impressive all around.

What’s far more striking is that they knew exactly which studies would replicate. Every study that would replicate traded at a higher probability of success than every study that would fail to replicate.

Combining that with an almost exactly correct mean success rate, we have a stunning display of knowledge and of under-confidence.

Then we combine that with this fact from the paper:

Second, among the unsuccessful replications, there was essentially no evidence for the original finding. The average relative effect size was very close to zero for the eight findings that failed to replicate according to the statistical significance criterion.

That means there was a clean cut. Thirteen of the studies successfully replicated. Eight of them not only didn’t replicate, but showed very close to no effect.

Now combine these facts: The rate of replication was estimated correctly. The studies were exactly correctly sorted by whether they would replicate. None of the studies that failed to replicate came close to replicating, so there was a ‘clean cut’ in the underlying scientific reality. Some of the studies found real results. All others were either fraud, p-hacking or the light p-hacking of a bad hypothesis and small sample size. No in between.

The implementation of the prediction market used a market maker who began anchored to a 50% probability of replication. This, and the fact that participants had limited tokens with which to trade (and thus, had to prioritize which probabilities to move) explains some of the under-confidence in the individual results. The rest seems to be legitimate under-confidence.

What we have here is an example of that elusive object, the unknown known: Things we don’t know that we know. This completes Rumsfeld’s 2×2. We pretend that we don’t have enough information to know which studies represent real results and which ones don’t. We are modest. We don’t fully update on information that doesn’t conform properly to the formal rules of inference, or the norms of scientific debate. We don’t dare make the claim that we know, even to ourselves.

And yet, we know.

What else do we know?

 

 

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

10 Responses to Unknown Knowns

  1. PDV says:

    For completeness, it bears remembering that this result might itself fail to replicate.

    • Benquo says:

      Might, sure. But it sure seems like the sort of thing that *would* replicate, not the sort of thing that *wouldn’t*, and it’s silly to pretend modesty about that sort of thing.

    • I debated putting a line to note that it is possible the results were fraudulent, although I find that unlikely. If they’re not fraudulent or the result of gross incompetence, they will replicate.

  2. romeostevens says:

    Counter to the pervasive nihilism frame, we do in fact know how to do science. Huge if true. What are the betting odds on this replicating?

  3. Noname says:

    What is most interesting is that these studies were published in the most prestigious scientific publications. If we know which science is good and which one bad, why Science and Nature don’t use this knowledge to filter what they publish?

  4. sniffnoy says:

    And apparently you don’t even need a prediction market for this, the survey did well enough on its own? That’s certainly interesting…

    • Indeed, the survey was slightly worse and made some ordering mistakes but did *almost* as good a job without the need for a market. And the market they used was crippled in ways that made it have more of the mistakes of a survey, as noted.

      The key is just getting people to give it their best guess?

  5. Pingback: Rational Feed – deluks917

  6. Benquo says:

    80,000 Hours now has a related quiz where you can guess which studies replicated: https://80000hours.org/psychology-replication-quiz/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s