## Judging Our April 2020 Covid-19 Predictions

This seems like an excellent opportunity to reflect on those predictions. I’ll also attempt to render my verdict on the predictions, based on the principles I discuss in Evaluating Predictions in Hindsight under Hard Mode, as Scott’s already done the Easy Mode work.

Afterwards, I’ll give my take on the Assorted Links from that post as well.

## Note on Methodology

This post is where I gave my predictions. Note that I don’t give my fair probabilities here. Instead, I give what prices I’d be willing to wager at. This is an important distinction.

The coward’s method of proposing a wager is to say ‘Bob, you think this is 90% likely, I think that’s too high, so surely you will give me 9:1 odds.’ That’s nonsense, of course, because Bob previously thought those odds were fair, and now someone wants to bet against him. Why should Bob let someone else choose which of his predictions to bet against at what Bob considered fair odds?

The epistemic hero’s method would be to give your own fair probability, and offer to meet in the middle via the Green Knight test (you’d do the math and combine into one wager). Thus, ‘Bob, you think this is 90% likely, I think it is 10% likely, so let’s bet at even odds.’

The gambler’s method is to consider the prediction as if it were an initial value for a prediction market, and reveal how far you’d be willing to move the odds while still being comfortable wagering. That doesn’t mean you would then think the odds were fair, or that you’d be comfortable if someone wanted to bet against you on your own side at those new odds. Thus, you say something like “Bob, you say 90%, I’m willing to bet against that at 80% odds” which means you think it’s at most 80%, so you’re willing to place bets first at 90%, then 85%, then 80%, so long as cumulative size isn’t too large, but then you’ll stop and won’t place the one at 75%. If Alice comes along and says “You stopped at 80% odds, I want to bet against this at 75% odds” you might or might not accept her wager. Often you wouldn’t.
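To make the last two methods concrete, here is a minimal Python sketch. The function names and the fixed five-point step are illustrative assumptions, not anything from the original predictions post, and prices are kept in whole percentage points so the arithmetic stays exact.

```python
def midpoint_price(p_a: float, p_b: float) -> float:
    """Epistemic hero's method: bet at the average of the two stated probabilities."""
    return (p_a + p_b) / 2


def gambler_ladder(market_pct: int, limit_pct: int, step_pct: int = 5) -> list[int]:
    """Prices (in percent) at which the gambler bets against the market.

    Assumes the gambler is betting the price DOWN, so limit_pct <= market_pct.
    The step size is a hypothetical choice made for illustration only.
    """
    if limit_pct > market_pct:
        raise ValueError("this sketch only handles betting the price down")
    # Start at the market price and walk down, stopping at the limit price.
    return list(range(market_pct, limit_pct - 1, -step_pct))


# Bob says 90%, I say 10%: the hero's wager is at even odds.
print(midpoint_price(0.90, 0.10))

# Bob says 90%, I am willing to bet it down to 80% and no further.
print(gambler_ladder(90, 80))
```

The ladder stops at the limit price. Whether to accept Alice's offer at 75% is a separate judgment call that the sketch deliberately leaves out.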

It’s not a first-best solution to (at least sort of) keep two sets of books in your head, where the odds on the game say 50%, you’re willing to bet up to 55%, but your gut tells you your side will win 70% of the time. It is still often a very good practical solution to keep both sets of books, and mostly avoid placing wagers inside the 55%-70% window. One way to think of that is you’d be at 70% as fair value if you hadn’t seen the market, but you respect the market and know you’re often wrong. But if the market wanted to move higher, you’d offer no objections.

In this case, looking at the gambler’s limit prices improves my calibration score, because I was overconfident in the overall arc of the pandemic and our reactions to it, which is why my log score was highly unimpressive. In Hard Mode, we consider everything including reasoning, so such calculations are mostly set aside either way.
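As a rough illustration of why less extreme prices can score better when a confident call misses, here is a minimal sketch of a mean log score. The two forecast lists are made-up numbers chosen for the example, not my actual predictions or limit prices.

```python
import math


def log_score(forecasts):
    """Mean natural-log score over (probability, happened) pairs.

    Scores are always <= 0, and closer to 0 is better.
    Probabilities must be strictly between 0 and 1.
    """
    total = 0.0
    for p, happened in forecasts:
        # Score the probability that was assigned to what actually occurred.
        total += math.log(p if happened else 1.0 - p)
    return total / len(forecasts)


# Hypothetical numbers: one confident miss and one hit.
gut_probabilities = [(0.95, False), (0.70, True)]
limit_prices = [(0.85, False), (0.60, True)]

print(log_score(gut_probabilities))
print(log_score(limit_prices))
```

In this toy case the limit prices score better because the confident miss is punished less, even though the hit is rewarded slightly less too.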

I encourage you to follow along by reading my previous thoughts in the original post as you go down the list of questions, to get the full explanations, as for space reasons I’m cutting them short here.

## Questions

False. The biggest mistake I made in this pandemic was vastly underestimating the strength of the control system. My explanation on this question makes that very clear. I thought it wasn’t likely that people would be willing to continue containment that didn’t make progress even until June, and I was very wrong. Should have been higher than 60%. I consider this prediction a rather large error.

False on Scott’s grading. They did relax things somewhat in between, so I’d view this as ambiguous. Under my read at the time I think there was a major (but far from total) relaxation that was then rescinded. Despite this, 10% was a really bad prediction here no matter the interpretation: there clearly was a plausible path for fully sustained lockdowns thanks to the control systems and the way California was choosing its risk tolerance levels. This needed to be at least 25%.

False. This one comes out looking correct. By the time the prediction was made this was sub-1%.

False. This was closer than I would have expected; seasonality helped a lot and I underestimated it. My guess is that the right number was a little higher than this, something like 35% or 40%.

True. I essentially say ‘I think this is higher than 90% but for various reasons it’s tough to bet things up higher than 90%.’ In hindsight this is so high a threshold it’s essentially a Can’t Happen, but given information at the time I don’t think 90% is crazy low – it’s plausible 95% is a better prediction but I don’t think more than that would have been wise.

True. I did win this one, but again it feels overconfident given what we knew at the time, and I bet this one up a bit too high, although China’s presumed willingness to fudge numbers matters. 85% would have been a better stopping point.

True. On reflection, giving a full 10% to ‘China has the worst death toll but officially denies it’ seems crazy high, and I’m fine with this one at 80% with only a 5% difference between the two numbers. I didn’t think hard enough about that question at the time.

True, I got this overturned ‘on appeal’ via a Twitter poll. My logic at the time goes on to explicitly say that I expect this to remain the narrative even if it’s no longer true, rather than being a prediction that another city would almost never end up getting hit harder. On reflection, indeed do many things come to pass, narratives shift in strange ways and 90% was the better prediction.

False. This was either damn close (95,963!) or not close at all because China cares about round numbers and was only going to let it cross 100k if they had no choice, and they had plenty of choice. I was surprised to see China’s case count had risen this much, and it’s kind of suspiciously close to 100k given that a lot of people have incentives not to report cases and China doesn’t want to cross round thresholds. I’m going to say that 40% was a pretty good prediction.

True! Vaccine timelines being so fast was a huge outlier. We succeeded here, but only by a few weeks, despite things going mind-blowingly right and seeing much better results than anyone expected. AstraZeneca messed up their trials but that was the only major thing that went wrong, and it’s hard to see where things could have gone much better short of some small country going rogue. I’m not sure whether 40% or 50% was the better prediction here, but that seems like a good range.

11. Best scientific consensus ends up being that hydroxychloroquine was significantly effective: 20%

Sell to 15% or so, while noting that I think the chance of it actually being effective is much higher than that.

False. I like this assessment in hindsight. We now more fully understand the extent to which The Narrative pushes hard on things like this, and would have made it exceedingly difficult for HCQ to be accepted as effective even if it was effective. It would have needed to be a game changer.

12. I personally will get coronavirus (as per my best guess if I had it; positive test not needed): 30%

Sell to 20% at least, and also what the hell?

False. Yeah, I didn’t sell enough of this. No way he was at 20% risk to catch this, and his prediction here seems even sillier now.

13. Someone I am close to (housemate or close family member) will get coronavirus: 60%

Sell to 40%.

False. This seems like a more reasonable place to have stopped than the 20% from #12. I want to say still a bit high, but room here for some people who might not take good precautions. Note that the 2:1 ratio here between these two questions is definitely silly and selling them both down a third indicates a mistake somewhere.

14. General consensus is that we (April 2020 US) were overreacting: 50%

15. General consensus is that we (April 2020 US) were underreacting: 20%

“General consensus will be that we were reacting stupidly. We reacted wrong. That’s an easy call. The question is, will that be widely seen as an underreaction, an overreaction, something that’s neither, or will there be a lack of consensus? What does it take to get a ‘consensus’? Who counts?

My guess is that there flat out won’t be consensus.

…so I’m going to sell the overreacting contract down to 30%, but stop there because people are bad at such things and find ways to rewrite history to suit their narratives. I’m going to hold the 20% on underreacting”

False on both, as expected there’s no consensus apart from the consensus that we were acting stupidly. Which is hard to avoid as a conclusion, given we did contradictory things in different places and at different times, and also dropped some rather important balls. But even then I’m not super confident you’d call it a ‘consensus.’ By only going down to 30% and 20%, I seem to be implying I’m not that confident in a lack of consensus, so this seems like I should have been somewhat bolder.

16. General consensus is that summer made coronavirus significantly less dangerous: 70%

True. Again consensus can be tough, so 70% seems reasonable or even a bit high. I don’t think we could be super confident in this at the time.

17. …and there is a catastrophic (50K+ US deaths, or more major lockdowns, after at least a month without these things) second wave in autumn: 30%

True. I stand by this being a substantial underdog for exactly the reason I noted, which is that this is a parlay of several things that each must happen – we need to go under the bar, then back over the bar, on multiple fronts. There were Major Lockdowns each month, but it’s not clear there were ‘more major lockdowns’ each month, so I do think technically this evaluates to true as written, but as far as intent it’s not clear to me this fully happened because it’s not clear lockdowns sufficiently lifted. Parlays are hard to win!

18. I personally am back to working not-at-home: 90%

False. Seeing this as different from the lockdown percentage seems clearly right in hindsight, and the reason I’m still too high on this is that I got the lockdown probability wrong. If we compound that 25%+ with another 10% here we get that this can be at most around 65%, and also other things can always happen so 60% seems like a more reasonable maximum.
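The compounding here is just multiplying the chances that each obstacle fails to show up. A minimal sketch, assuming the two obstacles are independent (a simplification on my part, since in reality they are likely correlated):

```python
def ceiling_probability(*obstacle_probs: float) -> float:
    """Upper bound on an outcome that requires every listed obstacle NOT to occur.

    Assumes the obstacles are independent, which is an illustrative simplification.
    """
    bound = 1.0
    for q in obstacle_probs:
        bound *= 1.0 - q
    return bound


# 25% chance lockdown is still in effect, 10% chance of some other reason to stay home.
print(ceiling_probability(0.25, 0.10))
```

With exactly 25% and 10% the ceiling is 0.75 × 0.90 = 0.675, and since the lockdown number is "25%+" the true ceiling sits a little lower, which is where the "around 65%" comes from.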

19. At least half of states send every voter a mail-in ballot in 2020 presidential election: 20%

False. I don’t think this was ever close to happening, exactly for the reasons I laid out. Not moving markets too far when you’re anchored, especially betting on a big favorite, is always good policy.

20. PredictIt is uncertain (less than 95% sure) who won the presidential election for more than 24 hours after Election Day. 20%

True. Thinking about all the right things here, still getting to the wrong answer. Seeing accusations of fraud as unlikely looks silly in hindsight, especially given the extended willingness of the market to stay insane long after the verdict was in. You can say it should have been under 95% for 24 hours, but you can’t say that for a month out. So that reasoning wasn’t great. The question is how close the election had to actually be in terms of tipping point margin of victory, in order to allow it to be thrown into question, in each direction, and thus how likely such a result was. The tipping point state was Biden+0.6%. My guess is that Biden+3% or Trump+1% would have settled things quickly. That range seems more like a 30% shot than a 10% or 20% shot, but also we got a very large amount of fraud accusation versus reasonable priors in April even given that we should have expected a lot more fraud accusations than people did expect. I think 20%-25% seems like the right range in hindsight.

I also looked at Scott’s non-Covid predictions, but we’ll postpone looking at those until Scott grades them.

1. Hypermind looks promising at first glance.
1. Vitalik Buterin talks about his adventures winning \$50,000 betting against Trump on Ethereum prediction market Augur.

My takeaway from Vitalik’s journey is that it took \$50,000 worth of time and technical expertise to make that \$50,000, and the only reason it made sense for Vitalik to do it was because of the value of reporting on the results and in learning by doing. Essentially this is a start-up style operation to see if things can scale, and even with the completely insane market and uniquely huge event the opportunity size wasn’t great. Perhaps in 2024 such things will be ready for prime time but for now I would treat the operational risks involved as bigger than the profit margins offered.

What’s most weird to me is that the various prediction markets stayed in line with each other, despite very different participation restrictions and costs of doing arbitrage. My guess is that a lot of people involved were not thinking about real costs and effective odds, rather thinking about whether the market prices lined up and were ‘fair’ in some sense.

1. This week on Metaculus: will a third-party candidate win 5%+ of the popular vote in 2024? Users say 15% chance

Scott is betting against. I’m not. Not only Perot in ’92 and ’96 but also Anderson in ’80 broke this barrier, which makes 3 of the last 11 elections, and there are plausible known routes to this in ’24 (e.g. Trump as 3rd party, Trump (or someone in his image) as Republican causing a third party run, Libertarians running Amash or Romney, or a proper run somehow by Kanye West or another billionaire, or even a Warren/Sanders style scorched earth campaign on the left if Biden runs again). Hell, the way things are weirding who knows who will run. If anything I’d be at 20% rather than 15%.

1. Also, will Bitcoin outperform the US stock market over the next five years, at 51%. I started out thinking – of course it’s 50-50! By the efficient market hypothesis, if any asset was obviously going to do better than another, people would change the price until it wasn’t. But on second thought that’s wrong – stocks have a higher than 50% chance of beating treasuries over the same period because of a risk premium. Maybe there’s no intuitive way to think about this, you have to have opinions on the underlying fundamentals, and it’s only 51% by coincidence?

It’s at exactly 50/50 now! I quoted Scott in full above to ensure I fully represent his thinking here. Looking at this market tricked me into trying to put in a prediction, despite that then putting a burden on me (in theory anyway) to update that prediction continuously or lose points, but it said I was Forbidden to do that, so I didn’t.

This does not reflect well on Metaculus. The 50% number is crazy, or at a minimum, it represents a very strong rejection of the Efficient Market Hypothesis, and would make Bitcoin what is known as a Screaming Buy.

This comes up every time I see Bitcoin price distribution predictions. Bitcoin can only go as low as \$0. Bitcoin could, in theory, go up not only to \$100k but to \$1 million or more.

If the Bitcoin distribution centers in the same place as the stock market, it is a screaming buy compared to the stock market, and you should put a substantial portion of your net worth into BTC.

If Bitcoin is priced efficiently now, then that implies it is more likely to fall than rise, even with a substantial risk premium, because that’s the only way for the math to come out even. The alternative is to both think BTC is priced fairly and that almost none of that value is the potential to rocket to the moon, which doesn’t seem right to me at all. If you think BTC can’t rocket, BTC is a bad buy.
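The skew argument can be checked with a toy model. The sketch below assumes both assets have independent lognormal returns over the period, and the volatility numbers are made up for illustration; none of this is a claim about the actual distributions.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_outperform(mean_a: float, vol_a: float, mean_b: float, vol_b: float) -> float:
    """P(asset A's gross return beats asset B's) under independent lognormal returns.

    mean_* is the expected gross return over the period (1.5 means +50%).
    vol_* is the standard deviation of the log return over the period.
    """
    # For a lognormal, E[return] = exp(mu + vol^2 / 2), so back out the log-mean mu.
    mu_a = math.log(mean_a) - vol_a**2 / 2
    mu_b = math.log(mean_b) - vol_b**2 / 2
    return normal_cdf((mu_a - mu_b) / math.sqrt(vol_a**2 + vol_b**2))


def mean_ratio_for_even_odds(vol_a: float, vol_b: float) -> float:
    """How much higher A's expected return must be than B's for a 50% chance of winning."""
    return math.exp((vol_a**2 - vol_b**2) / 2)


# Hypothetical five-year volatilities: 1.5 for Bitcoin, 0.4 for stocks.
print(prob_outperform(1.5, 1.5, 1.5, 0.4))  # same expected return for both
print(mean_ratio_for_even_odds(1.5, 0.4))   # expected-return multiple implied by 50/50
```

Under these assumptions, equal expected returns put the more volatile asset below 50% to outperform, and a true 50% implies a much higher expected return for it, which is the screaming buy case.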

Perhaps Metaculus thinks Bitcoin is indeed a screaming buy. That’s not a crazy thing to think. But if it does think that, it seems like an awfully big coincidence that this landed on exactly 50%.

There’s no trade, since (as many people reminded me) Metaculus is not a prediction market and you can’t trade on its values, but there’s still a big contradiction with market prices here.

None of this is investment advice in any way, but: My model of BTC at the moment is that expected returns for holding BTC are positive, in excess of its fair risk premium, but that in a large majority of worlds it will be outperformed by the stock market, often via very large declines. You also have to account for tax liability and the chance someone will steal your bitcoins.

The weekly Covid update will be posted on Thursday as per usual.


### 7 Responses to Judging Our April 2020 Covid-19 Predictions

1. Andrew Clough says:

About Hydroxychloroquine, did you see the paper that went by suggesting that it could be effective if combined with a TMPRSS2 inhibitor?

https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1009212

The folks on TWiV were doubtful we’ll get a full RCT for that combo at this late date but it was interesting at least.

• TheZvi says:

I hadn’t. As I said, I was far more bullish on ‘it works if done right’ than ‘we’ll say that it works’

2. Tom C says:

In market lingo, Bob’s 9:1 assessment of the odds, if he’s willing to trade either side of that, is a “locked market”. No one in their right mind would make a locked market on any bet.
We used to deal with that problem with the following two rules: 1) Both parties agree that a trade must happen, and 2) the game proceeds iteratively, with the respondent having to either accept a bet or make a narrower market.
For example, I might say to you “I make it 600,000–900,000 total number of US COVID deaths by the end of 2021”. You then must either bet the under (900,000), or make me a narrower market, that is something less than 300,000 wide. You might respond “600,000-700,000”, in which case I must bet the 600,000 under or the 700,000 over. This proceeds until there’s finally a trade.

• Tom C says:

Typo: the under is 600,000, not 900,000!

• TheZvi says:

Yep, classic way to do it, good training for traders, but also skill intensive favoring traders.

3. myst_05 says:

Regarding the “highest death toll” question, Russia came awfully close with 358k excess deaths: https://meduza.io/feature/2021/02/10/teper-pochti-ofitsialno-rossiya-na-pervom-meste-po-chislu-zhertv-koronavirusa-na-dushu-naseleniya-eto-rezultat-deystviy-vlastey-letom-2020-goda. India’s excess deaths seem to be low, but I don’t know how accurate their data tracking is on the ground, so the US indeed seems to have the highest absolute number of casualties.

4. clgn says:

Not sure if #17 should have been an underdog, with spring 2020 knowledge, since we knew that two important things were cyclic:

1. The weather. We had already seen the pattern with historical respiratory viruses. Further, especially in northern climates, the seasons are one of the biggest changes in human activity and the environment, in general (even in highly coordinated activities, like war). I’d have given at least a 30% chance of the seasonal cycle making the pandemic significantly cyclic as well.

2. The control system. This was back in Spring, when the sheer power of the control system hadn’t yet shown itself. But we still knew abstractly about its existence, and had seen it in other forms. I’d have given this a 10% chance.

Even if none of these by themselves would have been capable of making the pandemic significantly cyclic, I’d have put another 5% on a combination of factors aligning to form a big cycle (including things like mutation). It’d only have to coincide for a month, after all.

Finally, basically every major historical plague has been cyclic. (And we can guess at reasons why — eg, power-law infections plus linear suppression — but does it really matter?) This would make me put a 25% tax on top of my model: I don’t want to bet against such a consistent pattern for an otherwise complex system. So ultimately I’d have been willing to buy up to 70% with spring 2020 knowledge. (Most of the remaining 30% would be odds of a quick cure/suppression of the disease.)

50k deaths is a very low bar, like you mentioned. It’s only 0.015% of Americans. The flu kills that many in a year. It wouldn’t have taken very much at all for us to have crossed it in a season.

Likewise, strict lockdowns were clearly not sustainable from an economic perspective. It would be another low bar for them to have been loosened at some point. Notably, loosening lockdowns also sows the seeds for future outbreaks, so 50k+ deaths and loosened lockdowns probably highly correlate.

(Disclaimer: I never made an explicit, numerical bet on this back in spring 2020.)