Seemingly Popular Covid-19 Model is Obvious Nonsense

Posted on April 11, 2020 by TheZvi

Previous Covid-19 thoughts: On R0, Taking Initial Viral Load Seriously

Epistemic Status: Something Is Wrong On The Internet. Which should almost always be ignored even when you are an expert, and I am nothing of the kind. However, this was a necessary exception. My expectation that I would regret writing this has proven incorrect.

People are taking the projection of 60,000 American deaths from Covid-19 as if it were a real prediction. This number is being used to make policy, to deny states medical equipment, to make plans that spend trillions of dollars, and to decide when to reopen entire economies.

Ignoring this in the hopes it will go away does not seem reasonable.

My suspicions that this was necessary were more than confirmed when, failing to realize just how obvious the nonsense in question was and thinking I needed to justify labeling it nonsense, I wrote a reference post called The One Mistake Rule.

The second comment on that post was to argue that we should indeed use exactly the model that motivated me to write the post. The comment is here in full:

>> If a model gives a definitely wrong answer anywhere, it is useless everywhere.

Except if it needs to be used right now to make important decisions and it’s the best model we have. See: https://covid19.healthdata.org/united-states-of-america

We could plausibly think this is the best model we have? Oh my are we screwed.

The Baseline Scenario That Makes No Sense

There seems to be a developing consensus on many fronts, for now, that the model linked above represents our reality. The model says it is ‘designed to be a planning tool’ and that is exactly what is happening here.

What is this model doing? Time to look at the pdf.

Here’s the money quote that describes the core of what they are actually doing.

A covariate of days with expected exponential growth in the cumulative death rate was created using information on the number of days after the death rate exceeded 0.31 per million to the day when 4 different social distancing measures were mandated by local and national government:

School closures, non-essential business closures including bars and restaurants, stay-at-home recommendations, and travel restrictions including public transport closures. Days with 1 measure were counted as 0.67 equivalents, days with 2 measures as 0.334 equivalents and with 3 or 4 measures as 0. For states that have not yet implemented all of the closure measures, we assumed that the remaining measures will be put in place within 1 week. This lag between reaching a threshold death rate and implementing more aggressive social distancing was combined with the observed period of exponential growth in the cumulative death rate seen in Wuhan after Level 4 social distancing was implemented, adjusted for the median time from incidence to death. For ease of interpretation of statistical coefficients, this covariate was normalized so the value for Wuhan was 1.

In other words, this model assumes that social distancing measures work really, really well. Absurdly well. All you have to do to stop Covid-19 is any three of: Close schools, close non-essential businesses, tell people to stay at home, impose travel restrictions.

If you do that and maintain it, people stop dying. Entirely.
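
To make the quoted weighting concrete, here is a minimal sketch of the day-counting it describes. The weights for 1, 2 and 3-or-4 measures come from the quoted passage; the weight of 1.0 for days with no measures, the function name, and the example timeline are assumptions added for illustration, and the paper's normalization against Wuhan is not reproduced.

```python
# Weights quoted from the paper: 1 measure -> 0.67, 2 measures -> 0.334,
# 3 or 4 measures -> 0.
# ASSUMPTION: a day with no measures counts as one full day (the quote
# does not state this explicitly).
WEIGHTS = {0: 1.0, 1: 0.67, 2: 0.334, 3: 0.0, 4: 0.0}


def equivalent_days(measures_per_day):
    """Sum the weighted days.

    measures_per_day lists, for each day after the death-rate threshold
    was crossed, how many of the four measures were in force that day.
    """
    return sum(WEIGHTS[m] for m in measures_per_day)


# HYPOTHETICAL state: 5 days with nothing, 3 days with one measure,
# 4 days with two measures, then 10 days with three measures.
timeline = [0] * 5 + [1] * 3 + [2] * 4 + [3] * 10
print(round(equivalent_days(timeline), 3))  # 5 + 2.01 + 1.336 + 0 = 8.346
```

Note that in this sketch every day after the third measure arrives adds nothing, no matter how many such days there are.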

Look at the graph they have up as of this writing (updated on 4/10). By June 20, they predict actual zero deaths that day and every future day. They have us under 100 deaths per day by the end of May.

The peak in hospital use? Today, April 11.

The peak in deaths? Yesterday, April 10. For New York, several days ago, with our last death on May 20.

In other words, considering the delay in deaths is about three weeks, they predict that no one in New York State will be infected after April. No one! We’ll all be safe in only three weeks!
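
The date arithmetic behind that claim is simple enough to check directly. This is a rough sketch that treats the "about three weeks" lag as exactly 21 days, which is a simplification; the variable names are illustrative.

```python
from datetime import date, timedelta

# Last projected New York death, as cited above.
last_projected_death = date(2020, 5, 20)

# SIMPLIFICATION: treat the "about three weeks" lag as exactly 21 days.
infection_to_death_lag = timedelta(weeks=3)

# If the last death is on May 20, the last fatal infection was roughly here.
last_implied_infection = last_projected_death - infection_to_death_lag
print(last_implied_infection.isoformat())  # 2020-04-29
```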

This is despite us not yet seeing any evidence of a major decline in positive test rates in New York. Deaths lag positive tests by weeks.

Hard to be more maximally optimistic than that. One could call this the ‘theoretical beyond best case scenario.’

(The statement is actually even more absurd than that, considering variation in time to case progression, but I’m going to let that one go.)

(Exercise for the reader, you have five seconds: What is the implied R0?)

(Second exercise for the reader: If there are four things that reduce the spread of infection some amount, and R0 is about 4 initially, and you implement three of them, what is the new R0?)
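
One hedged way to play with the second exercise: assume (and this is an assumption, not something the model or this post states) that each measure independently cuts transmission by the same fraction, so the reductions multiply. The sketch below shows what three such measures leave behind, and how large each cut must be just to bring the reproduction number down to 1, let alone to the effectively zero value the model implies.

```python
R0_INITIAL = 4.0      # from the exercise
N_MEASURES_USED = 3   # three of the four measures


def r_after(reduction_per_measure, n=N_MEASURES_USED, r0=R0_INITIAL):
    """Reproduction number after n measures.

    ASSUMPTION: each measure independently removes the same fraction
    of transmission, so the effects multiply.
    """
    return r0 * (1.0 - reduction_per_measure) ** n


# Per-measure cut needed for three measures to bring R down to exactly 1.
break_even = 1.0 - R0_INITIAL ** (-1.0 / N_MEASURES_USED)

for cut in (0.2, 0.3, 0.4, 0.5):
    print(f"each measure cuts {cut:.0%}: R = {r_after(cut):.2f}")
print(f"break-even cut per measure: {break_even:.1%}")
```

Under these assumptions each measure would need to remove about 37% of transmission on its own just to stop growth, and no finite cut of this kind ever gets R to zero.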

They Account for Uncertainty, Right?

They generously account for uncertainty with the following ‘confidence interval’:

Figure 9 shows the expected cumulative death numbers with 95% uncertainty intervals. The average forecast suggests 81,114 deaths, but the range is large, from 38,242 to 162,106 deaths.

(Note: this was as of paper publishing, numbers are now lower.)

That is not how this works. That is not how any of this works.

The way this works once we correct for all the obvious absurdities is that this is a lower bound on how good things could possibly go.

If I am incorrect, and that is how any of this works, I have some very, very large bets I would like to place.

A Simpler Version of the Same Model

The model seems functionally the same as this:

Assume all reported numbers are accurate, and assume that no one gets infected once you nominally implement three of the four social distancing measures. Which you assume every US state will do within a week from the model starting.

Let’s simplify that again.

Assume that no one under an even half-serious (three quarters serious?) lockdown ever gets infected out-of-household.

We still see deaths for a few weeks, because there is a lag, but then it’s all over.
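
The simplified version above can be written as a toy simulation. This is a sketch of the caricature, not of the actual IHME code: the growth rate, fatality rate, lockdown day and starting infections are all hypothetical, and the fixed 21-day lag ignores the variation in time to death.

```python
LAG_DAYS = 21       # rough infection-to-death lag used in this post
LOCKDOWN_DAY = 30   # HYPOTHETICAL: day the third measure takes effect
GROWTH = 1.2        # HYPOTHETICAL: daily growth factor before lockdown
FATALITY = 0.01     # HYPOTHETICAL: fraction of infections that are fatal
HORIZON = 90        # days simulated


def daily_infections(day):
    """Exponential growth until lockdown, then nobody is infected at all."""
    if day < 0 or day >= LOCKDOWN_DAY:
        return 0.0
    return 100.0 * GROWTH ** day


def daily_deaths(day):
    """Deaths are infections from LAG_DAYS earlier, scaled by fatality."""
    return FATALITY * daily_infections(day - LAG_DAYS)


deaths = [daily_deaths(d) for d in range(HORIZON)]
last_death_day = max(d for d, x in enumerate(deaths) if x > 0)

print("last day with infections:", LOCKDOWN_DAY - 1)
print("last day with deaths:", last_death_day)
print("deaths on every later day:", sum(deaths[last_death_day + 1:]))
```

Whatever numbers are plugged in, the shape is the same: deaths run for exactly one lag period after lockdown and then drop to zero forever.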

What the Model Outputs

As of when I wrote this line, this more-than-maximally-optimistic model projects 61,545 deaths in the United States.

People with power, people with influence, what some might call our “best people,” are on television and in the media predicting around 60,000 total American deaths.

I will say that again.

We are telling the public a death count that effectively implies that by about a month from now, and in many places earlier than that, no new American ever gets infected with Covid-19.

The model assumes that our half measures towards social distancing will have the same impact as was reported in Wuhan. In Wuhan, they blockaded apartment buildings, took anyone suspected of being positive away for isolation, and still, months after this model says there are no infections or even deaths, have severe movement restrictions and blockades up all over the place.

Whereas the New York City subways continue to run, and California thinks weed sales are an essential business.

I hope that my perception of this is wrong. Perhaps everyone knows this model is nonsense. Perhaps there are better ones out there – if you know of one you respect, please let me know about it!

But again, this is a maximally optimistic model on every front. I keep seeing people whose voice matters share this same final answer of predicting 60,000 deaths. If it’s not from a model doing more or less this, I don’t know how you get an answer in that ballpark.

Unless of course answers are being chosen without regard to reality.


17 Responses to Seemingly Popular Covid-19 Model is Obvious Nonsense

ray says:

April 11, 2020 at 11:34 pm

Do you have a prediction? I ballparked 300k a while back, but I didn’t put too much effort into it and I haven’t updated it in a while. 60k is clearly absurd.

- TheZvi says:
  
  April 16, 2020 at 3:44 pm
  
  Sorry about the delay. As I said to Andrew I think median 200k is reasonable right now, given the very good news in the past 1-2 weeks. But mean is much, much higher, full disaster scenarios of millions dead are still in play (whereas 60k seems like an actual impossible outcome).
  
  - Quixote says:
    
    April 17, 2020 at 4:24 pm
    
    For posterity, my own best case number from around 3 weeks ago was 200,000 deaths (in the US this year).
    
Andrew Hunter says:

April 12, 2020 at 2:15 am

I would be interested to see you make a market here, though I admit the optics aren’t great outside our circles.

- TheZvi says:
  
  April 16, 2020 at 3:43 pm
  
  Official death numbers are missing a lot (see NYC’s adjustment) so it’s tough to get a well-defined market. My guess is the median at this point would be something like 200k, with the mean much higher than that, using a metric similar to current reporting.
  
Liam Rosen (@Skyd_LiamRosen) says:

April 12, 2020 at 2:37 am

In this post and your last one, you criticize the IHME model as “nonsense” and “useless” (which are clearly exaggerations — unless you actually think that the model is just as useless as a random number generator), but you provide no alternate model that more accurately represents reality.

A black and white rejection of a model is just as “useless” as a black and white acceptance of it. I’ve had this same argument with three other people on TheMotte subreddit.

Let’s see if we can agree on three points:
1. We need to use some kind of model to make decisions.
2. All models have issues.
3. If we agree on 1, and you want to argue against a model that MUST be used to make decisions, you must provide something better. Otherwise, we should simply agree that the IHME model has issues and provide constructive criticism for the team to update it, because having a reasonably accurate model is better than flying blind.

Secondly, the model has been updated since the Medrxiv paper was published. They now take data from Spain and Italy into account: http://www.healthdata.org/sites/default/files/files/Projects/COVID/Estimation_update_040520_3.pdf. Both implemented less strict distancing policies than Wuhan (though I will give that they were both more strict than what’s happening in most US states).

They also hope to include actual mobility data to measure adherence to distancing: https://i.imgur.com/2u4zeDO.png (http://www.healthdata.org/covid/updates)

SYaba says:

April 12, 2020 at 4:12 am

Have you seen Nate Silver’s article on the complexities of building a model with the data that’s available? https://fivethirtyeight.com/features/coronavirus-case-counts-are-meaningless/

Seemed thorough. Also provides an Excel sheet one can tinker with at the end of the article.

ADifferentAnonymous says:

April 16, 2020 at 3:23 pm

It seems the nonsense comes from a game of telephone. The PDF’s intended takeaway seems to be “Even with maximally optimistic assumptions everywhere possible, we’re going to need a lot more ICU beds.” Its authors would probably have no objection to the characterization ‘theoretical beyond best case scenario.’

The website then lists the model’s conclusions with the heading “COVID-19 projections assuming full social distancing through May 2020”. In a different context, ‘full social distancing’ could mean ‘theoretical perfect social distancing’, which would be an accurate statement of the model’s assumptions, but anybody encountering that page will assume it means lockdowns as currently implemented.

And then people are repeating that as an actual prediction despite the assumption being clearly false under any interpretation. Yikes.

- TheZvi says:
  
  April 16, 2020 at 3:41 pm
  
  That seems approximately right. I can totally see the authors having plausibly good intentions at first – but that’s also a good lesson in why it’s a bad idea to present a beyond-best-case scenario with confidence intervals that make it look like a prediction!
  
nostalgebraist says:

April 20, 2020 at 6:54 am

This isn’t what they’re doing. The paragraph you excerpted is very badly written, and I originally took away the same impression from it, but it actually means something else. (If it were true, their model output would not look exactly Gaussian on the state level, which it does.)

They write out their model specification here:

https://ihmeuw-msca.github.io/CurveFit/methods/

The social distancing “covariate,” denoted S_j there, has a linear effect on the horizontal shift of the Gaussian. It’s not their time variable, t, it’s another input variable that varies by state but not over time. The numbers for 2 measures, 3 measures, etc are describing how they compute this number for each state, by integrating w/r/t time a function that’s decreasing in the number of measures at t.

(The model is bad in many other ways, as I think we agree.)
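
(Editorial sketch of the structure this comment describes, not the CurveFit code itself. It assumes cumulative deaths follow a Gaussian error-function curve and that the covariate S_j enters only by shifting the curve's midpoint linearly. The function names and every parameter value are hypothetical.)

```python
import math


def cumulative_deaths(t, alpha, beta, p):
    """ASSUMED form: an erf-shaped cumulative curve.

    alpha sets the steepness, beta the midpoint (inflection day),
    and p the final death total.
    """
    return p / 2.0 * (1.0 + math.erf(alpha * (t - beta)))


def inflection_day(s_j, beta_intercept, beta_slope):
    """The covariate S_j shifts the midpoint linearly; it is not a time axis."""
    return beta_intercept + beta_slope * s_j


# HYPOTHETICAL parameters and two hypothetical states.
ALPHA, P = 0.1, 1000.0
BETA_INTERCEPT, BETA_SLOPE = 20.0, 10.0
beta_fast = inflection_day(0.5, BETA_INTERCEPT, BETA_SLOPE)  # acted sooner
beta_slow = inflection_day(1.5, BETA_INTERCEPT, BETA_SLOPE)  # acted later

for t in (20, 30, 40, 60):
    fast = cumulative_deaths(t, ALPHA, beta_fast, P)
    slow = cumulative_deaths(t, ALPHA, beta_slow, P)
    print(f"day {t}: fast-acting {fast:.0f}, slow-acting {slow:.0f}")
```

In this sketch the two states get the same curve shape and the same final total; the covariate only moves the curve left or right in time.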

Dan says:

April 25, 2020 at 2:41 pm

Please check on this commentary on IHME and a fermentation microbiologist’s growth rate based model.

https://beacomconsulting.com/covid-blog/blog-test-post#comments
