There’s some uproar this week because the estimates provided by the model apparently most used by the Trump administration have been revised significantly downward. I’m not clear on exactly what was done, but a revision is not entirely surprising.
Let me give some background on modeling generally and the model that the administration seems to be using. I say “seems to be using” because the White House task force has not been clear about what model is being used.
The model whose results have changed is the IHME model (Institute for Health Metrics and Evaluation, University of Washington), which seems to be the primary model used by the White House response team. However, the figures President Donald Trump repeated in briefings last week of as many as 2.2 million deaths are the same as given by the Imperial College model. Dr. Deborah Birx has mentioned both the IHME model and what seems to be an internal model.
There are a plethora of models, with more showing up every day, including at the state level. Modeling epidemics is not terribly complex; fitting a curve or a few differential equations will do. It’s useful to have several models to cross-check, but that’s not happening much, given the other pressures on states.
I think of models as being of one of two types, from the bottom up and from the top down. A model built from the top down chooses a curve to fit to a data set and then uses that curve to look at other data. One built from the bottom up takes component parts that go into the progress of the epidemic – how effectively the virus is transferred from one person to another, the effect of social distancing – gives them each a mathematical representation, and combines those representations into a model. Gradations between the two are possible.
The influence of various factors is easier to see in a bottom-up model. In a top-down model, the factors may be mixed with each other and harder to separate. The types of assumptions are different for the two types of model. A bottom-up model can separate the parts of the transmission process: interactions among people, susceptibility to the virus, the infectiousness of carriers, along with damping down of transmission by distancing and acquiring immunity. Both types of model can be useful. Matching the models with data and with each other helps to firm up parameters and assumptions.
The Imperial College model is bottom-up. It starts with the ways the SARS-CoV-2 virus might be transferred among people, works through how social distancing affects that, and builds separate equations for different parts of the process. The IHME model is top-down. It fits curves representing deaths in various locations with four parameters. It then works back from numbers of deaths to the need for hospitalization and equipment.
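To make the bottom-up idea concrete, here is a minimal sketch of a textbook SIR compartment model, with social distancing represented as a scaling of the contact rate. It is not the Imperial College model, which is far more detailed. The function name, the `contact_scale` parameter, and all numeric values are assumptions chosen only for illustration.

```python
def simulate_sir(beta, gamma, days, contact_scale=1.0, i0=1e-4, dt=0.1):
    """Euler-integrate a basic SIR model; return the infectious fraction per day.

    beta          -- transmission rate per day without distancing (assumed value)
    gamma         -- recovery rate per day (assumed value)
    contact_scale -- hypothetical distancing knob: 1.0 = no distancing
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    history = []
    for _ in range(days):
        history.append(i)
        for _ in range(steps_per_day):
            new_infections = contact_scale * beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
    return history


baseline = simulate_sir(beta=0.5, gamma=0.2, days=200)
distanced = simulate_sir(beta=0.5, gamma=0.2, days=200, contact_scale=0.6)
print(f"peak infectious fraction, no distancing:   {max(baseline):.3f}")
print(f"peak infectious fraction, with distancing: {max(distanced):.3f}")
```

Because each factor has its own term, the effect of changing one of them (here, contacts) can be read off directly, which is the advantage of the bottom-up approach described above.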
I’ve looked at the Imperial College model in some detail.
As I understand it, the function describing the curve in the IHME model was derived by fitting cumulative death data from Wuhan, China. Then the same function was fit to data from other places. Of four variables, two are interpolated with Wuhan data (the change now presumably incorporating data from seven regions in Italy and Spain) and two vary with specific location.
Because of those last two, the uncertainty band will be larger early on, when less location-specific data is available, than later with more data. The IHME projections are said to be updated as new data come in.
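For a sense of what a top-down curve looks like, here is a sketch of one common S-shaped choice for cumulative deaths, built on the Gaussian error function. Whether this matches the IHME functional form and parameter count exactly is an assumption on my part; the three parameter names (`p`, `alpha`, `beta`) and the numbers are illustrative only.

```python
import math


def cumulative_deaths(t, p, alpha, beta):
    """S-shaped cumulative curve based on the error function.

    p     -- plateau: the eventual total number of deaths (illustrative)
    alpha -- steepness of the rise
    beta  -- day on which daily deaths peak (inflection point)
    """
    return 0.5 * p * (1.0 + math.erf(alpha * (t - beta)))


def daily_deaths(t, p, alpha, beta):
    """Time derivative of cumulative_deaths: a bell-shaped daily curve."""
    return p * alpha / math.sqrt(math.pi) * math.exp(-(alpha * (t - beta)) ** 2)


# Hypothetical parameter values, not fitted to any real data
params = dict(p=10_000, alpha=0.08, beta=40)
for day in (20, 40, 60, 80):
    print(day, round(cumulative_deaths(day, **params)), round(daily_deaths(day, **params)))
```

One property worth noticing: a curve of this form is symmetric about its peak by construction, so the fall in daily deaths mirrors the rise regardless of what the data later show.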
The uncertainty bands are enormous, for many reasons. The discussion in the description of the model gives some sources of uncertainty, but there are other sources that may not be included. I hope to discuss all the things we don’t know and how they affect modeling in another post.
Although the United States plot reaches its maximum in mid-April, the maxima for individual states range from this week to mid-May. The Imperial College model projects a maximum for June and July.
Perhaps the most questionable assumption of the IHME model is the strict lockdown observed in Wuhan and Italy. The United States has had less population distancing than in those places. I’m not at all clear how this quantitatively fits into the four parameters of the primary equation of the IHME model. It looks to me like it is implicit in using the Wuhan and other data, but some of the discussion in the paper sounds like it is more explicitly included without saying how.
When I’ve listened to Birx speak about the models in the press conferences, I’ve had a concern that she doesn’t really understand them. She is said to have worked with the modeling of the AIDS epidemic, but that was some time back, and computer capabilities have greatly increased what can be done with models.
In the IHME model, I find it misleading that the hospital resource projections are given in three and four significant figures when the error bands are so large. Those numbers imply a precision that cannot be part of the model. That, and the large adjustment just made, will discredit models generally with the public, along with the wide range of predictions by the different models.
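A small sketch of the usual remedy: round a projection to the decimal place set by its uncertainty, so the digits shown do not claim more than the error band allows. The helper name and the example figures are hypothetical, not taken from the IHME output.

```python
import math


def round_to_uncertainty(value, uncertainty, sig_figs=1):
    """Round value and uncertainty to the place of the uncertainty's leading digit(s)."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))
    decimals = sig_figs - 1 - exponent  # negative means tens, hundreds, ...
    return round(value, decimals), round(uncertainty, decimals)


# Hypothetical projection: 16,323 beds with an uncertainty of about 4,200
value, err = round_to_uncertainty(16323, 4200)
print(f"{value} +/- {err}")
```

Reported this way, the hypothetical projection reads as roughly 16 thousand give or take 4 thousand, which is all the error band supports.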
Here’s an article on epidemiological simulations generally that you may find informative.
Cross-posted to Nuclear Diner
Mike in DC
The number of new deaths today (+1877 according to Worldometer, with one hour to go) tends to cut against those optimistic downward revisions. If we double today’s daily number one week from now, that’s above the upper end of the range in the IHME model.
Wag
Thank you for the concise and absolutely clear description of the difference between the two modes of modeling epidemics. I had no idea until now the difference, and will be more intelligent about how I look at models going forward.
dmsilev
I get nervous putting too much trust into top-down models just as a general rule. If there’s some variable you don’t know about or can’t measure or whatever, you could end up in an entirely different part of parameter space with an entirely different functional form, so your model prediction is just going to spit back (highly precise) nonsense. Sometimes it’s the only viable option, when doing a bottom-up model is just plain not feasible, but it’s not ideal IMHO.
Jay
japa21
@Mike in DC: I have noticed the last couple weeks that the Sunday and Monday figures seem a little low and then Tuesday they jump. Wednesday they settle again. Today they will probably hit 2,000. Tomorrow’s may be a better signal.
Cheryl Rofer
@dmsilev: I much prefer bottom-up models, but top-down models can work too.
Carl Bergstrom, who has been following the IHME model more closely than I have, was thinking about their limitations too.
Jay
piratedan
@Jay: you’ll probably start hearing more stories about hospital layoffs and furloughs… because treating Covid patients doesn’t pay the bills like elective surgeries do…
also, just because Trump has stated that the US will reimburse for testing, no one has stated exactly where to send the bills to and under what schedule the facilities doing the testing will be getting reimbursed….
while this first wave of the pandemic is sucking big time, the financial undertow is going to be pretty ugly as well and based on past history with THIS administration, if there’s a way for them to park the money allotted to relief for the providers and the citizens so they can bank the interest, that’s exactly what they’ll do.
Don’t go to sleep on what the insurance carriers will be doing as well, while your testing may be covered by the FEDS, I would fully expect the insurance companies to do what they do with great ease… deny, deny, deny… with THIS administration, who’s gonna take them to task?
Martin
Yeah, i agree with this.
The ‘bottom up’ models are at one end, simulations, but what I typically refer to as ‘functional models’ because they aren’t pure mathematical constructs. They take information from other disciplines to drive or constrain the model. The top down are pure mathematical models.
With my mathematical lobe I can fit a curve to a set of data elements and predict with great confidence that 110% of the population will die in 4 days because the math works and I’m not a biologist or epidemiologist. With my science lobe, I look at the 10 day incubation period of a disease and conclude that there’s a maximum rate of community spread due to that and my model can’t violate that. I certainly know it’ll take more than 10 days to kill everyone, and I certainly know I can’t create 10% more people in 4 days. But I may have no idea how to make the existing data fit into that framework.
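The "math works but the answer is impossible" point can be sketched in a few lines: fit a pure exponential to early counts and extrapolate it with no knowledge of the population size. The data below are synthetic, and the population and growth rate are assumptions made up for the illustration.

```python
import math


def fit_exponential(days, counts):
    """Least-squares line through (day, log(count)).

    Returns (a, b) such that count is approximately a * exp(b * day).
    """
    n = len(days)
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


population = 1_000_000                      # hypothetical town
days = list(range(10))
counts = [10 * 1.3 ** d for d in days]      # synthetic data: 30% daily growth
a, b = fit_exponential(days, counts)
projected = a * math.exp(b * 60)            # naive extrapolation to day 60
print(f"projected count on day 60: {projected:,.0f} (population {population:,})")
```

The fit is essentially perfect on the ten points it was given, and the extrapolation still sails past the total population, because nothing in the curve knows that a population exists.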
In my professional work, for anything important I build both types of models and draw my conclusions when they are in agreement. When and where they agree usually provides me with a sense that I’ve at least gotten the first order factors correct. And where they diverge tells me where to look for more understanding. It’s one thing to observe a trend, but you need to understand why the trend is doing what it is to have real faith in it.
So I’m observing some softening of fatalities, but I don’t understand why – mostly I just don’t have access to the information I need. Are there a lot of fatalities not being counted? More evidence is pointing that way. Did the public do enough voluntary distancing to materially bend the curve? I think there’s certainly some of that. Google’s phone tracking by location visits could shed light on why one county might be taking a lockdown more seriously than another. The availability of tests for postmortem diagnosis could also shed light. i can then take that data, build a functional model, and then test it by grabbing a data set that I haven’t mathematically modeled and see if they line up.
So, i think IHME might be right that the fatalities top out sooner than i expect, but I haven’t found a single other dataset that shows the kind of fall-off of fatalities that they’re predicting, nor can I find a mechanism that would cause it. There seems to be an assumption that this is when herd immunity kicks in in volume, as almost all historical pandemic data sets point to, rather than some kind of social intervention, which we’re trying here. So why assume this artificial easing of fatality rates will model out the same as a natural one? IHME doesn’t offer any explanation because it’s not a functional model – it doesn’t care how the virus works. I think that part is also important.
dmsilev
@Cheryl Rofer: The twitchiness of the ‘peak date’ mentioned in that thread is a good example of the problem. What that should say at a bare minimum is that the predicted peak dates need to be reported with big uncertainty ranges.
piratedan
@dmsilev: anecdotally, for the hospitals that I support in Tacoma… based on the internal models that they have used based on the local numbers, “our” peak could have either already occurred (4/2/2020) or it could be another two-three weeks out yet…
Our inpatient numbers have been slowly climbing but the emphasis is on slowly, the issue is that the hospitalization time is significant, with patients being in close to a 21 day cycle in the hospital… which is … lengthy.
Mathguy
“When I’ve listened to Birx speak about the models in the press conferences, I’ve had a concern that she doesn’t really understand them.”
This is what concerns me more than anything else. It’s the misinterpretation by the buffoons in the WH. When the most “competent” person doesn’t appear to have a solid understanding, it’s a recipe for further disaster.
Cheryl Rofer
@Martin:
Yes! I would like to see a lot more of comparisons between model results. That’s a big part of what climate modelers do, for example.
I’ve got another model I should try to write up quickly, where they actually compared to the Imperial College model and came out pretty close.
Cheryl Rofer
@dmsilev: I was kind of shocked at that. All the modelers should be doing sensitivity analyses to pick up that kind of thing.
Jay
@piratedan:
Roger Moore
@dmsilev:
One question I have is how sharp the peak is. People talk about a peak date as if there’s going to be a rapid rise and then an equally rapid fall. My impression from looking at the data from places like Wuhan and Italy is that the peak is actually likely to be broad and flat. It may actually be broader here in the USA than it is there because we haven’t been as thorough in locking down, so we probably haven’t gotten R as low as they have. In any case, that kind of broad, flat peak may give you a wide range of peak days, even if there’s agreement on the big picture of what’s going on.
Calouste
@Martin:
The Netherlands reported last week that the excess deaths for that week were about double the reported deaths attributed to COVID-19.
In the end, we will only know in a few years time what the total impact of COVID-19 has been, accounting for both the people who were directly affected as well as the people who either recovered but still had their life expectancy lowered, and the ones who had their life expectancy lowered because they didn’t get timely medical care for other conditions.
Martin
@dmsilev: If you don’t understand the underlying mechanics (which we largely don’t in this case) then a functional model is impossible to build. We don’t know how much closing K-12 affects the reproduction rate. Or closing bars but not gun stores, and does an open gun store have a bigger effect in CA than in AL (it almost certainly does). We just don’t know because we don’t know where and how to measure these variables or even what the most important variables are.
For functional models, ideally you have this abundance of data and your first job is to filter it into which variables historically have affected the measured result and to what degree. That starts to give you clues as to what drives the result. Did my product sell more because I lowered the price by $1 or because some influencer endorsed it the same day? If you have a dataset of the influencer’s followers and their purchase history – you can see if there’s a cause/effect there that is stronger than non-followers who are probably responding to price, or you can compare different markets with different discounting, etc.
The main benefit of the pure mathematical model, IMO, is to help validate your functional model, or provide a set of possibilities to plan against. A lot of the early action was due to people like me saying ‘if you do nothing, you’ll lose this many staff and students’. That got them into the decision space that they would do something, but it was a question of what and when. That was an important step. We then worked on the when (less than a week) and then the what (you can’t have lecture halls, dining commons and dorms with shared bathrooms open) that narrowed the possibilities a LOT. I didn’t need to know a lot of the mechanisms, I could take that from standard epidemic models and the little bit of data we had out of China. It was very top down. It’s all I had.
But the quantitative predictive value of those models tends to be pretty garbage. The qualitative value for decision making can be really important. My leadership didn’t care if it was 5% of the student population or 40% dying – the lower bound was unthinkable.
Jay
Martin
@Mathguy: She doesn’t understand them, nor does the Surgeon General. Only Fauci understands them, and he understands their limitations as well.
nwerner
The IHME model peaks on April 16th at a growth rate (for deaths) that hasn’t yet been demonstrated in a US state. Washington is probably the best proxy and it is still at 8.5 to 10% increase in deaths per day (one outlier day at ~6%) but on a much smaller population and one that was made aware of the risks far earlier than the rest of the country.
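For reference, a daily percentage increase converts to a doubling time with a one-line formula. This sketch only restates the growth figures quoted above in a different unit; it assumes the growth stays constant, which is of course the thing in question.

```python
import math


def doubling_time(daily_growth):
    """Days needed to double at a constant fractional daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)


for rate in (0.06, 0.085, 0.10):
    print(f"{rate:.1%} per day -> doubles in about {doubling_time(rate):.1f} days")
```

So a rate in the 8.5 to 10% range corresponds to deaths doubling roughly every seven to eight and a half days.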
Jim, Foolish Literalist
@Roger Moore:
Ron Klain made that point rather emphatically on TV the other night, sounding like someone who’s been troubled by an excess of optimism about that dramatic fall
Mike in NC
Fat Bastard loves models, preferably young and scantily clad.
Jay
@Martin:
But the quantitative predictive value of those models tends to be pretty garbage.
The qualitative value for decision making can be really important. My leadership didn’t care if it was 5% of the student population or 40% dying – the lower bound was unthinkable.
Martin
@Roger Moore: Exactly. Everyplace the growth seems to just stall out and get stuck there for weeks.
That doesn’t happen in a traditional epidemic model because those crown over and fall due to running out of people to infect. Now, if you look at a broad enough geography you can see multiple outbreaks hitting and falling off at different times creating that kind of a plateau.
But in our cases, we’re flattening the curve at 1-5% of population infection. There’s very little herd immunity there. It’s just not going to match a traditional epidemic model.
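A rough way to see how far 1-5% is from the point where an epidemic fades on its own: in the simplest homogeneous-mixing picture, the herd immunity threshold is 1 - 1/R0. The R0 values below are assumptions for illustration, not estimates from this thread.

```python
def herd_immunity_threshold(r0):
    """Immune fraction at which each case infects fewer than one other, on average.

    Uses the simple homogeneous-mixing formula 1 - 1/R0.
    """
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


for r0 in (2.0, 2.5, 3.0):  # assumed values
    print(f"R0 = {r0}: threshold is about {herd_immunity_threshold(r0):.0%} immune")
```

Under those assumed values the threshold lands somewhere between half and two thirds of the population, an order of magnitude above the infection levels mentioned here.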
Cheryl Rofer
@Roger Moore: If you look through the states in the IHME results, you will see both kinds of peaks.
Brachiator
Aren’t models what you have before you start doing stuff?
And when you start doing stuff, you start documenting what you did for historical purposes and so that you can build better models the next time.
But models are not real people being treated, responding to treatment, dying.
Models are not real people respecting or violating lockdowns and social distance recommendations.
It is weird. Some of the people who want to attack the models for being inaccurate or alarmist can never offer anything substantial that is more accurate.
Cheryl Rofer
@Martin:
The Imperial College model uses agent modeling for this part of it. It’s not perfect, but I’ve seen it work better than I ever would have expected. It doesn’t go down to the level of closing bars but not gun shops, though.
TS (the original)
@Jay:
Private insurance in Australia goes up 5-6% EVERY year. Who knows why. Anyhow they decided a few weeks ago it might not look good if it was increased this year so it has stayed the same for the first time in 20 years. They will make a fortune this year without an increase due to the shutdown of elective surgery, among many other things.
Ceci n est pas mon nym
I’ve been checking the IHME model frequently and I noticed that sudden drop in the predictions. It had been saying that the last update was made on Apr 1, and an update was expected on Apr 4. It never happened on the 4th, I think it didn’t happen until late on the 5th. And when it did suddenly all the projections looked remarkably better, including in red states that have refused to lock down. I was expecting those to get much worse. I was surprised. IHME projections are here if anyone wants to look at them.
I guess it’s because of what you say about top-down vs bottom-up. They aren’t simulating behavior.
Meanwhile I’ve been doing my own little statistical study using data from worldometers which I grab once a day. I started it when I heard what the Mississippi governor was doing with discouraging social distancing, and especially how so many red states are encouraging people to go to church on sundays. I am tracking 16 states which had no lockdown order as of March 30, and comparing them to the US as a whole. I figured we might start seeing some evidence of a red-state wave, beginning about Friday (5 days after attending church).
Well, I saw a significant uptick today with data from the last 24 hours. It’s only one data point though.
Jay
@TS (the original):
Medicare isn’t going up. Private Health Insurance rates rising is a Privileged/First World Problem.
Mike in DC
@Ceci n est pas mon nym:
That IHME model assumes that deaths more or less stop at the end of this month, if you follow the line along. I am skeptical. And of course it doesn’t model any subsequent waves.
Fair Economist
I haven’t seen any indication the IHME is incorporating data from Spain or Italy into their model. Since their epidemics have not run their course, I don’t really see how they could.
The revisions are just because there are more points to fit for their curve. I’d love to see a revision incorporating the 100 deaths in GA today because that would probably make their prediction go crazy.
The other problem with the IHME is the extremely variable data quality between states and even within states over time. Fitted curves tend to be heavily influenced by data on the ends and so a change in reporting could have a very outsized effect on the projection.
Fair Economist
@Ceci n est pas mon nym: An uptick today is probably the end of the weekend effect. The data collectors work less on the weekend and catch up Monday, with a big jump coming Monday or Tuesday.
Cheryl Rofer
@Fair Economist:
I thought this at first too, because fitting these curves with a few points at the beginning is likely to be particularly bad. But the Washington Post article that I linked said that they incorporated data from Italy and Spain into the model, along with the Wuhan data.
I’m not entirely clear on how they do that, but part of the curve-fitting seems to be an interpolation between the state/country data and their “standard”, composed of the Wuhan/ Italy/ Spain data. There’s a lot of room there for a lot of funny business, methinks.
ETA: It also looks to me like they are fitting each state individually: of their four parameters, two are done by that interpolation, and two are variable with the location.
Another Scott
Thanks for this.
I look at their model every day, but like you (and like the early critique on Twitter), I find the “error bars” they show to be very misleading. They are upfront that they’re assuming the Wuhan total lockdown results will translate elsewhere via 4 parameters – 4 parameters that are far, far short of total lockdown that Wuhan went through. It makes the problem more tractable, but, man, it’s up there with “assume a spherical cow”…
They are modifying their predictions as new data comes in – the peak daily death numbers keep increasing.
It would be instructive if they kept their original projections on their graphs, so that people could see real-life deviations from their simple model (and thus not take their peak and total projections as gospel)…
I do applaud them for putting their model out there.
My $0.02.
Cheers,
Scott.
Martin
Data time:
So, on the fatality data, we’re almost certainly seeing a few procedural effects here. While I still contend that fatalities are more accurate than cases, it’s not without problems. One which we know occurs is undercounts. France just added in hundreds of nursing home fatalities that weren’t in their old counts presumably because they follow a different process. When fatalities are low we can better trust they’ll be tested posthumously, or that they were tested prior to death. We have other variables within that trend, states with coroners vs medical examiners, and that sort of thing. And you have the hospital fatalities vs the at home or nursing home fatalities that might be processed somewhat differently (as they were in France). And you have the problem of how backlogs get handled. There’s likely a maximum number of bodies that NYC can process in a day with available staff. What happens when we exceed that number? Do we wait a day to process it? Do we not classify that fatality and just bury them? And what happens when those medical examiners get sick and the maximum throughput gets knocked down? And are there an equal number of MEs working every day of the week, or do they still bias toward M-F, leading to more cases being classified when employees surge on Monday and then reported on Tuesday? In NYC it’s probably all hands on deck week round, but in smaller counties where there may only be 1-2 MEs or where cases are small enough that it’s not a round the clock effort, you can probably see a more traditional workweek pattern.
We don’t really know these things in part because we’ve never been through this situation before to have prior experience. We had large numbers of fatalities on 9/11, but it wasn’t sustained day in and day out. But this is why we can’t look at 1 or 2 days as a trend. We need to see longer trends as well as some anecdotal information that suggests we’re getting a true or truer number, rather than a less true one.
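One standard way to look past a workweek reporting pattern is a 7-day trailing average, since every window then contains exactly one of each weekday. A sketch with synthetic numbers; the weekly pattern below is invented to mimic weekend under-reporting and a Tuesday catch-up, and is not real data.

```python
def trailing_mean(values, window=7):
    """Trailing moving average; one output per day once a full window is available."""
    out = []
    for end in range(window, len(values) + 1):
        out.append(sum(values[end - window:end]) / window)
    return out


# Synthetic: a true 100 deaths/day, under-reported on the weekend, caught up on Tuesday
weekly_pattern = [60, 70, 170, 100, 100, 100, 100]  # Sun, Mon, Tue, Wed, Thu, Fri, Sat
reported = weekly_pattern * 4
smoothed = trailing_mean(reported)
print("raw range:     ", min(reported), "to", max(reported))
print("smoothed range:", min(smoothed), "to", max(smoothed))
```

The averaging removes the day-of-week swing, but it cannot recover deaths that are never counted at all, which is the larger worry raised above.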
sdhays
Do you think Birx knows she doesn’t really understand the models, or does she think she understands them when she really doesn’t? I’m guessing, based on what you’ve said, it’s the latter.
I just don’t trust anyone, regardless of their credentials, who speaks at the White House podium, with the exception of Fauci, and he’s clearly working with significant constraints, so even he isn’t 100% credible since he can’t be 100% honest.
beef
@Roger Moore:
I think that’s probably what we should expect. The SEIR models which I’ve looked at generically predict that the number of cases/deaths/etc decays relatively slowly after the peak, compared to how rapidly it rises before the peak.
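That asymmetry can be checked numerically. The sketch below uses a plain SIR model (SEIR without the exposed compartment, so a simplification of what is described above) and compares how long the infectious fraction takes to climb from 10% of its peak to the peak with how long it takes to fall back to 10%. All parameter values are assumptions.

```python
def sir_infectious_curve(beta=0.5, gamma=0.2, i0=1e-4, days=300, dt=0.1):
    """Euler-integrate SIR and return the infectious fraction for each day."""
    s, i = 1.0 - i0, i0
    steps_per_day = round(1 / dt)
    curve = []
    for _ in range(days):
        curve.append(i)
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            s -= new_infections
            i += new_infections - gamma * i * dt
    return curve


curve = sir_infectious_curve()
peak_day = curve.index(max(curve))
threshold = 0.1 * curve[peak_day]

# First day at or above 10% of peak on the way up, first day at or below it on the way down
rise_start = next(d for d in range(peak_day + 1) if curve[d] >= threshold)
fall_end = next(d for d in range(peak_day, len(curve)) if curve[d] <= threshold)

rise_days = peak_day - rise_start
fall_days = fall_end - peak_day
print(f"10% of peak -> peak: {rise_days} days; peak -> 10% of peak: {fall_days} days")
```

In this toy setup the decline takes noticeably longer than the rise, unlike a curve that is symmetric about its peak.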
smintheus
I don’t know how much longer stay-in-place is going to work in PA. Until a few days ago I’d have said it was fairly successful. Today I decided to read a book while walking down my normally lonely country road, however, and people were everywhere for hours on end. Fishermen clustering at favorite spots. Dozens of people riding up and down, windows open, not a mask in sight. Motorcyclists zooming, bicyclists spewing their germs, at least a dozen walkers/joggers, a frickin work crew building a lame-ass wooden deck on a neighbor’s house – working within inches of each other, no masks, no goddamn sense. Another neighbor walking shoulder to shoulder with a visiting friend (just a few days after she had a COVID19 test come back negative). Typically I can walk for miles in the middle of the day and not pass a soul. I finally gave up trying to walk, and sat on my front steps in the sun … and then had to get up about once every two minutes to retreat from another whacko who was out and about staying in place.
EthylEster
CR wrote: projections are given in three and four significant figures when the error bands are so large
I see this everywhere all the time. You’d think significant figures was some kind of thorny math idea that only geniuses could master. I was first introduced to the sig fig concept in freshman chemistry. My prof was old “old school”. If your answer was calculated correctly with all work shown with correct unit but had the wrong number of sig figs, only 80% credit awarded. (If no unit or incorrect unit, 0% awarded.) I caught on fast. However, when I taught freshman chem for 10 years, I could not get my students to accept the same approach I was brought up on. Many would have failed if I had imposed such a rule. And even more sadly they did not seem to be able to master the really simple rules for determining the correct # of sig figs.
Most people do NOT think about errors in measured quantities and how those errors propagate. So the idea that you could write a number in such a way as to imply its estimated uncertainty is wasted on them. Sad.
EthylEster
@Jay: I think everybody is undercounting cases AND deaths. That article from the NYT referenced EMT stats, how # people dying at home is higher year over year. And this gets to whose death gets counted. Evidently in NYC if you don’t have a positive test result and you die at home, you are not counted. This is one thing I don’t understand about the models. They need good data and I don’t see how we have any really good data given the uncertainties we already know about. But maybe this is CR’s point about model quality being strongly negatively affected by uncertain input data.
Martin asked: Are there a lot of fatalities not being counted?
Yes, it appears so and is not that surprising really. NYT article said about 200 uncounted per week extra IIRC.
EthylEster
https://www.seattletimes.com/seattle-news/health/coronavirus-daily-news-update-april-7-what-to-know-today-about-covid-19-in-the-seattle-area-washington-state-and-the-nation/
Long discussion of IHME model as it applies to Seattle, WA, USA, world.
bbleh
@dmsilev: @Cheryl Rofer: @Martin:
Lots of thoughtful points in this thread about “top-down” and “bottom-up” models, which I would term statistical and mechanistic respectively. A good model incorporates both: it makes sense mechanistically, and it conforms to the available data. One that is only statistical implicitly assumes that whatever was happening before will continue to happen, which might be approximately true but should never be assumed without careful consideration. And one that is only mechanistic might seem to make sense but must always be tested against available data, since there may well be things happening that you haven’t thought of.
A good statistical model incorporates a mechanistic structure. If the underlying mechanistic structure is a reasonable approximation, then its parameters can be estimated statistically from the available data, and the resulting model can be used for prediction looking forward, with adjustments appropriate to the situation to which it is being applied. But of course, those adjustments are a matter of judgment. Ultimately, prediction requires judgment. To depend on the statistics or the mechanics without informed (and skeptical) judgment is a dangerous error.
EthylEster
@bbleh: A good model incorporates both
As described by CR, they are two different approaches. But I agree that having both for comparison (as Martin favors) is better than only one approach. IMO they cannot be combined.
In chemometrics, we say “soft” model for statistical ones and “hard” model for mechanistic ones. The hard models are based on a mathematical understanding of how things work, based on a physical or chemical (or other) theory. The soft models just look at data variance or change over time. No theory guides. At least that’s my understanding of the matter.
egorelick
TL/DR – We probably agree on more than it seems but what we obviously don’t know makes it seem like we are arguing. Martin, I have disagreed with you about your numbers and this is a good post to lay out why I believe in fewer (reported) deaths more in line with the IHME model. We know nothing. We can only fit curves at this point. Things we don’t know:
1. How many deaths there were in China.
2. How many cases there were in China.
3. How many uncounted cases there are anywhere.
4. How many uncounted deaths there are anywhere (and that’s why the IHME model is only good for counting reported deaths).
5. Whether mask protocols work.
6. When we will have on-demand testing.
7. When we will have serological testing.
8. How big a factor viral load is in transmission. Huge based on healthcare worker cases, but that just might be reporting bias.
9. How much surface transmission is a factor.
10. Whether the virus aerosolizes.
I live in California and work in a hospital here and early and aggressive action has certainly flattened the curve where I am. The hospital ER where I work had the least busy shifts last week since it opened. It would have been a pleasure to be paid for how easy it was except for the specter of death hanging about.
ANECDOTE WARNING!!!! FWIW, I believe one of the uncounted factors, at least in CA, is some herd immunity from a level of endemic infection that was just not seen. My sons got sick in late February/early March after a short trip mid-February. We passed through 5 airports in that trip and I am guessing it was COVID-19, but they are teenagers, and I was mainly worried that they were malingering about not going to school when they told me they were sick (the idea they had COVID-19 seemed impossible at the time). Co-workers have similar stories. Another anecdote that supports my belief is that we had a worse than average flu season right up to COVID-19 and then flu treatment stopped. (You would be surprised how much flu treatment is empirically based and not tested – or maybe not.) So I assume some of that excess flu was COVID-19. We know from Helen Chu that community transmission was happening in Washington State long before it was seen officially; we can assume that was true a lot of places.
glc
I don’t think curve-fitting is a “model” and it’s confusing to treat it as one.
You can make predictions that way, but if you guess wrong then you’ll still make predictions and their failure won’t tell you anything. With an actual model, if the predictions don’t pan out you will probably learn something valuable.
https://youtu.be/Ewuo_2pzNNw (good Q/A session about 53 minutes in).
egorelick
@glc: I’m not sure you understand curve fitting.
glc
@egorelick:
Let me give you an example: linear regression.
Does that help?