The first thing a data analyst trainee should learn is that playing with Excel’s functions and tools is a great way to get into trouble when you don’t understand the underlying data’s behavior AND don’t understand the core assumptions of those functions and tools. This is important. The second or third lesson a data analyst trainee will learn is to not use Excel, but that is advanced training.
Why does this matter?
It seems like the White House is using Excel without understanding the phenomenon they are trying to model.
To better visualize observed data, we also continually update a curve-fitting exercise to summarize COVID-19’s observed trajectory. Particularly with irregular data, curve fitting can improve data visualization. As shown, IHME’s mortality curves have matched the data fairly well. pic.twitter.com/NtJcOdA98R
— CEA (@WhiteHouseCEA) May 5, 2020
Eyeballing the data, there sure as hell seems to be a day of the week seasonality. But let’s go beyond that.
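One common way to look past a weekly reporting pattern is a 7-day moving average. Here is a minimal sketch with made-up numbers: the weekday factors and the flat 1,000 deaths per day are assumptions for illustration, not the actual reported data.

```python
import numpy as np

# Hypothetical weekday reporting factors (Sun..Sat); they average to 1.0.
weekday_factor = np.array([0.70, 0.80, 1.10, 1.20, 1.15, 1.10, 0.95])

# Four weeks of made-up reports: a flat 1,000 true deaths per day,
# distorted only by the day-of-week reporting pattern.
true_daily = 1000.0
days = np.arange(28)
reported = true_daily * weekday_factor[days % 7]

# Every 7-day window contains each weekday exactly once,
# so the weekly pattern averages out.
smoothed = np.convolve(reported, np.ones(7) / 7.0, mode="valid")

print("reported range:", reported.min(), "to", reported.max())
print("smoothed range:", smoothed.min(), "to", smoothed.max())
```

The raw series swings by hundreds of deaths a day even though nothing real is changing, while the smoothed series sits at the underlying level. A curve fit to the raw series can end up chasing the weekday pattern.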
If we were to assume that a cubit fit is an appropriate choice to model the data, and that we can project out of the current data to the near future so that there are almost no deaths on May 15th, that requires a What the Hell response.
We know that COVID-19 does not kill quickly.
A person infected today is unlikely to show up as a death until Memorial Day.
We looked at this basic math in March:
If we can safely add up median time from infection to symptoms and then symptoms to hospitalization, that sums to a back of the envelope span of 12 to 13 days.
We’ll have to add time from hospitalization to death. But this morbid math has a point. The next 7 to 10 days of deaths have almost entirely been baked into the cake as these are individuals who were infected before states started trying to open up again. We’re not going to get a reliable signal on mortality due to policy changes for at least another two weeks in the states that have been early and aggressive in re-opening.
For a cubit curve fitting exercise to be valid, we need to bring this basic mechanical reality into play. And that reality is that the people who are highly likely to die on or before May 15th are already infected. For there to be no deaths on May 15th, we basically need no one to have been infected after April 27th or so.
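The date arithmetic behind that claim can be checked in a few lines. This is a rough sketch using only the dates and spans mentioned in this post. The year, and "today" being May 5, are assumed from the tweet's date, and the leftover hospitalization-to-death span is just what these dates imply, not a clinical estimate.

```python
from datetime import date

# Dates from the post; the year 2020 is assumed from the tweet's date.
zero_deaths_day = date(2020, 5, 15)     # where the fitted curve hits ~zero
last_infection_day = date(2020, 4, 27)  # "April 27th or so"

# Total infection-to-death lag implied by those two dates.
implied_lag_days = (zero_deaths_day - last_infection_day).days

# Back-of-the-envelope infection-to-hospitalization span from March.
infection_to_hospital = (12, 13)

# What is left over for hospitalization-to-death under this rough math.
hospital_to_death = tuple(
    implied_lag_days - d for d in reversed(infection_to_hospital)
)

# Cross-check: "infected today" (assumed May 5) to Memorial Day (May 25, 2020).
today_to_memorial_day = (date(2020, 5, 25) - date(2020, 5, 5)).days

print(implied_lag_days)       # 18
print(hospital_to_death)      # (5, 6)
print(today_to_memorial_day)  # 20
```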
These are basic data analysis guidelines: understand the fundamental phenomenon you are trying to model while also understanding the modeling assumptions of the tools being used. The White House decision-making systems are disregarding both tenets of basic data analysis.
Mike J
Is a cubit fit always describing gopher wood?
Roger Moore
As someone else pointed out, some of the IHME projections they’re comparing themselves to are at least a month old. It’s a lot easier to look good when you’re comparing yourself to projections from that long ago. Even worse, the IHME projection from late March is both a better fit to the data and less optimistic about when the death rate will zero out than the CEA projections. Not to mention that the IHME projections have already been severely criticized for assuming a rapid fall-off in deaths that doesn’t match experience. Seriously these guys deserve to be laughed out of the room.
Ryan
There may be a more obvious clue available for the layman.
https://en.wikipedia.org/wiki/Dow_36,000
Shalimar
What really stunned me about that chart when CNN showed it during Hassett’s interview was that it was updated today. I had been giving him a small benefit of the doubt and assuming his incredibly stupid May 15th projection was made a week or two ago. I was naive.
moops
I’ve been surprised really that none of the machine learning/deep learning tools have been slapped at this forecasting thing yet. As for the “cubic” model: your model is supposed to be derived from first principles of some kind, and result in free parameters, which are then derived by fitting to data, then assessed for quality. So, what infection process is captured by the coefficients of ax^3 + bx^2 + cx + d? what does ‘a’ represent? well, it seems if you ever want to get back to zero then a had better be negative. If we go back in time we had no COVID deaths, so I guess d better be zero. and so on. I don’t think there is anything meaningful here. the initial rise in cases can look like a polynomial, that’s about as smart as this looks. No polynomial model is going to have outbreaks and resurgent cases and long plateaus and all the things we know happen in outbreaks.
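To make the point about ‘a’ concrete, here is a minimal sketch of what an unconstrained cubic does once it runs past the data. The "observed" series is synthetic: a half sine wave that rises to a peak and starts to decline, chosen only for convenience. It is not the CEA's data or method, and `numpy.polyfit` stands in for whatever spreadsheet trendline was actually used.

```python
import numpy as np

# Synthetic "observed" daily deaths for days 0..50: a smooth rise to a
# peak near day 30, then the start of a decline. Purely illustrative.
days = np.arange(51)
deaths = 2000.0 * np.sin(np.pi * days / 60.0)

# Least-squares cubic: deaths ~ a*x^3 + b*x^2 + c*x + d
a, b, c, d = np.polyfit(days, deaths, deg=3)

# Extrapolate 30 days past the last observation.
future_days = np.arange(51, 81)
projected = np.polyval([a, b, c, d], future_days)

print(f"leading coefficient a = {a:.4f}")
print("first projected day below zero:", future_days[projected < 0][0])
print("projected deaths on day 80:", round(projected[-1]))
```

With a rise-then-dip shape the fitted ‘a’ comes out negative, so the curve not only reaches zero but keeps going into negative deaths. Nothing in the polynomial knows that a death count cannot go below zero.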
rjm
Obligatory Kevin Hassett curve fit disaster slam, in case BJ hasn’t done this to death yet
https://www.discovermagazine.com/the-sciences/best-curve-fitting-ever
https://images.ctfassets.net/cnu0m8re1exe/7gVG4hcR69xKOsWI7IZsZa/233355c368e66c5ec2c788f27d4ea142/thoma1.png?w=650
Yes! really published in teh WSJ. What a maroon
jl
Reliable statistics and models that use them, for control versus just observation and forecasting in the absence of attempts at active control, require a very sound causal model behind the statistics. Every statistical model, no matter how sound, has some blind by-guess-by-gosh curve fitting buried someplace down in the inner guts of the model. But, you want to keep that to a minimum.
Problem is that when you use statistical models to try to control a process, there are always many causal models consistent with the statistics. Some causal models that you can’t rule out with the stats will mean your control efforts will work, others mean that your attempts at controlling the process will flop. A sane and good faith effort at forecasting effect of control policies, past present and future, aims in a good faith way, to minimize the chances of that latter possibility.
Looks like the ‘cubic’ model violates all those considerations, and also common sense.
Edit: the thing to remember is that a fast and furious reopen is a control policy just as much as a shutdown. It is the assertion that if some signal shows that reopening won’t prompt another big epidemic wave, then you can actually reopen, and there won’t be another big wave. For Trumpster/GOP a good signal seems to be that the epidemic is not raging like a wildfire destroying everything in its path faster than you can keep track. Which, by common sense standards, seems a tad odd.
VOR
It does explain why Trump thinks the virus is just going away. He is being shown charts where cases just stop May 15. Magic!
Roger Moore
@VOR:
I think you have your cause and effect backward. This chart was made to support Trump’s belief; it wasn’t the cause of it.
sukabi
@VOR: more than likely he’s demanding visuals that support his belief in the magically disappearing virus theory, which is what this graph does.
sukabi
@Roger Moore: yep. I owe you a fizzy beverage.
VOR
@sukabi: Almost certainly correct. They are responding to the whims of the Great Leader and giving him what he wants. And then Trump says “See! I was right, my top advisors agree”.
Hunter Gathers
Watch out, you’re playing with fire.
There are countless dipshits in the workforce whose only hope of continued employment rests on the perception that they are total, complete badasses in Excel.
jl
@Hunter Gathers: Fire is a good analogy to how epidemics work. Often a long period of smoldering that you might not even notice. Then when enough very high heat from fire you don’t see interfaces with enough fuel, it explodes. Then the fire burns until there is not enough fuel left to keep it going.
So, in the absence of an effective control policy (fire extinguisher) saying there are just 15 cases, or 10, or 5, and that is a small number and that will go away, is like saying there is just a small flame at the bottom of your living room drapes, and no reason to worry, it will go away. Unless you are very bizarrely lucky, the fire won’t just go away.
Boris Rasputin (the evil twin)
I’d figured Billy Jim Blob is out there now, getting his “I survved the Crovid-19 Hoax” tattoo today. If I understand correctly, he’d better show it off fast, as he’ll be dead on Memorial Day.
The question that remains is: Does he live in Texas, or Florida?
Cam-WA
@Mike J: “What’s a cubit?”
https://m.youtube.com/watch?v=CgsFCyD4nEw&feature=emb_logo&ebc=ANyPxKrTLi4pVt3yP0T6bOyOYlgWKgZ5od8WENntsv0qoW4DQw91A38XCEMwVlMlcin0K379eVVacvAOLTxaMbxwE-8G2wbysg
(Back from when he was actually funny and not a serial abuser)
lashonharangue
Hassett, Kudlow, and Laffer are prime examples of Krugman’s hack gap.
rikyrah
Just look at the headline…
412 asymptomatic workers at western Missouri food plant test positive for coronavirus
BY LUKE NOZICKA
MAY 05, 2020 02:38 PM, UPDATED 1 HOUR 12 MINUTES AGO
More than 400 workers at a St. Joseph food plant have tested positive for the new coronavirus despite showing no symptoms, health officials said Tuesday.
Comprehensive testing of employees and contract workers at Triumph Foods in St. Joseph found 412 of 2,367 people had COVID-19 with no symptoms, according to the Missouri Department of Health and Senior Services.
https://www.kansascity.com/news/coronavirus/article242515976.html?utm_source=pushly&intcid=%7B__explicit:pushly_525458%7D
glc
The IHME model is pretty silly as well (at least in the form where (a) it gives a number for total deaths; (b) has a public-facing interface; (c) is constantly updated by recent data to look plausible; and (d) doesn’t distinguish between the Chinese approach to shelter in place and anyone else’s). This seems to be widely recognized now.
Even a straight line can be good for short term projections; the cubic tends to add at least visual plausibility and possibly some things present in the data. But curve fitting is not, by itself, modeling, and it helps to use at least some information about what one is trying to model, somewhere in the process. And epidemiological models actually do that.
The general theory is that if the curve is sufficiently well-behaved – a strong assumption! – it can be approximated by a power series, in which case taking the first 4 terms may be better than taking the first 2. However, when making judgments that can get people killed, one should probably read the warning labels – whether bleach or software.
Jinchi
I’m amazed they bothered to post the figure with that tweet. The (3/27) IHME model performs better than the (4/5) update and about as well as the (5/4) version. The (5/5) cubic fit model is, well, the best ‘cubic fit’ to the data up to today (5/5). It looks like it will fail tomorrow, though.
If your standard of success is the best fitting curve to the data that you included in your model, why not simply use the data itself?
Cheryl from Maryland
As my husband said about my mathematically challenged supervisors — Just because you can put a number on it doesn’t make it math.
bbleh
@Roger Moore: Concur; your causal model is sound.
dmsilev
Truth.
The Post had a big article earlier today about how Jared’s task force is failing miserably (surprise!) because basically he brought in a bunch of McKinsey etc. management consultant types rather than people with actual expertise. Excel jockeying, combined with what I’m sure are very impressive PowerPoint skills.
David Anderson
@dmsilev: I would also imagine their SharePoint was on Fleek and the tableaus look pretty.
lollipopguild
Trump is going to write an executive order telling the virus to go away and not come back.
jl
@glc: The initial IHME model assumed that from start to finish, the epidemic curve would act like a symmetric bell shaped normal distribution, and that a control policy that had the same effectiveness over time, would be put in place and maintained throughout the epidemic. All those assumptions were sketchy for a large epidemic with a new disease and a lot of unknowns, as is the case for covid-19.
If the IHME group can’t abandon some of those basic assumptions, the model might look OK in terms of constant fiddling to make its curve fit the data, but I don’t see how it will be good at forecasting.
As I noted in previous thread, we are in the stage of the epidemic where it is very difficult to produce forecasts that look good. Statistical forecasting in situations where you are constantly imposing new policies to try to control the process, works very differently than when you are just observing a process and not trying to control it.
I think we need to check not only with the fancy pants math and stats and epi modellers, but also with people with a lot of very practical field experience in using those models to control real epidemics. They rely on models, and respect the models, but they use them and evaluate them in a very different way than most of us, and that includes me, who have never faced the challenge of the very hard, risky and exhausting work of trying to control a dangerous process in real time. Been that way for 100 years now. Malaria control seemed hopeless until a person who had a lot of real world on the ground experience in trying to control it could also come up with a mathematical and stats model to explain to himself what he was trying to do. That was Ross.
You always need to check with the epi, medical, and public health workers doing the WWI trench warfare control work in the real world, I think.
Jay
https://www.vanityfair.com/news/2020/05/whistleblower-complaint-rick-bright-blasts-team-trumps-pandemic-response
Mallard Filmore
rjm @6 … does the tax-revenue chart mean we should be like Norway?
Mom Says I*m Handsome
I’m not an epidemiologist or biologist or statistician or modeler, but even I know that a bell curve is a fucking terrible choice to model a pandemic. (Most of my technical expertise is from the physical sciences, so I’d lead with a nice exponential decay model & I’d be more right than these glory hogs.)
Poe Larity
Mission Accomplished
BruceFromOhio
Recall this is the Tax Cuts Will Pay For Themselves gang.
Also – if you can’t or won’t post your data sets and methodology, your results are meaningless. My Magic 8-Ball tells me this is so.
jl
@Mom Says I*m Handsome: The infectious disease epidemic diffyQ equations are very similar to those in pharmacokinetics and many chemical reactions. You have one or two compartments with a nonlinear reaction process where two stocks interact, and then exponential decay processes that govern flows into and out of other compartments.
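For anyone who wants to see that structure, here is a minimal sketch of the textbook SIR compartment model, stepped forward with plain Euler integration. The parameter values (beta, gamma, population size) are made up for illustration and are not fitted to COVID-19. The nonlinear term is the product of the susceptible and infected stocks, and the flow out of the infected compartment is simple exponential decay.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Euler-step a basic SIR model; returns one (S, I, R) tuple per day."""
    s, i, r = float(s0), float(i0), 0.0
    n = s + i                        # total population stays constant
    steps_per_day = round(1 / dt)
    history = []
    for _ in range(days):
        history.append((s, i, r))
        for _ in range(steps_per_day):
            # Nonlinear "reaction": susceptible and infected stocks interact.
            new_infections = beta * s * i / n * dt
            # Exponential decay out of the infected compartment
            # (removal here means recovery or death).
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
    return history


# Hypothetical parameters: beta / gamma = 3, population of one million.
history = simulate_sir(beta=0.3, gamma=0.1, s0=999_990, i0=10, days=300)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))

print("peak day:", peak_day, "infected at peak:", round(max(infected)))
print("still infected on day 299:", round(infected[-1]))
```

Lowering beta partway through the run, which is roughly what a control policy does, changes the shape of the curve. That is the kind of question a fitted polynomial has no way to represent.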
Fair Economist
@Mom Says I*m Handsome:
Surprisingly, a bell curve is a pretty good model for a completely uncontrolled epidemic. As the disease spreads, more and more are resistant and the spread slows, and then reverses. It’s pretty symmetrical for a moderately contagious disease.
It can be an acceptable process for a controlled disease where additional controls are added over time and never removed. And that, precisely, is why it’s a total disaster at modeling COVID-19. Once the epidemic is contained, we *don’t* keep adding restrictions, we start taking them away, or ignoring them. And so the overall epidemic is very UNsymmetrical, because the declining tail is much longer and more stretched out than the initial spread. It’s even possible to have additional peaks later (already happened in several countries), which is totally beyond the capabilities of any hump-based model.
Jinchi
@moops: machine learning would fail because it’s incapable of mimicking malicious stupidity.
jl
@Fair Economist: Good for small epidemics, and the long right hand tail can be more or less ignored for uncontrolled epidemics. If you want to use a symmetric distribution, other curves like hyperbolic secant or logistic are better, if you need to take account of the long tails describing the long smoldering lead up and then long wind down as dynamics damps down to equilibrium (either disappearance or endemic equilibrium).
The people who came up with the IHME model have a long and very successful history of modelling chronic disease and endemic infectious diseases that are near equilibrium. I don’t know that they ever worked with an acute epidemic with explosive dynamics before. I don’t think they did due diligence in checking with theoretical and practical people who have worked in that area.
jl
@Fair Economist: And I think what you say about the long tail asymmetry after peak in the knock on process is an especially big problem for fraction of cases that move on to hospitalization and ICU. And the certainty that those resources will be maxed out is the one huge reason that extreme control measures are justified. The rapidity of approach to peak, and the height of the peak swamps resources, and takes a very long time to play out and die out. The externality of infectious people walking around spreading the disease is just so huge and has such dire consequences for others, that it needs to be just shut down with extreme measures after epidemic gets to an explosive stage.
If we can stay away from that situation, then we can pay some serious attention to arguments about when it is OK to let people take their own chances in getting the disease. Most regions in the US are not there yet.
glc
The IHME refers to their so-called model as a statistical model – but there is no relevant underlying theory. Getting best fit to a curve with a few free parameters is not, in itself, modeling. They are on the one hand open about the fact that they are not using an epidemiological model, and at the same time they are promoting their “model” aggressively for use in an epidemic. The nuances are quickly lost.
Anyway I try not to look at them or complain about them anymore (hopefully this is my last excursion on the subject), but they have taken up public space that could have been put to better uses. And I think more people are going to die in part because of the way they have conveyed their projections.
Jinchi
Right. Especially since policy people will judge you on your top line number (e.g. ‘We predict 60,000 deaths’ versus ‘We predict 60,000 deaths if everyone shelters in place for the next 12 weeks, testing is widely available, clusters are isolated rapidly and idiot governors don’t reopen days after passing the initial peak’).
cain
@rjm:
We should maybe also bring in Art Laffer – he can show how this fits with the laffer curve.
jl
Hey you dang kids, you can make a little metabolic chemical explosive epidemic inside you. Probably not good to try at home, unless you like sudden death, brain damage or irreversible loss of sight. Same type of processes: Check out this youtube channel from a medical doc:
Chubbyemu
Throw a bottle nutmeg into a protein shake and chug it, chug moonshine, do extreme exercise until your muscles fail and then force your self to keep going…
lumpkin
David, do you really think they were trying to model the data? I think you are too kind. A more plausible explanation for Hassett’s murderous “cubic model” is that it’s more than sufficient to bamboozle Trump into thinking the virus goes away by mid May so let’s open the economy. And the national press, being only slightly less gullible than the president, can talk about the White House cubic model as though it has some actual merit.
rjm
@cain: Yeah Laffer could fit a curve that would really cut the death toll.
I’d seen a more complete takedown (as if it was needed) of the WSJ graph, and it turns out the Norway data point was manipulated by including carbon tax income along with corporate tax which moved it much higher than the rest of the data.
https://www.bradford-delong.com/2017/07/paul-gigot-and-kevin-hassett-monday-smackdownhoisted-from-2007-the-most-mendacious-graph-the-wall-street-journal-ever-publi.html
JaneE
When I heard that zero deaths by May 15th, the first thing I did was check the last few days number of new cases. It was over 25,000. Nope, not going to zero.
?BillinGlendaleCA
@cain:
Tax cuts will make the virus go away, it works better if they’re capital gains tax cuts.
jl
@?BillinGlendaleCA: OK, now I finally do believe that you do have serious training in economics. That’s the way we were trained to think!
Baud
I’m heartened that the model shows that all the dead people will come back to life.
prostratedragon
@rjm: That graph is indeed a “laffer.” Don’t know how I missed it at the time, especially since I was reading DeLong more frequently then.
Pro tip: if the fitted curve is the outer envelope of the data points, there’s something fishy going on. The fit is supposed to be in the nature of an average.
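A quick way to see the "average" property: with an ordinary least-squares fit that includes an intercept, the residuals sum to zero, so some points have to sit above the line and some below. A minimal sketch with made-up numbers (not the WSJ data):

```python
import numpy as np

# Made-up scatter for illustration only.
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
y = np.array([2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.5])

# Ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

print("points above the line:", int((residuals > 0).sum()))
print("points below the line:", int((residuals < 0).sum()))
print("sum of residuals:", float(residuals.sum()))
```

A curve drawn along the outer envelope would leave every residual on one side, which a least-squares fit cannot produce.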
Baud
@?BillinGlendaleCA:
Do you think tax cuts alone will be enough this time, or should Congress also mandate a medically unnecessary vaginal ultrasound?
zzcool
Do you think it was a specific choice someone made to have ‘actual fit’ look like Trump drew it himself with a sharpie?
jl
@Baud: I personally think giving me all the money will be an effective policy to Solve All Problems. I have my ‘quartic’ model that produces fantastic forecasts
Edit: I also have scientific proof that Baud deserves a nice cut of the loot if the virtual eminence provides strict geometrical logic in support.
Roger Moore
@Hunter Gathers:
Honestly, Excel is a pretty useful tool. It has limitations, but somebody who knows a lot about data analysis can use Excel to do some fairly sophisticated stuff. The thing to be wary about is that serious data analysis people generally prefer more sophisticated tools than Excel, so use of Excel is often a warning sign that the user is not as good at data analysis as they think.
?BillinGlendaleCA
@Baud:
As long as they don’t get in way of the tax cuts, they’re ok.
?BillinGlendaleCA
@Roger Moore:
I think it’s also a sign they went to B-School.
?BillinGlendaleCA
@jl: Did you know that tax cuts will cure the clap and erectile dysfunction?
Obvious Russian Troll
@moops: I’m sure somebody is going to try it or more likely is in the process of trying, but that doesn’t mean they’re going to get results that are any better than this guy.
I will confess that I am cynical about the current state of AI and machine learning. It works, but not for all problems; I think people are going to waste a lot of money throwing AI and machine learning at problems where it’s a bad fit.
The shitty testing data we have is unlikely to help, I suspect.
jl
@?BillinGlendaleCA: I’ve pointed out repeatedly that I get ads on BJ for fat, balding, broke ass deadbeats with cars that don’t run, who want to solve all their problems with zero effort. I don’t know what kind of ads you get.
Croaker
@Jay: The good Doctor needs to hang w the rest.
?BillinGlendaleCA
@jl:
Have you tried a Tax Cut™?
(Tax Cut™ is a registered trademark of the Republican National Committee.)
Another Scott
@dmsilev:
So, did an Excel coding error destroy the economies of the Western world?
Excel – Is there anything it can’t do??!
Cheers,
Scott.
Roger Moore
@?BillinGlendaleCA:
Honestly, I know a lot of scientists who use Excel. I use Excel for a lot of stuff. I know how to program in several languages, but often if I want to do some quick calculations and put together a graph or two, Excel is the easiest way of doing it. It’s also nice to demonstrate some data analysis techniques, because the tables make it easy for people to see what’s happening to their data and really easy to see how things change if some of the numbers change. Horses for courses.
?BillinGlendaleCA
@Roger Moore: I’ve used Excel since it came out for Windows back in the late 80’s.
LongHairedWeirdo
Absolutely. But some time before then, there should be a glimmering of “the deaths of millions of people could depend on this; so I should be absolutely sure that this *might* work, at least, hypothetically.”
Sometime after the beginning, you should learn not to do a cubic fit *past* the endpoint of a function’s data, if that function is not a cubic. Sure, if you have data to May 5 and hope to extrapolate May 6, it *might* work. But the further you go, the more you lose.
A cubic fit is not bad for some things, but it’s best used for interpolating data *between* known points. For example, if you didn’t have direct reporting of cases/deaths on Saturday and Sunday, a cubic fit estimate would let you estimate deaths and cases on Saturday, and on Sunday, interpolating from the previous, and following, days.
Now, having seen this, I see that we’re dealing with the worst sort of moron.
Generally speaking, a curve fit matches both the endpoint *and* the first derivative (rate of change).
There’s a dip in the final number – that means a cubic fit will show a decreasing value, and, by the nature of cubics, will probably go toward 0 as it does on the graph.
So, this is a moron who didn’t understand the tool he was using, didn’t understand the type of data he was trying to model, and didn’t understand that a cubic fit would never be good for predicting the future, even if it was a decent model, of data that was well understood. And, let’s face it: not one person reading this is surprised that he has the ear of Trump.
LongHairedWeirdo
@Another Scott: Not a coding error, and not from Excel. The tool worked fine, but this situation had the tool being used like a hammer being used to loosen a bolt. It was the wrong tool, used in the wrong way, and anyone who knows anything about tools and nuts/bolts knows that it’s wrong.
So much winning. SAD!
moops
@Baud: well, on May 15th we will start having “negative dead”, I would state that these are undead. So….. zombie apocalypse is how we end our pandemic. By September there would be more zombies than people.
schrodingers_cat
@Roger Moore: Agreed. Excel is a pretty powerful tool for data analysis.
Another Scott
This is my shocked, shocked face.
Cheers,
Scott.
LongHairedWeirdo
@Another Scott: You know, this reminds me of a comment I saw about the author of Liberal Fascism, that “it will be a perpetual source of pain to many that he will never realize just how stupid he truly is.”
You know, folks who’ve said that it’s a big deal with Excel being at fault here? I’ll grant you this much; Microsoft did give the proverbial pack of matches to the child in the wooden shed.
I’ll grant, my background might affect my point of view here, but: I can’t imagine having me, personally, look at that *beautiful* graph, showing just what we want, and not finding someone, either at Microsoft, or a PhD who’s really good at modeling, and saying “How does Excel do this? Is it applicable?”
I mean, isn’t it a given that confirmation bias is the worst sort of wishful thinking, and the first place where you *have* to check *all* of your assumptions? Getting the right answer, almost by magic, should make a person deeply suspicious. And the thing is, you don’t even *need* an expert – I have “only” a master’s degree in math, I did a search on “cubic spline”, and didn’t need my old Numerical Analysis text to piece together the problems. The most critical problem: these sorts of models always depend on the slope used for the final endpoint, and you can eyeball how the “actual data” from the graph is on a downslope at that point, and the reason why the model goes to 0 is that that is what downward sloping cubic functions usually do.
(The other method I know of for setting the slope is to set it to 0 – which is a really good way of illustrating how you shouldn’t extrapolate beyond your data!)
Another Scott
@LongHairedWeirdo: I don’t think anyone here is seriously blaming Excel – even for the “spreadsheet that destroyed the world” thing that Krugman was talking about. We know it’s not the tool used, here.
(Even experts can go crazy with, say, peak-fitting software (“Look – I extracted 7 peaks from this bumpy, noisy curve!!”) – one has to step back and think about the limitations of the data and the models.)
In my case, I’ve had a personal, er, distaste for Microsoft going back to the DOS days. It’s “fun” to pick on them, also too.
HTH! ;-)
Cheers,
Scott.
SW
The “Z” axis is pointed at you!
Dupe1970
@rjm: Oh the curves I can build when I take two outliers that rely heavily on petroleum to fund their gov’ts…..