I’m going to explain what frightened me when I saw it last night. Maybe you can see it from the front page of the Johns Hopkins data. Ignore the green line in the middle. The cumulative incidence of cases in China is in orange, the rest of the world is yellow.
I want to illustrate the steps I used to reach this conclusion because I think it’s important and would be more than happy to learn of any flaws in my methodology or reasoning.
My first step was to plot the number of cases in Mainland China (orange) versus the rest of the world (blue). Second, I found data points with close to the same number of cases for each (labeled).
The next step was to count the number of days of data for China up to that point, so that both curves could be shown on the same time scale and treated as roughly equivalent measures. In addition, I eliminated a number of the early cases because I want to analyze the pattern in the climb of cases. I looked for a value near 5000 cases as a starting point, and both series had a value close to that at the same point.
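For anyone who wants to repeat this alignment, here is a minimal sketch of the idea. The function name `align_from_threshold` and the two short series are made up for illustration (they are not the Johns Hopkins numbers), and I am assuming "near 5000" can be approximated by "the first value at or above 5000".

```python
def align_from_threshold(cumulative_cases, threshold):
    """Drop counts before the first value >= threshold and renumber days from 0.

    Assumption: "starting near the threshold" is approximated by the first
    count that reaches it. Returns a list of (day, count) pairs.
    """
    for start, count in enumerate(cumulative_cases):
        if count >= threshold:
            return list(enumerate(cumulative_cases[start:]))
    return []  # the series never reaches the threshold


# Made-up cumulative counts for illustration only (not real data).
series_a = [300, 900, 2800, 5200, 7900, 11000]
series_b = [40, 150, 600, 1900, 5100, 6800, 9500]

aligned_a = align_from_threshold(series_a, 5000)
aligned_b = align_from_threshold(series_b, 5000)

print(aligned_a)  # [(0, 5200), (1, 7900), (2, 11000)]
print(aligned_b)  # [(0, 5100), (1, 6800), (2, 9500)]
```

Once both series start from day 0 at a similar count, their climbs can be compared directly on one time axis.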
Warning: Statistics Ahead – but it’s important. I’ll try to explain this as clearly as I can.
There is a statistic known as r-squared (R²). It measures how well a series of data points fits a particular model. The value ranges from 0 to 1. A perfect fit is 1, and a totally random set of data is given a zero. Most data sets never get close to either extreme. The example below shows two data sets with the best fit line (this is also technically referred to as a linear regression). The image on the left represents a good fit, so its R² value would be considerably closer to one than the value for the image on the right.
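To make the statistic concrete, here is a small sketch that fits a least-squares line and computes R² as one minus the ratio of the leftover (residual) variation to the total variation. The helper names and the two tiny data sets are hypothetical and chosen only to show a tight fit next to a loose one.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def linear_r_squared(xs, ys):
    """Fit a line to (xs, ys) and report how well it fits."""
    slope, intercept = least_squares_line(xs, ys)
    return r_squared(ys, [slope * x + intercept for x in xs])


# Illustrative points only: one set hugs a line, the other bounces around.
xs = [0, 1, 2, 3, 4]
tight = [1.0, 3.1, 4.9, 7.0, 9.1]
scattered = [1, 7, 2, 9, 4]

print(round(linear_r_squared(xs, tight), 3))      # close to 1
print(round(linear_r_squared(xs, scattered), 3))  # much closer to 0
```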
In this case, I want to use that statistic to find out whether a linear regression (line) or an exponential regression (curve) fits the data points better. I’ve colored the linear models and associated R² values blue and the exponential models and associated R² values red.
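Here is a rough sketch of that comparison in code. It is not necessarily how the charts in this post were produced: I am assuming the exponential curve is fitted by running a straight-line regression on the logarithm of the counts, and that both models are scored against the original counts so the two R² values are comparable. The function names and the two test series are hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_fits(cases):
    """Score a linear and an exponential model on one series of daily counts.

    Assumptions: counts are positive (the log is taken), day numbers are
    0, 1, 2, ..., and both R^2 values are computed on the original counts.
    """
    days = list(range(len(cases)))

    # Linear model: cases = slope * day + intercept
    slope, intercept = least_squares_line(days, cases)
    linear_pred = [slope * d + intercept for d in days]

    # Exponential model: log(cases) = log_slope * day + log_intercept
    log_slope, log_intercept = least_squares_line(
        days, [math.log(c) for c in cases]
    )
    exp_pred = [math.exp(log_intercept + log_slope * d) for d in days]

    return {
        "linear": r_squared(cases, linear_pred),
        "exponential": r_squared(cases, exp_pred),
    }


# Synthetic series for illustration only (not real case counts).
doubling = [100 * 2 ** d for d in range(8)]  # doubles every day
steady = [100 + 50 * d for d in range(8)]    # adds 50 every day

print(compare_fits(doubling))  # exponential score should be the higher one
print(compare_fits(steady))    # linear score should be the higher one
```

Scoring the exponential model on the log scale instead would be a reasonable alternative, and it can give somewhat different R² values, so the choice is worth stating whenever the two numbers are compared.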
For the China models, clearly the linear model is a little better, but I don’t recall any statistical test for whether the difference between two R² values is significant. Such a test would simply measure how likely it is that chance alone accounts for the difference between the two models. In this case, I’m going on instinct that they are pretty close, and maybe there’s a little bit of each.
The model that excludes the data from China is much more interesting to me. There is an incredibly tight fit to an exponential model compared to the linear one.
I’ll make my argument from the extreme case. Assume that the model for China were a perfect fit for a line. That would mean that the reproduction rate (R0) would be at the lower end of the range between 1 and 2. For the rest of the world, that would mean that the reproduction rate (R0) would be closer to 2.
Here’s why that is important. If the reproduction rate is a perfect one, then the cumulative number of cases would follow this pattern: 1, 2, 3, 4, 5, 6, 7…and so on. If the reproduction rate were perfectly a 2, then the cumulative number of cases would follow an exponential growth pattern like this: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024…etc.
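The two patterns are easy to generate, which makes the gap between them easier to see. This is only a sketch of the simplified arithmetic in the paragraph above (adding a fixed amount each step versus multiplying by a fixed factor each step), not an epidemiological model, and the function names are my own.

```python
def additive_growth(steps, added_per_step=1):
    """Start at 1 and add the same amount at every step (a straight line)."""
    return [1 + added_per_step * i for i in range(steps)]


def multiplicative_growth(steps, factor=2):
    """Start at 1 and multiply by the same factor at every step (a curve)."""
    return [factor ** i for i in range(steps)]


print(additive_growth(7))
# [1, 2, 3, 4, 5, 6, 7]
print(multiplicative_growth(11))
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

After eleven steps the additive series has reached 11 while the doubling series has reached 1024.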
Obviously in the real world, values like that never would be that extreme. However, I think it is important to see that COVID-19 is exploding in the rest of the world much faster than it did in China. My hunch is that it is a function of the quarantine that was put into place. One way to find some support for that would be to look at the same kind of curve for South Korea and see if the regression would be more linear or exponential in nature.
Once again, I truncated the data at the beginning for values below 500 and again for the past three days of data since the number of new cases is leveling off. What I really want to know is the best model during the growth period. Once again, the linear model is a much better fit.
My conclusion is that quarantine (the extreme end of social distancing) seems to work well, which isn’t a surprise. It should serve though as a warning that we should be taking much more aggressive containment measures during this second wave of the pandemic than we currently are.