40. Estimation of Corona cases with Python and Pandas
By Bernd Klein. Last modified: 26 Apr 2023.
What is the real number of infected peoply by the corona virus. What we learn in the news is just the number of positively tested people. Unfortunately, it is technically not possible to test everybody with possible corona virus symptoms. For example we learned that on the 18th of March we had 12327 known cases in Germany. An extremely important question is "How many unknown cases do we have?".
This is what this page is about. I try to estimate this number or better to calculate some possible boundaries for this number.
I started doing this project for myself. I didn't intend to publish it. Everyday the new data on the corona virus cases rains down on me by radio and TV. Day by day and in many cases only comparisons with the previous day. So I started collecting the data on a day to day basis.
But I am sure that I made you curious. You want to know now this "dark" number, the number of total cases, known plus unknown? What we hear about the severity of the disease, i.e. the death rate is also just a calculation on the known cases. So we have the division of the number of people who died due to the virus and the number of the known cases. The death rate given and based on known cases varies between 2.5 % and 5 %. The higher the number of unknown cases is the smaller the real death rate will be. I assumed that the "real" death rate might be 0.7, like a seasonal flu. Be aware: This is just an assumption, not covered by any scientific data, but anyway quite likely. I further assume that the avarage duration of an infection is about 18 days, the total number of infections in Germany on the 18th of March is about 280000 cases compared to 12327 cases. This means about 23 times higher than the known figure!
If you like to follow my calculations based on Python and Pandas, you can continue reading here:
import pandas as pd
xf = pd.ExcelFile("/data1/coronacases.xlsx")
df = xf.parse("Sheet1",
skiprows=2,
index_col=0,
names=["dates", "cases", "active", "deaths"])
df[["cases", "active", "deaths"]].plot()
OUTPUT:
The number of cases we here on the media corresponds to the accumulated day by day newly infected people. The number of "active" cases is the number without those who are healed or unfortunately died. The following number shows the ratio of those. As long as nobody had died, the death ratio was 0 %.
discharged = df["cases"] - df["active"]
if "healed" in df.columns:
df.drop("healed", axis=1, inplace=True)
if "healed" not in df.columns:
df.insert(loc=len(df.columns),
column="healed",
value=(discharged -df["deaths"]))
healed_ratio = df["healed"] * 100 / discharged
healed_ratio
death_ratio = df["deaths"] * 100 / discharged
death_ratio
outcome = {"death_ratio": death_ratio, "healed_ratio": healed_ratio}
outcome_df = pd.DataFrame(outcome)
outcome_df
outcome_df.plot()
OUTPUT:
In the following code, we do some curse fitting. We fit the value to the function
$$f = a \cdot e^{b \cdot x} + c$$
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def growth_func(x, a, b, c):
return a * np.exp(b * x) + c
Y = df.cases.values
X = np.arange(0, len(Y))
popt, pcov = curve_fit(growth_func, X, Y)
def growth(x):
return growth_func(x, *popt)
plt.figure()
plt.plot(X, Y, 'ko', label="Original Data")
plt.plot(X, growth(X), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
#for i in range(1, 30):
# print(df.cases.values[i], growth(i))
In the following code, I make some crucial assumption. Both by rule of thumb and not properly based on scientific data, because it is not available. The average duration of an infection 'days_infected_before_outcome' might be less or higher. Furthermore I made a deliberate decision to assume the real death rate, based on known and unknown cases, is 0.7. This number corresponds to the seasonal flu.
days_infected_before_outcome = 14
assumed_real_death_rate = 2.9
def create_inverse_growth_func(a, b, c):
def inverse(x):
return np.log((x - c) / a) / b
return inverse
inverse_growth = create_inverse_growth_func(*popt)
# number of cases 'days_infected_before_outcome':
cases_days_infection_before = df["deaths"][-1] * 100 / assumed_real_death_rate
shift_days = inverse_growth(cases_days_infection_before)
shift_days -= (len(df) - days_infected_before_outcome)
print(shift_days)
OUTPUT:
5.765157139549551
plt.figure()
X = np.arange(0, 65)
plt.plot(X,
growth_func(X + days_infected_before_outcome, *popt),
'r-',
label="Fitted Curve")
plt.legend()
plt.show()
print(len(df["cases"]))
x = np.arange(0, len(df["cases"]))
print(len(x))
if "real" in df.columns:
df.drop("real", axis=1, inplace=True)
df.insert(loc=len(df.columns),
column="real",
value=growth(x+shift_days).astype(np.int))
df
OUTPUT:
cases | active | deaths | healed | real | |
---|---|---|---|---|---|
dates | |||||
2020-02-17 | 16 | 9 | 0 | 7 | 3659 |
2020-02-18 | 16 | 7 | 0 | 9 | 3659 |
2020-02-19 | 16 | 7 | 0 | 9 | 3659 |
2020-02-20 | 16 | 3 | 0 | 13 | 3659 |
2020-02-21 | 16 | 2 | 0 | 14 | 3659 |
2020-02-22 | 16 | 2 | 0 | 14 | 3659 |
2020-02-23 | 16 | 2 | 0 | 14 | 3659 |
2020-02-24 | 16 | 2 | 0 | 14 | 3659 |
2020-02-25 | 18 | 3 | 0 | 15 | 3659 |
2020-02-26 | 26 | 11 | 0 | 15 | 3659 |
2020-02-27 | 18 | 32 | 0 | -14 | 3659 |
2020-02-28 | 74 | 58 | 0 | 16 | 3659 |
2020-02-29 | 79 | 63 | 0 | 16 | 3659 |
2020-03-01 | 130 | 114 | 0 | 16 | 3659 |
2020-03-02 | 165 | 149 | 0 | 16 | 3659 |
2020-03-03 | 203 | 187 | 0 | 16 | 3659 |
2020-03-04 | 262 | 246 | 0 | 16 | 3659 |
2020-03-05 | 545 | 528 | 0 | 17 | 3659 |
2020-03-06 | 670 | 652 | 0 | 18 | 3659 |
2020-03-07 | 800 | 782 | 0 | 18 | 3659 |
2020-03-08 | 1040 | 1022 | 0 | 18 | 3659 |
2020-03-09 | 1224 | 1204 | 2 | 18 | 3659 |
2020-03-10 | 1565 | 1545 | 2 | 18 | 3659 |
2020-03-11 | 1966 | 1938 | 3 | 25 | 3659 |
2020-03-12 | 2745 | 2714 | 6 | 25 | 3659 |
2020-03-13 | 3675 | 3621 | 8 | 46 | 3660 |
2020-03-14 | 4599 | 4544 | 8 | 47 | 3662 |
2020-03-15 | 5813 | 5754 | 13 | 46 | 3666 |
2020-03-16 | 7272 | 7188 | 17 | 67 | 3679 |
2020-03-17 | 9367 | 9274 | 26 | 67 | 3713 |
2020-03-18 | 12327 | 12194 | 28 | 105 | 3805 |
2020-03-19 | 15320 | 15161 | 44 | 115 | 4057 |
2020-03-20 | 19848 | 19668 | 59 | 121 | 4741 |
2020-03-21 | 22364 | 22071 | 84 | 209 | 6600 |
2020-03-22 | 24873 | 24513 | 94 | 266 | 11653 |
2020-03-23 | 29056 | 28480 | 123 | 453 | 25388 |
2020-03-24 | 32986 | 29586 | 157 | 3243 | 62724 |
2020-03-25 | 37323 | 33570 | 206 | 3547 | 164213 |
2020-03-26 | 43938 | 37998 | 267 | 5673 | 440091 |
2020-03-27 | 50871 | 43862 | 351 | 6658 | 1190003 |
2020-03-28 | 57695 | 48781 | 433 | 8481 | 3228476 |
2020-03-29 | 62095 | 52359 | 525 | 9211 | 8769619 |
2020-03-30 | 66711 | 52566 | 645 | 13500 | 23832009 |
2020-03-31 | 70985 | 54479 | 682 | 15824 | 64775829 |
2020-04-01 | 77981 | 58350 | 931 | 18700 | 176072669 |
2020-04-02 | 84794 | 61247 | 1107 | 22440 | 478608844 |
2020-04-03 | 91159 | 65309 | 1275 | 24575 | 1300987426 |
2020-04-04 | 96108 | 68262 | 1446 | 26400 | 3536444161 |
2020-04-05 | 100123 | 69839 | 1584 | 28700 | 9613045529 |
2020-04-06 | 103375 | 72865 | 1819 | 28691 | 26130960462 |
df[["cases", "real"]].plot()
OUTPUT:
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Upcoming online Courses