
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{float}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{xparse}

\NewDocumentCommand{\codeword}{v}{%
\texttt{\textcolor{blue}{#1}}%
}

\lstset{language=C,keywordstyle={\bfseries \color{blue}}}
\graphicspath{ {./} }

\setlength{\parskip}{1em}
\setlength{\parindent}{0pt}

\begin{document}

\title{Modeling Infectious Diseases}

\author{Anthony Wang and Richie Jiang}

\maketitle

\section{Introduction}

Throughout human history, the spread of infectious diseases has led to destructive epidemics with lasting repercussions on society. With population growth and the growing interconnectedness of the world, minimizing the damage of diseases has become far more important. Global health organizations such as the WHO work to prevent health emergencies and, when needed, prepare for them ([1]). To judge whether a disease poses a major risk, it is necessary to predict its behavior. Mathematical models, combined with appropriate data, make such predictions possible, although their accuracy varies widely. Often, accurate and sound models can only be created after sufficient data has been collected.

A compartment model is a mathematical model that simulates the interactions between compartments ([2]). Each compartment represents a different state for the objects being studied. These models can be used to predict how energy or materials are transmitted in a system. In the case of infectious diseases, they can be used to study how the diseases themselves are transmitted through a population.

Infectious diseases are illnesses caused by pathogens invading the body. These pathogens can vary from viruses to bacteria to arthropods ([3]). When a disease is highly contagious and inadequate measures are in place to stop the spread, it may begin to spread rapidly and cause an epidemic. The number of cases and the rate of spread vary with the type of infection, but all epidemics are public health threats. If the disease is allowed to spread far enough, it may develop into a pandemic. Examples of pandemics include smallpox, tuberculosis, and the Spanish flu. One of the most famous pandemics, the Black Death, had a death toll of over 75 million. To avoid public health emergencies, diseases must be monitored for signs of mass infection, and appropriate measures must then be taken.

The mathematical modeling of infectious diseases began in the 1760s with Bernoulli ([4]). He modeled the impact of smallpox vaccination on average life expectancy in order to persuade governments to support vaccinating against smallpox. In 1864, Cato Maximilian Guldberg and Peter Waage proposed the law of mass action in chemistry, which states that the rate of a chemical reaction is proportional to the product of the concentrations of the reactants ([5]). This law forms the basis of compartmental modeling, and it can similarly be applied to modeling diseases. Then, in 1927, William Ogilvy Kermack and A. G. McKendrick published their landmark paper on the SIR model ([6]). This model was the first compartment model used for infectious diseases, and even now, nearly a century later, it is still used to predict the spread of diseases ([7]).

In the classical SIR model, the population is divided into three compartments: susceptible, infectious, and recovered people. Each of the compartments can be represented by a mathematical function. A set of equations describes how the size of the compartments changes over time. This model assumes that once a person has recovered from the disease, they cannot be infected again. This model also assumes that every person has the same traits and characteristics, so they can be cleanly divided into compartments. However, as the behavior of diseases varies in the real world, multiple variations on the SIR model have been created.

The SIR model can be modified to include factors that affect disease spread and changes in population, such as birth and death rate. Over time, a population’s size will change due to births and deaths, both naturally and as a result of a disease. This can be represented through the addition of another parameter to the compartment equations. Another useful modification is the inclusion of vaccination, which adds a new compartment to the model. The inclusion of other compartments can help model other possible states that an individual could be in. Some variations on the classical model include:

\textbf{Extended SIR (ESIR) Model} \\
This model was also developed by Kermack and McKendrick ([8]). This model adds births and deaths, in the form of an inflow of new susceptibles and an outflow of deaths from the three compartments.

\textbf{SEIR Model} \\
This was yet another model developed by Kermack and McKendrick ([9]). In this model, another compartment is added---exposed---which consists of individuals that are infected but are not infectious yet.

\textbf{SIS / SEIS Model} \\
This model is similar to the SEIR model. However, there is no immunity upon recovering from the disease, so recovered individuals return to the susceptible compartment.

\textbf{SIRC Model} \\
This model adds the carrier compartment. The carrier compartment consists of individuals who have recovered from the disease but are still able to transmit it while suffering no symptoms.

\textbf{MSIR Model} \\
This model adds the maternally-derived immunity compartment. When babies are born, they may be immune to certain diseases for several months due to maternal antibodies.

In this paper, we will cover the mathematics behind linear and non-linear compartment models. We will also discuss the solution functions and approximations needed. We will then take data from recent diseases and model each using the different compartment models. In this paper we will only use the linear, SIR, Extended SIR, and SEIR models. Finally, we will examine the drawbacks and benefits of each model and their accuracy in modeling real diseases.


\section{Mathematical Explanation}

Linear compartment models are used to study biological systems, for example, the distribution of an orally administered drug throughout the human body ([2]). This can be represented by a three-compartment model, whose compartments are the drug in the gut, the drug in the blood, and the drug in the urine.

The model can be described by three functions:
\\ $P_0(t)$: percent of the drug in the gut
\\ $P_1(t)$: percent of the drug in the blood
\\ $P_2(t)$: percent of the drug in the urine

The parameters used are
\\ $k_1$: rate of gut-to-blood transfer
\\ $k_2$: rate of blood-to-urine transfer

It has been assumed that a single oral dose of the drug has been administered, with $100\%$ of it located at the absorption site (the gut).

The equations are
\begin{align}
P_0'(t) &= -k_1P_0(t) \\
P_1'(t) &= k_1P_0(t)-k_2P_1(t) \\
P_2'(t) &= k_2P_1(t)
\end{align}

We will now explain how these equations have been obtained from the functions and parameters.

The first equation describes the change in the percentage of drug in the gut. The only source of change is the transfer of the drug into the blood, which occurs at the rate $k_1$. Multiplying this by the percent of drug in the gut, we get $k_1P_0(t)$. Because the amount of drug is decreasing, we add a negative sign and obtain the equation $P_0'(t) = -k_1P_0(t)$.

The next equation we will look at describes the change in the percentage of drug in the urine. The only source of change is the transfer of the drug from the blood into the urine, which occurs at the rate $k_2$. When we multiply this by the percent of drug in the blood, we get the equation $P_2'(t) = k_2P_1(t)$.

The final equation describes the change in the percentage of drug in the blood. The gut-to-blood transfer can be represented by $k_1P_0(t)$, and the blood-to-urine transfer can be represented by $k_2P_1(t)$. The gut-to-blood transfer increases the percentage of drug in the blood, so it is added. As the blood-to-urine transfer removes drug from the blood, we must subtract it from the gut-to-blood transfer. Doing so, we obtain the equation $P_1'(t) = k_1P_0(t) - k_2P_1(t)$.
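
Because this system is linear, it can also be solved in closed form. As a sketch, assuming the whole dose starts in the gut ($P_0(0) = 100$, $P_1(0) = P_2(0) = 0$) and $k_1 \neq k_2$, the solution is
\begin{align*}
P_0(t) &= 100e^{-k_1 t} \\
P_1(t) &= \frac{100k_1}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right) \\
P_2(t) &= 100 - P_0(t) - P_1(t)
\end{align*}
The last line follows because the three derivatives sum to zero, so the total remains at $100\%$.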

This compartment model is linear because the equations are all linear functions of the concentrations. The SIR model is a nonlinear compartment model, however. It contains three compartments: susceptible (S), infected (I), and recovered (R).

The model can be described by three functions:
\\ $S(t)$: fraction of population that is susceptible
\\ $I(t)$: fraction of population that is infected
\\ $R(t)$: fraction of population that is recovered

The parameters used are
\\ $\beta > 0$: infection rate
\\ $\gamma > 0$: removal rate
\\ $\rho > 0$: basic reproduction ratio

The basic reproduction ratio is the number of secondary infections produced by one infection in a fully susceptible population. It can be defined as

\[\rho = \frac{\beta}{\gamma}\]

When $\rho > 1$, an epidemic occurs. The average infectious period is $\frac{1}{\gamma}$, or equivalently $\frac{\rho}{\beta}$.

The differential equations are
\begin{align}
S'(t) &= -\beta S(t)I(t) \\
I'(t) &= \beta S(t)I(t) - \gamma I(t) \\
R'(t) &= \gamma I(t)
\end{align}

Setting the right-hand sides of these equations to zero shows that the system is at equilibrium exactly when $I(t) = 0$.
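
Two short calculations, which follow directly from the three equations above, make the behavior of the model easier to see. Adding the equations gives
\[ S'(t) + I'(t) + R'(t) = 0, \]
so $S(t) + I(t) + R(t)$ is constant, equal to $1$ when the compartments are fractions of the population. Factoring the second equation and using $\rho = \beta/\gamma$ gives
\[ I'(t) = \gamma I(t)\left(\rho S(t) - 1\right), \]
so the infected fraction grows only while $S(t) > 1/\rho$. This is why $\rho > 1$ is needed for an epidemic when nearly everyone is initially susceptible.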

We will now explain how these equations have been obtained from the functions and parameters.

We will first look at the equation for the change in the susceptible population, $S'(t) = -\beta S(t)I(t)$. For the classical SIR model, we assume that no individuals are added to the susceptible population, so the only change is removal due to infection. The rate at which a single infected person causes new infections can be represented by $\beta S(t)$. This is because one individual infects at a rate $\beta$, and $S(t)$ is the fraction of the population that is susceptible to infection. Because this is the rate for one individual, $\beta S(t)$ must be multiplied by $I(t)$ to find the total rate of new infections. As the susceptible population is decreasing due to infection, a negative sign is placed in front of the term, and we arrive at the equation $S'(t) = -\beta S(t)I(t)$.

The next equation we will examine is the equation for change in the recovered population, $R'(t) = \gamma I(t)$. The only source of change in the recovered population is from infected individuals recovering. This recovery rate is represented by the parameter $\gamma$. When we multiply this parameter by the infected population, we arrive at the equation $R'(t) = \gamma I(t)$.

The final equation we will examine is the equation for change in the infected population, $I'(t) = \beta S(t)I(t) - \gamma I(t)$. Infected people can both be added to and removed from the infected population. We have already found the rate of new infected people ($\beta S(t)I(t)$) and the rate of new recovered individuals ($\gamma I(t)$) in the two previous equations, so we just need to put them together. Making sure to subtract those who are recovered, we arrive at the equation $I'(t) = \beta S(t)I(t) - \gamma I(t)$.

For these equations, finding the exact solution is very difficult, and becomes even more difficult once we add in new parameters or compartments. However, there are many methods to approximate the solution accurately. One approximation method we can use is Euler’s Method.

Euler’s Method approximates the solution of a differential equation with a given initial value. It does so by following the tangent line to the solution over successive short time intervals. If each interval is short enough, the point reached on the tangent line will be approximately the same as the value of the solution at that time.

Assume we have an initial value problem $y' = f(t,y)$, $y(t_0) = y_0$. Then the equation for the tangent line to the solution at $t = t_0$ is $y = y_0 + f(t_0, y_0)(t - t_0)$.

To calculate the value of $y_1$, we can use this equation. We are given the value of $y_0$, and we can determine the value of $f(t_0, y_0)$, as it is the value of the derivative at $t_0$. Once we have these, we put $t_1$ into the equation and obtain the value of $y_1$. In order to calculate $y_2$, we would need the value of the solution at $t_1$, which we do not know. We can use $y_1$ instead, as it is an approximation of the solution at $t_1$. This allows us to create a tangent line through $(t_1, y_1)$ with slope $f(t_1, y_1)$. Once we plug in $t_2$, we obtain $y_2 = y_1 + f(t_1, y_1)(t_2 - t_1)$. We can continue through each time interval using the previous approximation of the solution.

Using the equation $y_{n+1} = y_n + f(t_n, y_n)(t_{n+1} - t_n)$, we can approximate the solution throughout the entire time interval.
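
As a small worked example, consider the gut equation $P_0'(t) = -k_1P_0(t)$ with $P_0(0) = 100$. The value $k_1 = 0.5$ and the step size $t_{n+1} - t_n = 1$ are chosen here only for illustration. Two steps of Euler's Method give
\begin{align*}
y_1 &= 100 + (-0.5 \cdot 100)(1) = 50 \\
y_2 &= 50 + (-0.5 \cdot 50)(1) = 25
\end{align*}
The exact solution $100e^{-0.5t}$ gives about $60.65$ at $t = 1$ and $36.79$ at $t = 2$, so a step this large is visibly inaccurate.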

This method can be applied to the SIR equations to get
\begin{align}
S_{n+1} &= S_n + (-\beta S_nI_n)(t_{n+1} - t_n) \\
I_{n+1} &= I_n + (\beta S_nI_n - \gamma I_n)(t_{n+1} - t_n) \\
R_{n+1} &= R_n + (\gamma I_n)(t_{n+1} - t_n)
\end{align}
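
The update rule above translates almost directly into code. The following Python sketch is illustrative only and is not taken from the authors' solver ([10]). The function name \codeword{euler_sir}, the fixed step size \codeword{dt}, and the parameter values in the usage lines are assumptions made for this example.

\begin{lstlisting}[language=Python]
def euler_sir(beta, gamma, s0, i0, r0, dt, steps):
    """Approximate the SIR model with the Euler method and a fixed step dt."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i   # rate of flow from S to I
        new_recoveries = gamma * i      # rate of flow from I to R
        # Evaluate every derivative at the old state, then update together.
        s, i, r = (s - new_infections * dt,
                   i + (new_infections - new_recoveries) * dt,
                   r + new_recoveries * dt)
        history.append((s, i, r))
    return history

# Hypothetical values: beta = 0.5, gamma = 0.2, 1% initially infected.
history = euler_sir(0.5, 0.2, s0=0.99, i0=0.01, r0=0.0, dt=0.1, steps=1000)
s_end, i_end, r_end = history[-1]
print(s_end + i_end + r_end)  # stays close to 1, since the changes cancel
\end{lstlisting}

All three compartments are updated from the same old state $(S_n, I_n, R_n)$. Updating $S$ first and then reusing the new value for $I$ would no longer match the equations above.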

To increase the accuracy of this method, smaller time intervals need to be used. As mentioned earlier, until data is fully collected, models cannot be created as accurately. For example, the values of the parameters $\beta$ and $\gamma$ may not be known until the end of the epidemic.


\section{Examples}

Now we will compare the linear, SIR, Extended SIR (ESIR), and SEIR models with data from actual recent diseases. First, we will model the 2003 outbreak of SARS in Hong Kong. To measure accuracy, we can compare the number of predicted infections per 10,000 people with the actual number of infections per 10,000 people according to the World Health Organization's official data set. However, the official data set is incomplete. For instance, it only starts tracking the number of recovered people several weeks after the start of the epidemic. As a result, the data that we used starts when the outbreak is beginning to slow down, not at the very beginning with the first case. We also assume a susceptible population of 20,000 people, because we lose numerical precision if we try to model a larger population. To calculate the parameters and fit the models to the actual data, we use the \codeword{scipy.optimize.minimize} function in Python to find the parameters that minimize the error ([10]). This approach is appropriate because the parameters are not well known, so they must be found by fitting the model to the data.
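
As an illustration of this kind of fit, the sketch below recovers $\beta$ and $\gamma$ from a synthetic infected curve using \codeword{scipy.optimize.minimize}. It is not the authors' solver ([10]): the helper names, the sum-of-squares error, the Nelder-Mead method, the starting guess, and the synthetic data are all assumptions made for this example.

\begin{lstlisting}[language=Python]
from scipy.optimize import minimize

def simulate_infected(beta, gamma, s0, i0, dt, steps):
    """Return the infected fraction at each Euler step of the SIR model."""
    s, i = s0, i0
    infected = [i]
    for _ in range(steps):
        new_infections = beta * s * i
        s, i = s - new_infections * dt, i + (new_infections - gamma * i) * dt
        infected.append(i)
    return infected

def squared_error(params, observed, s0, i0, dt):
    """Sum of squared differences between the model and the observed curve."""
    beta, gamma = params
    if beta <= 0 or gamma <= 0:
        return 1e6  # large penalty for parameters outside the model's range
    predicted = simulate_infected(beta, gamma, s0, i0, dt, len(observed) - 1)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

# Synthetic "data" generated with known parameters (beta = 0.5, gamma = 0.2).
observed = simulate_infected(0.5, 0.2, s0=0.99, i0=0.01, dt=0.5, steps=120)

result = minimize(squared_error, x0=[0.4, 0.25],
                  args=(observed, 0.99, 0.01, 0.5), method="Nelder-Mead")
beta_fit, gamma_fit = result.x
print(beta_fit, gamma_fit)  # should land near the values used above
\end{lstlisting}

With real data the observed curve is noisy and incomplete, so the fitted parameters depend on the starting guess and on which compartments are included in the error.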

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{SARS-Linear}\end{center}
\caption{Predicted I: 946.1041288549815, Actual I: 877.5, Percent error: 7.8\%}
\end{figure}

In the linear model, the functions for the three compartments are all linear. However, for SARS, the actual number of cases is relatively linear, so the infected curve fits quite well. This reflects how the data set is very incomplete, as real epidemics are known to be highly nonlinear.
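
The percent errors in the captions are consistent with the usual definition, the absolute difference between the predicted and actual values divided by the actual value. For the linear model above, for example,
\[ \frac{|946.10 - 877.5|}{877.5} \times 100\% \approx 7.8\%. \]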

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{SARS-SIR}\end{center}
\caption{Predicted I: 1006.3748700651734, Actual I: 877.5, Percent error: 14.7\%}
\end{figure}

The SIR model is actually a bit worse than the linear model for SARS because of incomplete data.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{SARS-ESIR}\end{center}
\caption{Predicted I: 934.6300578153925, Actual I: 877.5, Percent error: 6.5\%}
\end{figure}

By also including deaths, the Extended SIR model is better than both the linear and SIR models.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{SARS-SEIR}\end{center}
\caption{Predicted I: 1009.5304659757634, Actual I: 877.5, Percent error: 15.0\%}
\end{figure}

The SEIR model also provides a good fit, although it is not as good as the other three models in this case, probably because of incomplete data.

We can also apply our models to the ongoing COVID-19 outbreak in the United States. We are going to assume a susceptible population of 3 million people.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{COVID-19-Linear}\end{center}
\caption{Predicted I: 697.635036958848, Actual I: 1320.7366666666667, Percent error: 47.2\%}
\end{figure}

Again, the functions for the three compartments are all linear, but in this case, the actual outbreak started slowly before March 22, 2020 and then quickly accelerated, in contrast to the straight line predicted by the model. So, for COVID-19, the linear model provides a very poor fit to the actual data.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{COVID-19-SIR}\end{center}
\caption{Predicted I: 1266.8203435457365, Actual I: 1320.7366666666667, Percent error: 4.1\%}
\end{figure}

The SIR model is a big improvement over the linear model for COVID-19.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{COVID-19-ESIR}\end{center}
\caption{Predicted I: 1266.3066498710834, Actual I: 1320.7366666666667, Percent error: 4.1\%}
\end{figure}

The Extended SIR model also provides a close fit.

\begin{figure}[H]
\begin{center}\includegraphics[scale=0.27]{COVID-19-SEIR}\end{center}
\caption{Predicted I: 1267.3615656181303, Actual I: 1320.7366666666667, Percent error: 4.0\%}
\end{figure}

For COVID-19, the SEIR model is the best out of the four models.


\section{Conclusion}

For SARS in Hong Kong, the Extended SIR model provides the best fit. We are going to ignore the linear model because of incomplete data; the actual number of cases is relatively linear for SARS, which causes the linear model to appear more accurate. The other three models all have a very similar infected curve, but a key difference in their graphs is the number of susceptible and recovered people. In the SIR model, both of these curves are nearly linear. However, in the Extended SIR model, the susceptible curve is concave up while the recovered curve is concave down, which reflects how the outbreak slowed after the beginning. In the SEIR model, the susceptible curve is slightly concave down while the recovered curve is concave up, so it models the outbreak less accurately than the Extended SIR model. In conclusion, the Extended SIR model is the best for SARS because it most accurately models the other two compartments, susceptible and recovered.

However, the SEIR model is slightly better for COVID-19. This time, the linear model is very inaccurate. All three of the graphs of the SIR, Extended SIR, and SEIR models look extremely similar. Out of the three, the SEIR model most closely predicts the actual number of infections, but only by a very small margin. For COVID-19, there is not a significant advantage to using the SEIR model in terms of accuracy when all three models are similar.


\section{References}
[1] World Health Organization, ``What we do - WHO \textbar{} World Health Organization'', https://www.who.int/about/what-we-do

[2] Buclin, T, ``COMPARTMENTAL KINETICS - University of Lausanne'', https://sepia.unil.ch/pharmacology/index.php?id=71

[3] World Health Organization, ``WHO \textbar{} Infectious diseases'',
\\https://www.who.int/topics/infectious\_diseases/en/

[4] Hethcote, Herbert, ``The Mathematics of Infectious Diseases'', SIAM Review Vol. 42 No. 4, 600, 2000

[5] BYJUS, ``Law of Mass Action'', https://byjus.com/chemistry/law-of-mass-action-or-law-of-chemical-equilibrium/

[6] Kermack, W; McKendrick, A, ``A Contribution to the Mathematical Theory of Epidemics.'', Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 700, 1927

[7] Smith, D; Moore, L, ``The SIR Model for Spread of Disease - Background: Hong Kong Flu'', https://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-background-hong-kong-flu

[8] Kermack, W; McKendrick, A, ``Contributions to the mathematical theory of epidemics – II. The problem of endemicity.'', Bulletin of Mathematical Biology Vol. 53, 57, 1991

[9] Kermack, W; McKendrick, A, ``Contributions to the mathematical theory of epidemics – III. Further studies of the problem of endemicity.'', Bulletin of Mathematical Biology Vol. 53, 89, 1991

[10] Wang, A, ``Infectious-Disease-Modeling/solver2.py at master Ta180m/Infectious-Disease-Modeling'', https://github.com/Ta180m/Infectious-Disease-Modeling/blob/
\\master/solver2.py


\end{document}