This practical session is intended to explore tumor growth data and interpret it using mathematical models. It is divided into three parts (only the first two parts will be covered in this session):
The data provided consists of measurements of tumor volumes from tumors implanted subcutaneously in the back of mice. The cells are from a murine lung cancer cell line (Lewis Lung Carcinoma). The volumes were computed from two one dimensional measures recorded using a caliper (the length $L$ and width $w$) and using the formula $V = \frac{1}{2}L\times w^2$. Volumes are given in mm$^3$ as a function of days following injection of the cells ($10^6$ cells $\simeq$ 1 mm$^3$ injected on day 0).
Load the data in a pandas dataframe and display it. You'll need to install and load the package readxl
beforehand
url_data = 'http://benzekry.perso.math.cnrs.fr/DONNEES/data_table.xlsx'
p = tempfile()
download.file(url_data, p, mode="wb")
df = read_excel(path = p)
df
Get the time vector. It is in days
Plot the growth of the first three mice.
Problem:
Considering $n$ volume observations $(V_1,\cdots,V_n)$ at time points $(t_1, \cdots, t_n)$, we would like to see whether these data could have been generated by a given function $$V: \begin{array}{ccc} \mathbb{R}\times \mathbb{R}^p & \longrightarrow & \mathbb{R} \\ (t,\theta) & \longmapsto & V(t,\theta)\end{array}$$ depending on time $t$ and a vector of parameters $\theta\in \mathbb{R}^p$.
Another – closely related – problem is to find a set of parameters $\hat{\theta}$ that would "best" describe the data.
In our context a model will be the combination of two parts:
The latter is defined by assuming that the observations were generated by the model plus some measurements error and we write $$ V_j = V(t_j; \theta) + \sigma_j \varepsilon_j. $$ where $\sigma_j\varepsilon_j$ describes the error term. It is often natural to assume that this error term is gaussian, with expectation zero. $\sigma_j$ is the standard deviation of the error and the $\varepsilon_j$ are independent and identically distributed random variables with $$ \varepsilon_j \sim \mathcal{N}(0, 1). $$
The Gompertz model is a popular model for tumor growth. It can be expressed by the following differential equation and analytical expression: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \left(\alpha_0 - \beta\ln\left(\frac{V}{V_c}\right)\right)V\\ V(t=0) = 1 \end{array}\right. \quad \Rightarrow \quad V(t) = V_c\left(\frac{V_I}{V_c}\right)^{e^{-\beta t}}e^{\frac{\alpha_0}{\beta}\left(1-e^{-\beta t}\right)} $$ where $V_c$ is the volume of one cell (a constant equal to $10^{-6}$), $\alpha_0$ is the proliferation rate at one cell and $\beta$ is the rate of exponential decrease of the relative growth rate. The initial condition has been fixed to the known number of injected cells (1 mm $^3$). One can show that the equation above is equivalent to $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha_1 e^{-\beta t}V\\ V(t=0) = 1 \end{array}\right. $$ with $\alpha_1=\alpha_0 +\beta \ln(V_c)$.
This model is implemented in the following function
gompertz_f = function(time, alpha0, beta){
Vc = 1e-6
VI = 1
V = Vc*(VI/Vc)**(exp(-beta*time))*exp(alpha0/beta*(1-exp(-beta*time)))
}
Plot the model over a time span from 0 to 20 days with parameter values $\alpha_0 = 0.4$ and $\beta=0.001$. Add the data points on the plot
There exists a function to perform nonlinear least-squares in R
, the function nls
. We can use this function to estimate the model parameters. We use a formula
, characterized by the ~
between the dependent variable (here the observations) and the independent variable (here, the time) through the model function.
y = df$'1'
nlm = nls(formula = y ~ gompertz_f(time, alpha0, beta), start=c(alpha0=0.4, beta=0.001))
summary(nlm)
The variance-covariance matrix of the estimated parameters is given using the function vcov()
vcov(nlm)
Get the estimated parameters using the function coef()
applied to nlm
Compute the relative standard errors on $\alpha_0$ and $\beta$
Compute the correlation coefficient between $\alpha_0$ and $\beta$ (hint: use cov2cor()
)
Plot the ellipsoidal confidence region for parameters estimates (see ellipse()
function from the mixtools
package)
What do you conclude about the identifiability of the model?
The power law model as a simple and biologically interpretable model
Although widely employed due to its excellent descriptive power, one of the main criticism addressed to the Gompertz model is its lack of biological ground. Indeed, while the parameter $\alpha_0$ can be interpreted from the duration of the cell cycle, the parameter $\beta$ remains heuristic. It is not a physiological parameter that could be experimentally measured and can only be assessed through fitting the growth curve.
The power law model is yet another model which consists in assuming that the proliferative tissue – instead of being a constant fraction of the tumor volume as in the exponential model – is rather proportional to the volume of the tumor elevated to a power $\gamma$. This power (or rather the triple of it) can be interpreted as the fractal (Hausdorff) dimension of the proliferative tissue. For example, when $\gamma=\frac{2}{3}$ then the proliferative tissue would only be two-dimensional within a three-dimensional tumor. This could correspond to a proliferative rim limited to the surface of the tumor, and would make sense because the vascularization of a tumor often occurs through its surface. However, an active process of tumor-driven vasculature development (the tumor neo-angiogenesis) induces the growth of new infiltrative blood vessels. From the naturall tree structure of the blood network, a fractal dimension naturally occurs and the proliferative tissue, being in the vicinity of the blood vessels, inherits this dimension. Summing up, this gives the following simple differential equation which can, once again, be solved analytically: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha V^\gamma\\ V(t=0) = 1 \end{array} \right. \quad \Longrightarrow \quad V(t) = (1 + \alpha (1-\gamma) t)^{\frac{1}{1-\gamma}}. $$
In the case of $\gamma=\frac{2}{3}$, show that growth of the tumor radius is linear in time. This patterns is experimentally and clinically observed in many situations, including the growth of gliomas.
Write a R
function implementing this model
Use nls
to fit the model (initial guess $\alpha = 0.2, \; \gamma = 0.7$) to the data. What value do you estimate for the fractal dimension of the vasculature in this data set?
Perform identifiability analysis (relative standard errors, correlation coefficient, 95% confidence region)