Introduction¶

This practical session is intended to explore tumor growth data and interpret it using mathematical models. It is divided into three parts (only the first two parts will be covered in this session):

Analysis of the data by basic plots
Fitting and comparing tumor growth models to the data in order to understand methods of single-individual parameter estimation (nonlinear regression). Use these tools to understand tumor growth laws
Using the model(s) to predict future tumor growth with only a limited number of initial data points

The data provided consists of measurements of tumor volumes from tumors implanted subcutaneously in the back of mice. The cells are from a murine lung cancer cell line (Lewis Lung Carcinoma). The volumes were computed from two one dimensional measures recorded using a caliper (the length $L$ and width $w$) and using the formula $V = \frac{1}{2}L\times w^2$. Volumes are given in mm$^3$ as a function of days following injection of the cells ($10^6$ cells $\simeq$ 1 mm$^3$ injected on day 0).

Data exploration¶

Load the data in a pandas dataframe and display it. You'll need to install and load the package readxl beforehand

url_data = 'http://benzekry.perso.math.cnrs.fr/DONNEES/data_table.xlsx'
p        = tempfile()
download.file(url_data, p, mode="wb")
df        = read_excel(path = p)

df

Get the time vector. It is in days

Plot the growth of the first three mice.

Parameter estimation¶

Problem:

Considering $n$ volume observations $(V_1,\cdots,V_n)$ at time points $(t_1, \cdots, t_n)$, we would like to see whether these data could have been generated by a given function $$V: \begin{array}{ccc} \mathbb{R}\times \mathbb{R}^p & \longrightarrow & \mathbb{R} \\ (t,\theta) & \longmapsto & V(t,\theta)\end{array}$$ depending on time $t$ and a vector of parameters $\theta\in \mathbb{R}^p$.

Another – closely related – problem is to find a set of parameters $\hat{\theta}$ that would "best" describe the data.

In our context a model will be the combination of two parts:

The structural model $V$ (deterministic)
An observation model linking the model to the observations (error model, stochastic)

The latter is defined by assuming that the observations were generated by the model plus some measurements error and we write $$ V_j = V(t_j; \theta) + \sigma_j \varepsilon_j. $$ where $\sigma_j\varepsilon_j$ describes the error term. It is often natural to assume that this error term is gaussian, with expectation zero. $\sigma_j$ is the standard deviation of the error and the $\varepsilon_j$ are independent and identically distributed random variables with $$ \varepsilon_j \sim \mathcal{N}(0, 1). $$

Nonlinear least-squares¶

The Gompertz model is a popular model for tumor growth. It can be expressed by the following differential equation and analytical expression: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \left(\alpha_0 - \beta\ln\left(\frac{V}{V_c}\right)\right)V\\ V(t=0) = 1 \end{array}\right. \quad \Rightarrow \quad V(t) = V_c\left(\frac{V_I}{V_c}\right)^{e^{-\beta t}}e^{\frac{\alpha_0}{\beta}\left(1-e^{-\beta t}\right)} $$ where $V_c$ is the volume of one cell (a constant equal to $10^{-6}$), $\alpha_0$ is the proliferation rate at one cell and $\beta$ is the rate of exponential decrease of the relative growth rate. The initial condition has been fixed to the known number of injected cells (1 mm $^3$). One can show that the equation above is equivalent to $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha_1 e^{-\beta t}V\\ V(t=0) = 1 \end{array}\right. $$ with $\alpha_1=\alpha_0 +\beta \ln(V_c)$.

This model is implemented in the following function

gompertz_f = function(time, alpha0, beta){
    Vc = 1e-6
    VI = 1
    V  = Vc*(VI/Vc)**(exp(-beta*time))*exp(alpha0/beta*(1-exp(-beta*time)))
}

Plot the model over a time span from 0 to 20 days with parameter values $\alpha_0 = 0.4$ and $\beta=0.001$. Add the data points on the plot

There exists a function to perform nonlinear least-squares in R, the function nls. We can use this function to estimate the model parameters. We use a formula, characterized by the ~ between the dependent variable (here the observations) and the independent variable (here, the time) through the model function.

y = df$'1'
nlm = nls(formula = y ~ gompertz_f(time, alpha0, beta), start=c(alpha0=0.4, beta=0.001))
summary(nlm)

Formula: y ~ gompertz_f(time, alpha0, beta)

Parameters:
       Estimate Std. Error t value Pr(>|t|)    
alpha0 1.972762   0.081245   24.28 8.83e-09 ***
beta   0.086642   0.004206   20.60 3.23e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 49.79 on 8 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 2.263e-06
  (5 observations deleted due to missingness)

The variance-covariance matrix of the estimated parameters is given using the function vcov()

vcov(nlm)

Get the estimated parameters using the function coef() applied to nlm

Compute the relative standard errors on $\alpha_0$ and $\beta$

Compute the correlation coefficient between $\alpha_0$ and $\beta$ (hint: use cov2cor())

Plot the ellipsoidal confidence region for parameters estimates (see ellipse() function from the mixtools package)

What do you conclude about the identifiability of the model?

The power law model as a simple and biologically interpretable model
Although widely employed due to its excellent descriptive power, one of the main criticism addressed to the Gompertz model is its lack of biological ground. Indeed, while the parameter $\alpha_0$ can be interpreted from the duration of the cell cycle, the parameter $\beta$ remains heuristic. It is not a physiological parameter that could be experimentally measured and can only be assessed through fitting the growth curve.

The power law model is yet another model which consists in assuming that the proliferative tissue – instead of being a constant fraction of the tumor volume as in the exponential model – is rather proportional to the volume of the tumor elevated to a power $\gamma$. This power (or rather the triple of it) can be interpreted as the fractal (Hausdorff) dimension of the proliferative tissue. For example, when $\gamma=\frac{2}{3}$ then the proliferative tissue would only be two-dimensional within a three-dimensional tumor. This could correspond to a proliferative rim limited to the surface of the tumor, and would make sense because the vascularization of a tumor often occurs through its surface. However, an active process of tumor-driven vasculature development (the tumor neo-angiogenesis) induces the growth of new infiltrative blood vessels. From the naturall tree structure of the blood network, a fractal dimension naturally occurs and the proliferative tissue, being in the vicinity of the blood vessels, inherits this dimension. Summing up, this gives the following simple differential equation which can, once again, be solved analytically: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha V^\gamma\\ V(t=0) = 1 \end{array} \right. \quad \Longrightarrow \quad V(t) = (1 + \alpha (1-\gamma) t)^{\frac{1}{1-\gamma}}. $$

In the case of $\gamma=\frac{2}{3}$, show that growth of the tumor radius is linear in time. This patterns is experimentally and clinically observed in many situations, including the growth of gliomas.

Write a R function implementing this model

Use nls to fit the model (initial guess $\alpha = 0.2, \; \gamma = 0.7$) to the data. What value do you estimate for the fractal dimension of the vasculature in this data set?

Perform identifiability analysis (relative standard errors, correlation coefficient, 95% confidence region)

Time	1	2	3	4	5	6	7	8	9	10
<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
5	NA	54.88638	NA	NA	68.80902	NA	NA	NA	NA	NA
6	NA	59.40804	89.2313	55.34859	59.85329	46.50239	94.72797	NA	18.56733	26.88002
7	50.0218	85.66228	157.3515	56.17581	58.98850	55.15519	126.09418	NA	73.29369	NA
11	309.5195	261.57277	221.6986	103.33091	179.66476	251.22221	333.13662	201.1653	126.82089	144.15130
12	324.8959	412.26565	327.4923	155.32034	309.78765	341.17544	538.68072	316.3830	222.66116	193.40192
13	450.8421	488.45074	461.4380	167.95867	470.57132	438.33539	669.92962	338.2404	244.72691	281.17123
14	572.4508	618.85479	641.4450	219.37869	480.93031	859.95276	762.52762	411.9588	333.62984	294.88621
15	664.3366	798.99721	746.8684	412.37838	488.77798	854.72795	923.71765	586.6670	367.47527	391.88414
18	1151.6938	1218.72106	1359.4584	584.19221	1021.72105	1143.27950	NA	991.8820	805.77885	744.95487
19	1338.3833	1415.47186	1626.8744	762.07402	1278.53078	1645.40682	NA	1219.8999	1030.03428	990.33179
20	1522.8078	1410.14921	2063.9125	880.32572	1377.38184	1950.48269	NA	1833.0966	1272.81888	1085.31491
21	1897.0737	1524.12658	NA	1040.07443	1538.92205	NA	NA	2131.6057	1555.35908	1331.18967
22	NA	1935.41534	NA	1335.13646	1958.56025	NA	NA	NA	1671.14852	1641.33392
24	NA	NA	NA	1850.44733	NA	NA	NA	NA	NA	1992.06746
25	NA	NA	NA	2079.44517	NA	NA	NA	NA	NA	NA

A matrix: 2 × 2 of type dbl
	alpha0	beta
alpha0	0.0066007057	0.0003416397
beta	0.0003416397	0.0000176901

Table of Contents

Introduction¶

Data exploration¶

Parameter estimation¶

Nonlinear least-squares¶