Introduction

This practical session is intended to explore tumor growth data and interpret it using mathematical models. It is divided into three parts (only the first two parts will be covered in this session):

  1. Analysis of the data by basic plots
  2. Fitting and comparing tumor growth models to the data in order to understand methods of single-individual parameter estimation (nonlinear regression). Use these tools to understand tumor growth laws
  3. Using the model(s) to predict future tumor growth with only a limited number of initial data points

The data provided consists of measurements of tumor volumes from tumors implanted subcutaneously in the back of mice. The cells are from a murine lung cancer cell line (Lewis Lung Carcinoma). The volumes were computed from two one dimensional measures recorded using a caliper (the length $L$ and width $w$) and using the formula $V = \frac{1}{2}L\times w^2$. Volumes are given in mm$^3$ as a function of days following injection of the cells ($10^6$ cells $\simeq$ 1 mm$^3$ injected on day 0).

Data exploration

Load the data in a pandas dataframe and display it. You'll need to install and load the package readxl beforehand

In [11]:
url_data = 'http://benzekry.perso.math.cnrs.fr/DONNEES/data_table.xlsx'
p        = tempfile()
download.file(url_data, p, mode="wb")
df        = read_excel(path = p)
In [12]:
df
A tibble: 15 × 11
Time12345678910
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
5 NA 54.88638 NA NA 68.80902 NA NA NA NA NA
6 NA 59.40804 89.2313 55.34859 59.85329 46.50239 94.72797 NA 18.56733 26.88002
7 50.0218 85.66228 157.3515 56.17581 58.98850 55.15519126.09418 NA 73.29369 NA
11 309.5195 261.57277 221.6986 103.33091 179.66476 251.22221333.13662 201.1653 126.82089 144.15130
12 324.8959 412.26565 327.4923 155.32034 309.78765 341.17544538.68072 316.3830 222.66116 193.40192
13 450.8421 488.45074 461.4380 167.95867 470.57132 438.33539669.92962 338.2404 244.72691 281.17123
14 572.4508 618.85479 641.4450 219.37869 480.93031 859.95276762.52762 411.9588 333.62984 294.88621
15 664.3366 798.99721 746.8684 412.37838 488.77798 854.72795923.71765 586.6670 367.47527 391.88414
181151.69381218.721061359.4584 584.192211021.721051143.27950 NA 991.8820 805.77885 744.95487
191338.38331415.471861626.8744 762.074021278.530781645.40682 NA1219.89991030.03428 990.33179
201522.80781410.149212063.9125 880.325721377.381841950.48269 NA1833.09661272.818881085.31491
211897.07371524.12658 NA1040.074431538.92205 NA NA2131.60571555.359081331.18967
22 NA1935.41534 NA1335.136461958.56025 NA NA NA1671.148521641.33392
24 NA NA NA1850.44733 NA NA NA NA NA1992.06746
25 NA NA NA2079.44517 NA NA NA NA NA NA

Get the time vector. It is in days

Plot the growth of the first three mice.

Parameter estimation

Problem:

Considering $n$ volume observations $(V_1,\cdots,V_n)$ at time points $(t_1, \cdots, t_n)$, we would like to see whether these data could have been generated by a given function $$V: \begin{array}{ccc} \mathbb{R}\times \mathbb{R}^p & \longrightarrow & \mathbb{R} \\ (t,\theta) & \longmapsto & V(t,\theta)\end{array}$$ depending on time $t$ and a vector of parameters $\theta\in \mathbb{R}^p$.

Another – closely related – problem is to find a set of parameters $\hat{\theta}$ that would "best" describe the data.

In our context a model will be the combination of two parts:

  1. The structural model $V$ (deterministic)
  2. An observation model linking the model to the observations (error model, stochastic)

The latter is defined by assuming that the observations were generated by the model plus some measurements error and we write $$ V_j = V(t_j; \theta) + \sigma_j \varepsilon_j. $$ where $\sigma_j\varepsilon_j$ describes the error term. It is often natural to assume that this error term is gaussian, with expectation zero. $\sigma_j$ is the standard deviation of the error and the $\varepsilon_j$ are independent and identically distributed random variables with $$ \varepsilon_j \sim \mathcal{N}(0, 1). $$

Nonlinear least-squares

The Gompertz model is a popular model for tumor growth. It can be expressed by the following differential equation and analytical expression: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \left(\alpha_0 - \beta\ln\left(\frac{V}{V_c}\right)\right)V\\ V(t=0) = 1 \end{array}\right. \quad \Rightarrow \quad V(t) = V_c\left(\frac{V_I}{V_c}\right)^{e^{-\beta t}}e^{\frac{\alpha_0}{\beta}\left(1-e^{-\beta t}\right)} $$ where $V_c$ is the volume of one cell (a constant equal to $10^{-6}$), $\alpha_0$ is the proliferation rate at one cell and $\beta$ is the rate of exponential decrease of the relative growth rate. The initial condition has been fixed to the known number of injected cells (1 mm $^3$). One can show that the equation above is equivalent to $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha_1 e^{-\beta t}V\\ V(t=0) = 1 \end{array}\right. $$ with $\alpha_1=\alpha_0 +\beta \ln(V_c)$.

This model is implemented in the following function

In [43]:
gompertz_f = function(time, alpha0, beta){
    Vc = 1e-6
    VI = 1
    V  = Vc*(VI/Vc)**(exp(-beta*time))*exp(alpha0/beta*(1-exp(-beta*time)))
}

Plot the model over a time span from 0 to 20 days with parameter values $\alpha_0 = 0.4$ and $\beta=0.001$. Add the data points on the plot

There exists a function to perform nonlinear least-squares in R, the function nls. We can use this function to estimate the model parameters. We use a formula, characterized by the ~ between the dependent variable (here the observations) and the independent variable (here, the time) through the model function.

In [46]:
y = df$'1'
nlm = nls(formula = y ~ gompertz_f(time, alpha0, beta), start=c(alpha0=0.4, beta=0.001))
summary(nlm)
Formula: y ~ gompertz_f(time, alpha0, beta)

Parameters:
       Estimate Std. Error t value Pr(>|t|)    
alpha0 1.972762   0.081245   24.28 8.83e-09 ***
beta   0.086642   0.004206   20.60 3.23e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 49.79 on 8 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 2.263e-06
  (5 observations deleted due to missingness)

The variance-covariance matrix of the estimated parameters is given using the function vcov()

In [47]:
vcov(nlm)
A matrix: 2 × 2 of type dbl
alpha0beta
alpha00.00660070570.0003416397
beta0.00034163970.0000176901

Get the estimated parameters using the function coef() applied to nlm

Compute the relative standard errors on $\alpha_0$ and $\beta$

Compute the correlation coefficient between $\alpha_0$ and $\beta$ (hint: use cov2cor())

Plot the ellipsoidal confidence region for parameters estimates (see ellipse() function from the mixtools package)

What do you conclude about the identifiability of the model?

The power law model as a simple and biologically interpretable model
Although widely employed due to its excellent descriptive power, one of the main criticism addressed to the Gompertz model is its lack of biological ground. Indeed, while the parameter $\alpha_0$ can be interpreted from the duration of the cell cycle, the parameter $\beta$ remains heuristic. It is not a physiological parameter that could be experimentally measured and can only be assessed through fitting the growth curve.

The power law model is yet another model which consists in assuming that the proliferative tissue – instead of being a constant fraction of the tumor volume as in the exponential model – is rather proportional to the volume of the tumor elevated to a power $\gamma$. This power (or rather the triple of it) can be interpreted as the fractal (Hausdorff) dimension of the proliferative tissue. For example, when $\gamma=\frac{2}{3}$ then the proliferative tissue would only be two-dimensional within a three-dimensional tumor. This could correspond to a proliferative rim limited to the surface of the tumor, and would make sense because the vascularization of a tumor often occurs through its surface. However, an active process of tumor-driven vasculature development (the tumor neo-angiogenesis) induces the growth of new infiltrative blood vessels. From the naturall tree structure of the blood network, a fractal dimension naturally occurs and the proliferative tissue, being in the vicinity of the blood vessels, inherits this dimension. Summing up, this gives the following simple differential equation which can, once again, be solved analytically: $$ \left\lbrace\begin{array}{l} \frac{dV}{dt} = \alpha V^\gamma\\ V(t=0) = 1 \end{array} \right. \quad \Longrightarrow \quad V(t) = (1 + \alpha (1-\gamma) t)^{\frac{1}{1-\gamma}}. $$

In the case of $\gamma=\frac{2}{3}$, show that growth of the tumor radius is linear in time. This patterns is experimentally and clinically observed in many situations, including the growth of gliomas.

Write a R function implementing this model

Use nls to fit the model (initial guess $\alpha = 0.2, \; \gamma = 0.7$) to the data. What value do you estimate for the fractal dimension of the vasculature in this data set?

Perform identifiability analysis (relative standard errors, correlation coefficient, 95% confidence region)

In [ ]: