Load the tumor growth data set from the URL http://benzekry.perso.math.cnrs.fr/DONNEES/data_exam.csv into a data frame.

In [24]:
df = read.csv('http://benzekry.perso.math.cnrs.fr/DONNEES/data_exam.csv', sep=";")


Load the time vector in a variable time

Load the volume data in a variable V

# Linear least-squares

We will first assume a constant error model (i.e. $\sigma_j=\sigma,\, \forall j$) and an exponential structural model: $$V\left(t; \left(V_0, \alpha \right)\right) = V_0 e^{\alpha t}.$$ We can transform the problem so that it reduces to a linear regression.

$$\ln(V_j) = \ln\left(V_0\right) + \alpha t_j + \sigma \varepsilon_j$$

Define a variable y as the log of V

Using the formula seen in class, build the least-squares matrix $M$ for fitting y
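As a reminder of the general shape of this matrix, here is a sketch that assumes the parameter vector is ordered as $\theta = \left(\ln V_0, \alpha\right)^T$ (this ordering is an assumption made here; the formula from class may order the columns differently). The log-transformed model then reads $y = M\theta + \sigma\varepsilon$ with

$$M = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \qquad \theta = \begin{pmatrix} \ln V_0 \\ \alpha \end{pmatrix}.$$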

Solve the system corresponding to the linear regression
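A minimal sketch of this step, assuming the system in question is the normal equations $M^T M \hat{\theta} = M^T y$. The data below are invented for illustration (noise-free, generated with $V_0 = 20$ and $\alpha = 0.2$) and are not the course data set, so the recovered parameters can be checked by eye.

```r
# Hypothetical data, invented for illustration (NOT the course data set)
time <- c(5, 7, 9, 11, 13, 15)
V <- 20 * exp(0.2 * time)      # exactly exponential: V0 = 20, alpha = 0.2
y <- log(V)

# Least-squares matrix: a column of ones (for ln V0) and the time points (for alpha)
M <- cbind(1, time)

# Solve the normal equations (M^T M) theta = M^T y
theta_hat <- solve(crossprod(M), crossprod(M, y))

V0_hat <- exp(theta_hat[1])    # back-transform the intercept to get V0
alpha_hat <- theta_hat[2]      # the slope is the growth rate alpha
```

Here `crossprod(M)` computes $M^T M$ and `crossprod(M, y)` computes $M^T y$; passing both to `solve()` avoids forming the inverse explicitly.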

Plot the regression line together with the data
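One possible way to draw the plot with base R graphics is sketched below. The data are again invented for illustration (not the course data set), and the axis labels and colors are arbitrary choices.

```r
# Hypothetical data, invented for illustration (NOT the course data set)
time <- c(5, 7, 9, 11, 13, 15)
y <- log(20) + 0.2 * time + c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01)

# Fit the line by solving the normal equations
M <- cbind(1, time)
theta_hat <- solve(crossprod(M), crossprod(M, y))

# Data points in log scale, with axes extended to t = 0 and ln(V) = 0
plot(time, y, pch = 16,
     xlim = c(0, max(time)), ylim = range(c(0, y, theta_hat[1])),
     xlab = "time", ylab = "ln(V)")

# Regression line: intercept theta_hat[1], slope theta_hat[2]
abline(a = theta_hat[1], b = theta_hat[2], col = "red")
legend("topleft", legend = c("data", "fit"),
       pch = c(16, NA), lty = c(NA, 1), col = c("black", "red"))
```

Extending the x-axis down to 0 makes the fitted intercept visible, which is the quantity to compare with $\ln(1) = 0$ in the question that follows.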

Considering that the number of injected cells is $10^6$ cells, which corresponds to $V_0 = 1$ mm$^3$, and looking at the fit, what do you conclude about the validity of the exponential model?

The estimate of $\sigma^2$ is given by $$s^2 = \frac{1}{n-2}\sum_{j=1}^n\left(y_j - \left(M\hat{\theta}\right)_j\right)^2$$ where $\hat{\theta}$ is the vector of optimal parameters just found and $n$ is the number of time points.

If $$\mathrm{residuals} = y-M\hat{\theta}$$ is the vector of residuals, then $s^2$ can be computed as $$s^2 = \frac{1}{n-2}\mathrm{residuals}^T\cdot \mathrm{residuals}$$ with $\mathrm{residuals}^T$ the transpose of the vector $\mathrm{residuals}$. Using this expression, compute $s^2$.

Deduce the estimate of the covariance matrix of the parameter estimates, given by $$s^2 \left(M^T M\right)^{-1}$$

Compute the standard errors on the parameter estimates.
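The three quantities above ($s^2$, the covariance matrix and the standard errors) can be chained as in the following sketch. It assumes the standard errors are the square roots of the diagonal of the covariance matrix, and it uses invented data (not the course data set); the variable names `res`, `cov_theta` and `se` are illustrative.

```r
# Hypothetical data, invented for illustration (NOT the course data set)
time <- c(5, 7, 9, 11, 13, 15)
y <- log(20) + 0.2 * time + c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01)

# Least-squares fit via the normal equations
M <- cbind(1, time)
theta_hat <- solve(crossprod(M), crossprod(M, y))

n <- length(y)
res <- y - M %*% theta_hat                    # vector of residuals (n x 1)
s2 <- as.numeric(crossprod(res)) / (n - 2)    # res^T res / (n - 2)

cov_theta <- s2 * solve(crossprod(M))         # s^2 (M^T M)^{-1}
se <- sqrt(diag(cov_theta))                   # standard errors of ln(V0) and alpha
```

Note that `se[1]` is the standard error on $\ln V_0$, not on $V_0$ itself, since the regression is carried out in log scale.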

Use the built-in ordinary linear least-squares function lm() to verify the results.
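A sketch of such a check, once more on invented data (not the course data set). The accessors used here, `coef()`, `summary()` and `vcov()`, are standard for `lm` fits; the comments indicate which manual quantity each one should reproduce.

```r
# Hypothetical data, invented for illustration (NOT the course data set)
time <- c(5, 7, 9, 11, 13, 15)
y <- log(20) + 0.2 * time + c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01)

# Ordinary least-squares fit of y = ln(V0) + alpha * time
fit <- lm(y ~ time)

coef(fit)                    # intercept estimates ln(V0), slope estimates alpha
summary(fit)$sigma^2         # residual variance s^2, with n - 2 degrees of freedom
vcov(fit)                    # covariance matrix s^2 (M^T M)^{-1}
summary(fit)$coefficients    # the "Std. Error" column holds the standard errors
```

Comparing these outputs with the hand-computed values, for instance with `all.equal()`, should show agreement up to floating-point rounding.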