Load the tumor growth data set from the url http://benzekry.perso.math.cnrs.fr/DONNEES/data_exam.csv into a dataframe

In [24]:
df = read.csv('http://benzekry.perso.math.cnrs.fr/DONNEES/data_exam.csv', sep=";")

Load the time vector in a variable time

In [26]:
time = df$Time

Load the volume data in a variable V

In [27]:
V = df$Volume

Linear least-squares

We will first assume a constant error model (i.e. $\sigma_j=\sigma,\, \forall j$) and an exponential structural model: $$ V\left(t; \left(V_0, \alpha \right)\right) = V_0 e^{\alpha t}. $$ We can transform the problem so that it reduces to a linear regression.

$$ \ln(V_j) = \ln\left(V_0\right) + \alpha t_j + \sigma \varepsilon_j $$

Define a variable y as the log of V

In [28]:
y=log(V)

Using the formula seen in class, build the least-squares matrix $M$ for fitting y

Solve the system corresponding to the linear regression

In [55]:
ones = rep(1, length(time))
M    = cbind(ones, time)
theta = solve(t(M)%*%M, t(M)%*%y)
print(theta)
          [,1]
ones 2.8755928
time 0.2317298

Plot the regression line together with the data

In [36]:
time_plot = seq(time[1], tail(time, 1), by=0.1)
y_plot    = theta[1] + theta[2]*time_plot
plot(time, y)
lines(time_plot, y_plot)

Considering that the number of injected cells is $10^6$ cells, which corresponds to $V_0 = 1$ mm$^3$, and looking at the fit, what do you conclude about the validity of the exponential model?

The estimate of $\sigma^2$ is given by $$ s^2 = \frac{1}{n-2}\sum_{j=1}^n\left(y_j - M\hat{\theta}\right)^2 $$ with $\hat{\theta}$ the vector of optimal parameters just found and $n$ is the number of time points.

If $$residuals = y-M\hat{\theta}$$ is the vector of residuals, then $s^2$ can be computed as $$ s^2 = \frac{1}{n-2}residuals^T\cdot residuals $$ with $residuals^T$ the tranpose of the vector $residuals$. Using these considerations, compute $s^2$.

In [49]:
residuals = y - M%*%theta
s2 = 1/(length(time)-2)*t(residuals)%*%residuals # MLE estimator of error model variance
s2 = s2[1,1]

Deduce the estimation of the covariance matrix of the parameter estimates, given by $$ s^2 \left(M^T M\right)^{-1} $$

In [50]:
cov_est = s2*solve(t(M)%*%M)
cov_est
A matrix: 2 × 2 of type dbl
onestime
ones 0.110758897-0.0068369689
time-0.006836969 0.0004557979

Compute the standard errors on the parameter estimates.

In [51]:
se  = sqrt(diag(cov_est))
se
ones
0.332804592397234
time
0.0213494245564541

Use the built-in ordinary linear least-squares function lm() to verify the results

In [54]:
lm_res = lm(y ~ 1 + time)
summary(lm_res)
Call:
lm(formula = y ~ 1 + time)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58524 -0.15622  0.06471  0.20409  0.31040 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.87559    0.33280    8.64 2.50e-05 ***
time         0.23173    0.02135   10.85 4.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2864 on 8 degrees of freedom
Multiple R-squared:  0.9364,	Adjusted R-squared:  0.9285 
F-statistic: 117.8 on 1 and 8 DF,  p-value: 4.589e-06