---
title: "Introduction to the landmaRk package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the landmaRk package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


# Overview

The landmaRk package provides a framework for landmarking analysis of
time-to-event and longitudinal data. It allows users to perform
dynamic risk prediction for time-to-event outcomes whilst taking into account
longitudinal measurements (e.g. biomarkers measured over time). Landmarking
consists of a two-step framework. First, fitting models to the time-varying
covariates, and then using these predictions in survival models.

Given a time-to-event outcome $T_i$, a landmark time $s$ and a time horizon 
$s + w$, the goal of a landmarking analysis is to estimate 
$$
\pi_i(s+w \mid s) = P(T_i > s+w \mid T_i \ge s, 𝒀_i(s)), 
$$
where $𝒀_i(s)$ denotes a vector of covariates which may include time-varying 
covariates.

A landmarking analysis of time-to-event data has two components:

- First, model the longitudinal trajectories of dynamic covariates, $𝒀_i(t)$. 
Then use, the fitted model to make a prediction for $𝒀_i(s)$, $\hat{y}_i(s)$, 
at the landmark time, $s$.

- Second, fit a survival model of the time-to-event outcome, conditioning on the
predicted value for $𝒀_i(s)$, and potentially on additional static and dynamic 
covariates. 

The `landmaRk` package allows users to use the following method for the first 
component:

- Last Observation Carried Forward (LOCF), using the last measurement for $𝒀_i$
recorded prior to $s$ as our prediction, $\hat{y}_i(s)$.

- Linear mixed-effect (LME) model, as implemented in the `lme4` package.

- Latent class mixed model (LCMM), as implemented in the `lcmm` package.

For the second component, at present the `landmaRk` package supports Cox
proportional hazard models as implemented in the `survival`package. 

Additionally, the `landmaRk` package provides a modular system allowing making 
it possible to incorporate additional models both for the longitudinal and the 
survival components


```{r pipeline-diagram, echo = FALSE, out.width = "100%", fig.cap="Diagram of the landmaRk package pipeline"}
knitr::include_graphics(system.file("diagram.svg", package = "landmaRk"))
```


# Setup

In addition to the `landmaRk` package, we will also use `tidyverse`. 
```{r setup}
set.seed(123)
library(landmaRk)
library(lcmm)
library(tidyverse)
library(prodlim)
```

# Example: `aids` data
In this vignette, we use the dataset `aids` to perform landmarking analysis 
of time-to-event data with time-varying covariates. Here is the structure of 
the dataset.

```{r}
library(JMbayes2)
data("aids")
aids$patient <- as.numeric(aids$patient)
str(aids)
```

The dataset contains the following variables:

-   `patient`: a unique patient identifier

-   `Time`: time when death or censoring was recorded

-   `death`: indicates whether death (1) or censoring (0) occurred

-   `obstime`: time when time-varying covariate `CD4` was recorded

-   `CD4`: a time-varying covariate

-   `drug`, `gender`, `prevOI`, `AZT`: static (baseline) covariates

# Initialising the landmarking analysis

First, we split the dataset into two, one containing static 
covariates, event time and indicator of event/censoring, and another one 
containing dynamic covariates. To that end, we use the function `split_wide_df`.
That function returns a named list with the following elements:

- Under the name `df_static`, a dataframe containing static covariates, event 
times and a binary indicator of event/censoring.

- Under the name `df_dynamic`, a named list of dataframes, mapping dynamic 
covariates to dataframes in long format containing longitudinal measurement of 
the relevant dynamic covariate.

The above split reduces data storage requirements, particularly for large 
datasets with a large number of individuals or longitudinal measurements. This
is because static covariate values are stored only once per individual, rather 
than repeatedly for each longitudinal measurement.


```{r}
# DF with Static covariates
aids_dfs <- split_wide_df(
  aids,
  ids = "patient",
  times = "obstime",
  static = c("Time", "death", "drug", "gender", "prevOI"),
  dynamic = c("CD4"),
  measurement_name = "value"
)
static <- aids_dfs$df_static
head(static)
```

```{r}
# DF with Dynamic covariates
dynamic <- aids_dfs$df_dynamic
head(dynamic[["CD4"]])
```


We can now create an object of class `LandmarkAnalysis`, using the helper 
function of the same name.

```{r}
landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)
```

Arguments to the helper function are the following:

-   `data_static` and `data_dynamic`: two datasets containing static and dynamic
covariates, respectively (as created above using the `split_wide_df` function). 
Both datasets must contain a column with individual ids.

-   `event_indicator`: name of the column that indicates the censoring indicator 
in `data_static`.

-   `measurements`: name of the column in `data_dynamic` that contains the
recorded values of the time-varying covariates (e.g., `"value"`).

-   `times`: name of the column in `data_dynamic` that contains the
measurement times associated with the time-varying covariates (e.g., `"time"`).
-   `ids`: name of the column that identifies patients in both datasets.

-   `event_time`: name of the column that identifies time of event/censoring 
in the static dataset.

# Baseline survival analysis (without time-varying covariates)

First, we perform a survival without time-varying covariates. We can use this 
as a baseline to evaluate the performance of a subsequent landmark analysis 
with such covariates. First step is to establish the landmark times, and to 
work out the risk sets at each of those landmark times.

```{r}
landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8))

landmarking_object
```

Now we use the function `fit_survival` to fit the survival model. We specify 
the following arguments:

- `landmarks`: Vector of landmark times at which the model will be fitted.

- `formula`: Two-sided formula object specifying the survival process. Because
we are fitting the baseline model, we include static covariates only.

- `horizons`: Vector of time horizons up to when the model will be fitted. The 
vectors of landmarks and time horizons must be of the same length.
- `method`: Method for the survival component of the landmarking analysis. In 
this case, `"coxph"`.

- `dynamic_covariates`: Vector of names of the dynamic covariates to be used. 
In this baseline analysis, the vector is empty. 

Then, we can make predictions using the function `predict_survival`. Arguments
`method`, `landmarks` and `horizons`are as above. 

```{r}
landmarking_object <- landmarking_object |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c()
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8)
  )
```

To display the results, one can use the method `summary`, specifying 
`type = "survival"`, in addition to a landmark and a horizon. 
```{r}
summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
```

Now the `performance_metrics` function can be used to calculate 
(for now, in-sample) performance metrics. 
```{r}
performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
```

# Landmarking analysis with lme4 + coxph

Now we use the package lme4 to fit a linear mixed model of the time-varying 
covariate, CD4. This first step is followed by fitting a Cox PH sub-model 
using the longitudinal predictions as covariates.

```{r}
landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)
landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8))
```

Provided with the risk sets, now the pipeline has the following four steps:

- Fit the longitudinal model with `fit_longitudinal`,
specifying `method = "lme4"`, and the `formula`
as it would be passed to `lme4::lmer()`. One also needs to specify a vector of
`landmarks` and a vector of dynamic covariates, `dynamic_covariates = c("CD4")`.

- Make predictions with `predict_longitudinal`, specifying `method = "lme4"`.

- Fit the survival submodel with `fit_survival`, like in the baseline model section,
but specifying the vector of dynamic covariates, `dynamic_covariates = c("CD4")`.

- Make predictions with the survival submodel, like in the baseline model section.

```{r}
landmarking_object <- landmarking_object |>
  fit_longitudinal(
    landmarks = c(6, 8),
    method = "lme4",
    formula = value ~ prevOI + obstime + (obstime | patient),
    dynamic_covariates = c("CD4")
  ) |>
  predict_longitudinal(
    landmarks = c(6, 8),
    method = "lme4",
    dynamic_covariates = c("CD4")
  ) |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c("CD4")
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8)
  )
```

As before, one can also use the function `summary` to display the results.

```{r}
summary(landmarking_object,
        type = "longitudinal",
        landmark = 6,
        dynamic_covariate = "CD4")
```

```{r}
summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
```

Here are the performance metrics: 
```{r}
performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
```

# Landmarking analysis with lcmm + coxph

Now we use the
[lcmm package](https://cecileproust-lima.github.io/lcmm/index.html) to fit a
latent class mixed model of the time-varying covariate, CD4. This first step is
followed by fitting a Cox PH sub-model using the longitudinal predictions as
covariates.

The pipeline is identical to that of the lme4+coxph model. However, this time
one has to specify `method = "lcmm"` when calling `fit_longitudinal`, 
in addition to the arguments that will be passed on to `lcmm::hlme()`, namely

- `formula`: a two-sided formula of the fixed effects.

- `mixture`: a one-sided formula of the class-specific fixed effects.

- `random`: a one-sided formula specifying the random effects.

- `subject`: name of the column specifying the subject IDs. 

- `ng`: the number of clusters in the LCMM model.

```{r}
landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)
```

```{r}
landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8)) |>
  fit_longitudinal(
    landmarks = c(6, 8),
    method = "lcmm",
    formula = value ~ obstime + prevOI,
    mixture = ~ obstime + prevOI,
    random = ~ obstime,
    subject = "patient",
    ng = 2,
    dynamic_covariates = c("CD4"),
    maxiter = 5000, rep = 25, nwg = TRUE
  ) |>
  predict_longitudinal(
    landmarks = c(6, 8),
    method = "lcmm",
    avg = TRUE,
    include_clusters = FALSE,
    var.time = "obstime",
    dynamic_covariates = c("CD4")
  ) |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c("CD4"),
    include_clusters = FALSE
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    dynamic_covariates = c("CD4"),
    include_clusters = FALSE
  )
```

```{r}
summary(landmarking_object,
        type = "longitudinal",
        landmark = 6,
        dynamic_covariate = "CD4")
```

```{r}
summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
```

```{r}
performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
```