In this vignette, we show how to run
phenotypeDiagnostics().
We’ll use the following packages and the synthetic Eunomia Synpuf data for example purposes:
library(CohortConstructor)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(DBI)
library(duckdb)
library(CDMConnector)
con <- dbConnect(duckdb(),
eunomiaDir("synpuf-1k", "5.3"))
cdm <- cdmFromCon(con = con,
cdmName = "Eunomia Synpuf",
cdmSchema = "main",
writeSchema = "main",
achillesSchema = "main")
cdm
Note that we have included the Achilles tables in our cdm reference; these are used to speed up some of the analyses.
We need to create a set of cohorts to review. For this, we use the CohortConstructor package to generate cohorts of users of warfarin, acetaminophen and morphine.
# Create codelists
codes <- list("warfarin" = c(1310149, 40163554),
"acetaminophen" = c(1125315, 1127078, 1127433, 40229134, 40231925, 40162522, 19133768),
"morphine" = c(1110410, 35605858, 40169988))
# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
conceptSet = codes,
exit = "event_end_date",
overlap = "merge",
name = "my_cohort")
Now that we have our cohorts, we will use
phenotypeDiagnostics() to assess them. This will run the
following diagnostics, which help us judge whether our cohorts are ready
to be used in research with the OMOP CDM dataset we are using:
databaseDiagnostics()
codelistDiagnostics()
cohortDiagnostics()
populationDiagnostics()
If we do not provide any specifications, the default values of these functions are used. That is, the following script runs each diagnostic with the default values of its individual function.
diagnostics <- phenotypeDiagnostics(cdm$my_cohort,
databaseDiagnostics = list(),
codelistDiagnostics = list(),
cohortDiagnostics = list(),
populationDiagnostics = list(),
stagingDirectory = NULL)
Notice that we can specify a directory in which to save a log file, so we can keep track of which incremental results have been run at each point.
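As a minimal sketch, assuming that the stagingDirectory argument shown in the call above is the one that controls where the log file is written, we could point it at a folder instead of leaving it NULL. The folder name phenotyper_staging is hypothetical, and `cdm$my_cohort` is the cohort created earlier in this vignette. To keep the run short, only the database diagnostics are switched on here.

```r
library(PhenotypeR)

# Hypothetical folder for the log file (any writable directory should do)
logDir <- file.path(tempdir(), "phenotyper_staging")
dir.create(logDir, showWarnings = FALSE, recursive = TRUE)

# Assumes `cdm$my_cohort` was created earlier in this vignette.
# Assumption: stagingDirectory sets where the log file is saved.
stagedDiagnostics <- phenotypeDiagnostics(cdm$my_cohort,
                                          databaseDiagnostics = list(),
                                          codelistDiagnostics = NULL,
                                          cohortDiagnostics = NULL,
                                          populationDiagnostics = NULL,
                                          stagingDirectory = logDir)

# Inspect whatever was written to the staging directory
list.files(logDir)
```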
If we don’t want to run one of the diagnostics we can switch it off by setting it to NULL.
phenotypeDiagnostics(cdm$my_cohort,
databaseDiagnostics = list(),
codelistDiagnostics = NULL,
cohortDiagnostics = list(),
populationDiagnostics = NULL)
Alternatively, if we want to change the settings, we can pass arguments
of the sub-functions in a list. For example, survival analysis is not run by
default (cohortSurvival is set to FALSE by default in
cohortDiagnostics()). We can run it, leaving the other
arguments at their defaults, like so:
diagnostics <- phenotypeDiagnostics(cdm$my_cohort,
databaseDiagnostics = list(),
codelistDiagnostics = list(),
cohortDiagnostics = list("cohortSurvival" = TRUE),
populationDiagnostics = list())
Although we have created our study cohorts, informing analytic decisions and interpreting results requires an understanding of the dataset from which they were derived. The database diagnostics build on the OmopSketch package to perform the following analyses:
Codelist diagnostics build on the CodelistGenerator and MeasurementDiagnostics R packages to perform the following analyses:
Cohort diagnostics build on the CohortCharacteristics and CohortSurvival R packages to perform the following analyses on our cohorts:
For computational efficiency, cohort diagnostics take a joint
random sample of 20,000 people from across the study cohorts when
describing cohort characteristics. The number sampled can be changed with
the cohortSample argument
(e.g. cohortSample = 40000 to double the number). Sampling
can be switched off by setting cohortSample = NULL.
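A minimal sketch of doubling the cohort sample, using the same list-of-arguments pattern shown earlier for cohortSurvival. It assumes `cdm$my_cohort` from earlier in this vignette, and that cohortSample is passed through to cohortDiagnostics() in the same way. The other diagnostics are switched off with NULL so that only cohort diagnostics run.

```r
library(PhenotypeR)

# Assumes `cdm$my_cohort` was created earlier in this vignette.
# Only cohort diagnostics are run; cohortSample is raised from 20,000 to 40,000.
cohortOnly <- phenotypeDiagnostics(cdm$my_cohort,
                                   databaseDiagnostics = NULL,
                                   codelistDiagnostics = NULL,
                                   cohortDiagnostics = list(cohortSample = 40000),
                                   populationDiagnostics = NULL)
```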
For each of the input cohorts, cohort diagnostics are also run on a
set of age and sex matched controls taken from the dataset as a whole.
Again, random sampling is used for efficiency. By default, 1,000 age and
sex matched controls are identified for 1,000 individuals from each of
the study cohorts. The number matched can be changed by altering the
matchedSample argument
(e.g. matchedSample = 2000 to double the number). Sampling
can be switched off by setting matchedSample = NULL.
Creation of age and sex matched controls can be skipped by setting
matchedSample = 0.
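The two matched-control options above can be sketched as follows, again assuming `cdm$my_cohort` from earlier and that matchedSample is accepted in the cohortDiagnostics list just like cohortSurvival. The first call doubles the number of matched individuals; the second skips the creation of matched controls altogether.

```r
library(PhenotypeR)

# Assumes `cdm$my_cohort` was created earlier in this vignette.
# Double the number of individuals matched to age and sex controls
matched2k <- phenotypeDiagnostics(cdm$my_cohort,
                                  databaseDiagnostics = NULL,
                                  codelistDiagnostics = NULL,
                                  cohortDiagnostics = list(matchedSample = 2000),
                                  populationDiagnostics = NULL)

# Skip the creation of age and sex matched controls entirely
noMatched <- phenotypeDiagnostics(cdm$my_cohort,
                                  databaseDiagnostics = NULL,
                                  codelistDiagnostics = NULL,
                                  cohortDiagnostics = list(matchedSample = 0),
                                  populationDiagnostics = NULL)
```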
Population diagnostics build on the IncidencePrevalence R package to perform the following analyses:
By default, these analyses are performed for:
By default incidence rates and period prevalence will be calculated
for all years captured in the dataset (based on earliest observation
period start date and latest observation period end date). The date
range can, however, be restricted with the
populationDateRange argument.
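A minimal sketch of restricting the date range. It assumes populationDateRange takes a length-two Date vector (start, end) and is passed inside the populationDiagnostics list. The dates chosen here are purely illustrative; `cdm$my_cohort` is the cohort from earlier in this vignette.

```r
library(PhenotypeR)

# Assumes `cdm$my_cohort` was created earlier in this vignette.
# Assumption: populationDateRange is a Date vector of c(start, end).
studyPeriod <- as.Date(c("2009-01-01", "2010-12-31"))  # illustrative dates

popRange <- phenotypeDiagnostics(cdm$my_cohort,
                                 databaseDiagnostics = NULL,
                                 codelistDiagnostics = NULL,
                                 cohortDiagnostics = NULL,
                                 populationDiagnostics = list(populationDateRange = studyPeriod))
```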
These analyses are also conducted on a random sample of the
population captured in the dataset. By default this sample is set to
100,000 individuals and so will only be relevant for particularly large
datasets. The sampling number can be changed via the
populationSample argument
(e.g. populationSample = 200000 to double the number) or
switched off by setting populationSample = NULL.
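Doubling the population sample follows the same pattern. This is a sketch assuming populationSample is passed inside the populationDiagnostics list and that `cdm$my_cohort` exists from earlier in this vignette.

```r
library(PhenotypeR)

# Assumes `cdm$my_cohort` was created earlier in this vignette.
# Sample 200,000 people instead of the default 100,000 for incidence and prevalence
popSample <- phenotypeDiagnostics(cdm$my_cohort,
                                  databaseDiagnostics = NULL,
                                  codelistDiagnostics = NULL,
                                  cohortDiagnostics = NULL,
                                  populationDiagnostics = list(populationSample = 200000))
```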
To save our diagnostics results, we can use the exportSummarisedResult() function from the omopgenerics R package:
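A minimal sketch of exporting the results to a CSV file and reading them back. It assumes `diagnostics` is the object returned by phenotypeDiagnostics() earlier in this vignette. The output folder, the file name and the minCellCount value of 5 are illustrative choices, not values prescribed by PhenotypeR.

```r
library(omopgenerics)

# Assumes `diagnostics` is the result of phenotypeDiagnostics() above.
resultsDir <- tempdir()  # illustrative output folder
exportSummarisedResult(diagnostics,
                       minCellCount = 5,  # counts below 5 are suppressed
                       fileName = "phenotyper_results.csv",
                       path = resultsDir)

# Read the exported results back in, e.g. in a later R session
imported <- importSummarisedResult(path = file.path(resultsDir, "phenotyper_results.csv"))
```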
Once we have our phenotype diagnostics results, we can
use shinyDiagnostics() to easily create a Shiny app and
visualise them:
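A sketch of the call, assuming shinyDiagnostics() takes the results, a directory in which to write the app, minCellCount and open. The folder name and the minCellCount value of 2 are illustrative, and `diagnostics` is the object returned by phenotypeDiagnostics() earlier in this vignette.

```r
library(PhenotypeR)

# Hypothetical folder in which the Shiny app files are written
appDir <- file.path(tempdir(), "phenotyper_shiny")
dir.create(appDir, showWarnings = FALSE, recursive = TRUE)

# Assumes `diagnostics` is the result of phenotypeDiagnostics() above.
shinyDiagnostics(result = diagnostics,
                 directory = appDir,
                 minCellCount = 2,  # illustrative suppression threshold
                 open = TRUE)       # launch the app in a new R session
```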
Notice that we have specified the minimum cell count
(minCellCount) below which results are suppressed in the Shiny
app, and also that we want the app to be launched in a new R session
(open). You can see the Shiny app generated for this
example here.
See the Shiny
diagnostics vignette for a full explanation of the Shiny app.