In this example we use the Eunomia Synpuf synthetic dataset.
library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PhenotypeR)
library(dplyr)
library(ggplot2)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
We have created our study cohort, but informing analytic decisions
and interpreting results requires an understanding of the dataset
from which the cohort was derived. The databaseDiagnostics()
function helps us better understand a data source.
To run database diagnostics, we only need to pass our cdm reference to the function.
db_diagnostics <- databaseDiagnostics(cdm)
db_diagnostics |> glimpse()
#> Rows: 6,224
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,…
#> $ cdm_name <chr> "Eunomia Synpuf", "Eunomia Synpuf", "Eunomia Synpuf",…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "general", "general", "observation_period", "cdm", "g…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "snapshot_date", "person_count", "count", "source_nam…
#> $ estimate_type <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value <chr> "2025-02-05", "1000", "1048", "Synpuf", "v5.0 06-AUG-…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
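The result is in long format, and every estimate_value is stored as a character string next to its estimate_type, so individual values have to be filtered out and converted before they can be used in calculations. The following is a minimal sketch in base R. The data frame `res` is a hand-built stand-in that copies only the first four rows of four columns shown in the glimpse above, and `get_estimate()` is a hypothetical helper written for illustration, not a PhenotypeR function.

```r
# Stand-in for a few rows of db_diagnostics (values copied from the glimpse output).
res <- data.frame(
  variable_name  = c("general", "general", "observation_period", "cdm"),
  estimate_name  = c("snapshot_date", "person_count", "count", "source_name"),
  estimate_type  = c("date", "integer", "integer", "character"),
  estimate_value = c("2025-02-05", "1000", "1048", "Synpuf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: select one estimate and convert it according to estimate_type.
get_estimate <- function(x, variable, estimate) {
  row <- x[x$variable_name == variable & x$estimate_name == estimate, ]
  stopifnot(nrow(row) == 1)  # fail loudly if the selection is ambiguous or empty
  switch(row$estimate_type,
    integer = as.integer(row$estimate_value),
    date    = as.Date(row$estimate_value),
    row$estimate_value  # any other type is left as character
  )
}

person_count  <- get_estimate(res, "general", "person_count")
period_count  <- get_estimate(res, "observation_period", "count")
snapshot_date <- get_estimate(res, "general", "snapshot_date")
```

On the full result the same variable and estimate names may appear under more than one result_id, so a real lookup would probably also need to filter on result_id before the uniqueness check passes.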
From our results we can create a table with a summary of metadata for the data source.
| Estimate | Database name: Eunomia Synpuf |
|---|---|
| **General** | |
| Snapshot date | 2025-02-05 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| **Observation period** | |
| N | 1,048 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| **Cdm** | |
| Source name | Synpuf |
| Version | v5.3.1 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Source type | duckdb |
We can also see a summary of individuals’ observation periods. This shows whether any individuals have multiple, non-overlapping observation periods and how long each observation period lasts on average. Here, 48 of the 1,000 individuals have a second observation period.
| Observation period ordinal | Variable name | Estimate name | CDM name: Eunomia Synpuf |
|---|---|---|---|
| all | Number records | N | 1,048 |
| | Number subjects | N | 1,000 |
| | Records per person | mean (sd) | 1.05 (0.21) |
| | | median [Q25 - Q75] | 1 [1 - 1] |
| | Duration in days | mean (sd) | 979.71 (262.79) |
| | | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| | Days to next observation period | mean (sd) | 172.17 (108.35) |
| | | median [Q25 - Q75] | 138 [93 - 254] |
| 1st | Number subjects | N | 1,000 |
| | Duration in days | mean (sd) | 994.16 (257.95) |
| | | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| | Days to next observation period | mean (sd) | 172.17 (108.35) |
| | | median [Q25 - Q75] | 138 [93 - 254] |
| 2nd | Number subjects | N | 48 |
| | Duration in days | mean (sd) | 678.60 (164.50) |
| | | median [Q25 - Q75] | 730 [730 - 730] |
| | Days to next observation period | mean (sd) | - |
| | | median [Q25 - Q75] | - |
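The "all" rows can be cross-checked against the per-ordinal rows, which is a quick way to confirm how the summary is built. The sketch below uses base R only and hard-codes the counts and means from the table above. The variable names are chosen for illustration, and the check relies on the table listing only a 1st and a 2nd ordinal, so each person contributes either one or two records.

```r
# Values copied from the observation period table above.
n_first     <- 1000    # subjects with a 1st observation period
n_second    <- 48      # subjects with a 2nd observation period
mean_first  <- 994.16  # mean duration in days, 1st period
mean_second <- 678.60  # mean duration in days, 2nd period

# Total records is the sum over ordinals.
n_records <- n_first + n_second

# Mean records per person: total records divided by number of subjects.
records_per_person <- n_records / n_first

# With only two ordinals, 952 people have one record and 48 have two.
records    <- c(rep(1, n_first - n_second), rep(2, n_second))
sd_records <- sd(records)

# The overall mean duration is the record-weighted average of the ordinal means.
mean_all <- (n_first * mean_first + n_second * mean_second) / n_records

round(c(records_per_person, sd_records, mean_all), 2)
```

Rounded to two decimals, these reproduce the 1.05 (0.21) records per person and the 979.71 mean duration reported in the "all" rows. The overall standard deviation of duration cannot be recovered this way, since it also depends on the spread between the two ordinals.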