Transfer REDCap data to DuckDB with minimal memory overhead. Designed for large datasets that exceed available RAM.
From CRAN:
```r
install.packages("redquack")
```
Development version:
```r
# install.packages("pak")
pak::pak("dylanpieper/redquack")
```
Data from REDCap is transferred to DuckDB in configurable chunks of record IDs:
```r
library(redquack)

con <- redcap_to_duckdb(
  redcap_uri = "https://redcap.example.org/api/",
  token = "YOUR_API_TOKEN",
  record_id_name = "record_id",
  chunk_size = 1000
  # Increase chunk size on systems with ample memory (faster)
  # Decrease chunk size on memory-constrained systems (slower)
)
```
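Once the transfer completes, the returned connection can be used for a quick sanity check; a minimal sketch using the DBI connection from above:

```r
# Count transferred rows to confirm the transfer completed as expected
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n_records FROM data")
```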
Query the data with dplyr:

```r
library(dplyr)

demographics <- tbl(con, "data") |>
  filter(demographics_complete == 2) |>
  select(record_id, age, race, gender) |>
  collect()

age_summary <- tbl(con, "data") |>
  group_by(gender) |>
  summarize(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    median_age = median(age, na.rm = TRUE)
  ) |>
  collect()
```
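These pipelines are lazy: dbplyr translates the dplyr verbs to SQL that runs inside DuckDB, and only `collect()` pulls results into R. A small sketch for inspecting the generated SQL before collecting:

```r
# Preview the SQL dbplyr generates for DuckDB, without executing the query
tbl(con, "data") |>
  group_by(gender) |>
  summarize(n = n()) |>
  show_query()
```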
Create a Parquet file directly from DuckDB (efficient for sharing data):
```r
DBI::dbExecute(con, "COPY (SELECT * FROM data) TO 'redcap.parquet' (FORMAT PARQUET)")
```
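DuckDB can also query the Parquet file in place by path, so collaborators can work with it without the original database; a minimal sketch, assuming `redcap.parquet` sits in the working directory:

```r
# Query the exported Parquet file directly; DuckDB scans it without importing
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM 'redcap.parquet'")
```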
Remember to close the connection when finished:
```r
DBI::dbDisconnect(con, shutdown = TRUE)
```
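The database persists on disk after disconnecting. A minimal sketch of reopening it later, assuming a file path such as `redcap.duckdb` (substitute whatever path your `redcap_to_duckdb()` call wrote to):

```r
library(duckdb)

# Reopen the saved database; "redcap.duckdb" is an assumed file name --
# use the path your transfer actually created
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "redcap.duckdb")
```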
The DuckDB database created by `redcap_to_duckdb()` contains two tables:
1. `data`: contains all exported REDCap records with optimized column types

   ```r
   DBI::dbGetQuery(con, "SELECT * FROM data LIMIT 10")
   ```
2. `log`: contains timestamped logs of the transfer process for troubleshooting

   ```r
   DBI::dbGetQuery(con, "SELECT timestamp, type, message FROM log ORDER BY timestamp")
   ```
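To troubleshoot a failed transfer, the log can be filtered by its `type` column; the exact values stored in `type` are an assumption here, so listing them first is the safer move:

```r
# See which log levels are recorded, then pull only the problem rows;
# the 'error' value below is an assumption -- adjust to what DISTINCT returns
DBI::dbGetQuery(con, "SELECT DISTINCT type FROM log")
DBI::dbGetQuery(con, "SELECT timestamp, message FROM log WHERE type = 'error'")
```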