The Google COVID-19 data repository is a comprehensive open repository of COVID-19 data.
This vignette shows how to use (some of) this data through the diseasystore package.
First, it is a good idea to copy the relevant Google COVID-19 data
files locally and store that location as an option for the package.
?DiseasystoreGoogleCovid19 uses only the age-stratified metrics for
COVID-19, so only a subset of the repository needs to be downloaded.
library(diseasystore)

# First we set the path we want to use as an option
options(
  "diseasystore.DiseasystoreGoogleCovid19.source_conn" =
    file.path("local", "path")
)

# Ensure folder exists
source_conn <- diseasyoption("source_conn", "DiseasystoreGoogleCovid19")
if (!dir.exists(source_conn)) {
  dir.create(source_conn, recursive = TRUE, showWarnings = FALSE)
}

# Define the Google files to download
google_files <- c("by-age.csv", "demographics.csv", "index.csv", "weather.csv")

# Download each file, skipping files that already exist locally
purrr::walk(google_files, ~ {
  # The remote URL is the configured remote_conn followed by the file name
  url <- paste0(diseasyoption("remote_conn", "DiseasystoreGoogleCovid19"), .)

  destfile <- file.path(
    diseasyoption("source_conn", "DiseasystoreGoogleCovid19"),
    .
  )

  if (!file.exists(destfile)) {
    download.file(url, destfile)
  }
})
Each diseasystore requires a database in which to store its features.
This database should be configured before use, and the configuration
can be stored in the package's options.
# We define target_conn as a function that opens a DBI connection to the DB
target_conn <- \() DBI::dbConnect(duckdb::duckdb())
options(
  "diseasystore.DiseasystoreGoogleCovid19.target_conn" = target_conn
)
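The connection above points to an in-memory DuckDB database, which is discarded when the R session ends. As a minimal sketch, assuming you would rather keep the stored features between sessions, target_conn can instead open a file-backed database. The file name and location below are hypothetical and not prescribed by the package.

```r
# Hypothetical location for the database file; adjust to your own setup
db_path <- file.path("local", "path", "diseasystore.duckdb")

# Make sure the folder for the database file exists
dir.create(dirname(db_path), recursive = TRUE, showWarnings = FALSE)

# target_conn is still a function, but it now opens a file-backed DuckDB database
target_conn <- \() DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)

options(
  "diseasystore.DiseasystoreGoogleCovid19.target_conn" = target_conn
)
```

With a file-backed database, features that have already been stored should still be available the next time the diseasystore is opened.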
Once the files are downloaded and the target DB is configured, we can
initialize the diseasystore that uses the Google COVID-19 data.
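The initialization step can be sketched as follows. This assumes that the source_conn and target_conn options set above are picked up by the constructor when no arguments are given; the object name ds is the one used in the rest of this vignette.

```r
library(diseasystore)

# Create the feature store; with no arguments it is assumed to read
# source_conn and target_conn from the package options configured above
ds <- DiseasystoreGoogleCovid19$new()
```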
Once configured this way, we can use the feature store directly to get data.
# We can see all the available features in the feature store
ds$available_features
#> [1] "n_population" "age_group" "country_id" "country"
#> [5] "region_id" "region" "subregion_id" "subregion"
#> [9] "n_hospital" "n_deaths" "n_positive" "n_icu"
#> [13] "n_ventilator" "min_temperature" "max_temperature"
# And then retrieve a feature from the feature store
ds$get_feature(
  feature = "n_hospital",
  start_date = as.Date("2020-01-01"),
  end_date = as.Date("2020-06-01")
)
#> # Source: table<dbplyr_etLUzA01xA> [?? x 5]
#> # Database: DuckDB v1.1.1 [B246705@Windows 10 x64:R 4.4.0/:memory:]
#> key_location key_age_bin n_hospital valid_from valid_until
#> <chr> <chr> <dbl> <date> <date>
#> 1 AR 2 0 2020-01-01 2020-01-02
#> 2 AR 3 0 2020-01-02 2020-01-03
#> 3 AR 9 NA 2020-01-02 2020-01-03
#> 4 AR 1 0 2020-01-04 2020-01-05
#> 5 AR 2 0 2020-01-04 2020-01-05
#> # ℹ more rows
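The output above is a lazy database table (note the "[?? x 5]" dimensions), so the rows still live in the target database. A minimal sketch of bringing a subset into R, assuming ds is the initialized diseasystore from above and that dplyr is installed; the choice of location "AR" is only for illustration:

```r
library(diseasystore)

# Assumes `ds` is the initialized DiseasystoreGoogleCovid19 object from above
n_hospital <- ds$get_feature(
  feature = "n_hospital",
  start_date = as.Date("2020-01-01"),
  end_date = as.Date("2020-06-01")
) |>
  dplyr::filter(key_location == "AR") |>  # translated to SQL and run in the database
  dplyr::collect()                        # pull the filtered rows into a local tibble
```

Filtering before collecting keeps the work in the database and transfers only the rows that are needed.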