tidier package provides ‘Apache Spark’ style window aggregation for R dataframes and remote dbplyr tbls via ‘mutate’ in ‘dplyr’ flavour.
Create a new column with the average temperature over the last seven days in the same month.
set.seed(101)
airquality |>
  # create date column
  dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) |>
  # create gaps by removing some days
  dplyr::slice_sample(prop = 0.8) |>
  # compute mean temperature over last seven days in the same month
  tidier::mutate(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                 .order_by = Day,
                 .by = Month,
                 .frame = c(lubridate::days(7), # 7 days before current row
                            lubridate::days(-1) # do not include current row
                            ),
                 .index = date_col
                 )
#> # A tibble: 122 × 8
#>    Month Ozone Solar.R  Wind  Temp   Day date_col   avg_temp_over_last_week
#>    <int> <int>   <int> <dbl> <int> <int> <date>                       <dbl>
#>  1     6    NA     286   8.6    78     1 1973-06-01                   NaN
#>  2     6    NA     242  16.1    67     3 1973-06-03                    78
#>  3     6    NA     186   9.2    84     4 1973-06-04                    72.5
#>  4     6    NA     264  14.3    79     6 1973-06-06                    76.3
#>  5     6    29     127   9.7    82     7 1973-06-07                    77
#>  6     6    NA     273   6.9    87     8 1973-06-08                    78
#>  7     6    NA     259  10.9    93    11 1973-06-11                    83
#>  8     6    NA     250   9.2    92    12 1973-06-12                    85.2
#>  9     6    23     148   8      82    13 1973-06-13                    86.6
#> 10     6    NA     332  13.8    80    14 1973-06-14                    87.2
#> # ℹ 112 more rows
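To make the window semantics of the example concrete, here is a minimal base R sketch of the same computation on a few toy rows. It is not how tidier computes the result; the helper `avg_last_week` is a hypothetical name introduced only for illustration, and the toy data reuses a handful of values from the output above.

```r
# Toy data: one month with gaps in the dates (a few rows only).
df <- data.frame(
  date_col = as.Date(c("1973-06-01", "1973-06-03", "1973-06-04", "1973-06-11")),
  Month    = c(6L, 6L, 6L, 6L),
  Temp     = c(78, 67, 84, 93)
)

# Hypothetical helper: for row i, average Temp over rows of the same month
# whose date lies in [date_i - 7 days, date_i - 1 day].
# The current row itself is excluded from the window.
avg_last_week <- function(i, data) {
  in_window <- data$Month == data$Month[i] &
    data$date_col >= data$date_col[i] - 7 &
    data$date_col <= data$date_col[i] - 1
  mean(data$Temp[in_window], na.rm = TRUE)
}

df$avg_temp_over_last_week <- vapply(
  seq_len(nrow(df)), avg_last_week, numeric(1), data = df
)
print(df)
```

The first row has no earlier observations in its window, so its mean is taken over an empty vector and comes out as NaN, which is why the first row of the output above is also NaN.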
mutate supports .by (group by), .order_by (order by), .frame (endpoints of window frame), .index (identify index column like date column, in df version only), .complete (whether to compute over incomplete window, in df version only).
mutate automatically uses a future backend (via furrr, in df version only).
This implementation is inspired by Apache Spark’s windowSpec class with rangeBetween and rowsBetween.
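The difference between a row-based frame (in the spirit of rowsBetween) and a range-based frame (in the spirit of rangeBetween) shows up as soon as the index has gaps. The base R sketch below uses made-up data to contrast the two. It only illustrates the idea and does not use tidier or Spark. The assumption here is that a frame given together with .index behaves like the range-based case, and a frame without .index like the row-based case.

```r
# Made-up daily series with a gap between Jan 2 and Jan 5.
dates <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"))
x     <- c(10, 20, 30, 40)
n     <- length(x)

# Row-based frame: the current row and the 2 rows before it,
# regardless of how far apart the dates are.
rows_sum <- vapply(seq_len(n), function(i) {
  sum(x[max(1, i - 2):i])
}, numeric(1))

# Range-based frame: all rows whose date lies within the 2 days
# up to and including the current row's date.
range_sum <- vapply(seq_len(n), function(i) {
  sum(x[dates >= dates[i] - 2 & dates <= dates[i]])
}, numeric(1))

print(data.frame(dates, x, rows_sum, range_sum))
```

On Jan 5 the row-based sum still reaches back to Jan 1, while the range-based sum sees only the current row because no other observation falls within two days of it.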
dbplyr implements this via dbplyr::win_over, enabling sparklyr users to write window computations. Also see dbplyr::window_order/dbplyr::window_frame. tidier’s mutate wraps this functionality via a uniform syntax across dataframes and remote tbls.
The tidypyspark Python package implements a mutate style window computation API for pyspark.
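For comparison, the dbplyr route mentioned above can be sketched without a live database by using a lazy table on a simulated connection. This is a sketch under assumptions: the column names are borrowed from the airquality example, the frame of six preceding rows is arbitrary, and the exact SQL text may differ between dbplyr versions and backends.

```r
library(dplyr)
library(dbplyr)

# A lazy table on a simulated connection; no database is contacted.
lf <- lazy_frame(Month = 6L, Day = 1L, Temp = 78)

# Group, order, and frame the window, then aggregate inside mutate.
query <- lf |>
  group_by(Month) |>
  window_order(Day) |>
  window_frame(-6, 0) |>   # 6 preceding rows up to the current row
  mutate(avg_temp = mean(Temp, na.rm = TRUE)) |>
  sql_render()

print(query)
```

The rendered query should contain an OVER clause with PARTITION BY, ORDER BY, and a ROWS frame. Note that this frame counts rows and not days, which is different from the .index based example earlier.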
Development version: remotes::install_github("talegari/tidier")
CRAN version: install.packages("tidier")
The tidier package is deeply indebted to three amazing packages and the people behind them.
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
Grammar of Data Manipulation_. R package version 1.1.0,
<https://CRAN.R-project.org/package=dplyr>.
Vaughan D (2021). _slider: Sliding Window Functions_. R package
version 0.2.2, <https://CRAN.R-project.org/package=slider>.
Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End
for Databases_. R package version 2.3.2,
<https://CRAN.R-project.org/package=dbplyr>.