---
title: "Compute cross-tabulation statistics with `stat_cross()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Compute cross-tabulation statistics with `stat_cross()`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(ggstats)
library(ggplot2)
```
This statistic is intended to be used with two discrete variables mapped to the **x** and **y** aesthetics. It computes several statistics of the resulting cross-tabulated table using `broom::augment.htest()` and `stats::chisq.test()`. More precisely, the computed variables are:

- **observed**: number of observations in x,y
- **prop**: proportion of total
- **row.prop**: row proportion
- **col.prop**: column proportion
- **expected**: expected count under the null hypothesis
- **resid**: Pearson's residual
- **std.resid**: standardized residual
- **row.observed**: total number of observations within row
- **col.observed**: total number of observations within column
- **total.observed**: total number of observations within the table
- **phi**: phi coefficients, see `augment_chisq_add_phi()`
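
To make these definitions concrete, here is a minimal base R sketch that rebuilds most of the quantities by hand for the `Titanic` data. It is not the internal implementation of `stat_cross()`. It assumes that rows correspond to **y** (`Survived`) and columns to **x** (`Class`), and the object names `tab` and `test` are purely illustrative.

```{r}
# Aggregated data: one row per combination of factors, with counts in Freq
d <- as.data.frame(Titanic)

# Weighted cross-tabulation (assumption: rows = y, columns = x)
tab <- xtabs(Freq ~ Survived + Class, data = d)
test <- chisq.test(tab)

test$observed                # observed counts
test$expected                # expected counts under independence
test$residuals               # Pearson residuals
test$stdres                  # standardized residuals
prop.table(tab)              # proportion of total
prop.table(tab, margin = 1)  # row proportions
prop.table(tab, margin = 2)  # column proportions
rowSums(tab)                 # observations within each row
colSums(tab)                 # observations within each column
sum(tab)                     # observations within the table
```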

By default, `stat_cross()` uses `ggplot2::geom_point()`. To plot the number of observations, map `after_stat(observed)` to an aesthetic (here **size**):
```{r}
d <- as.data.frame(Titanic)
ggplot(d) +
  aes(x = Class, y = Survived, weight = Freq, size = after_stat(observed)) +
  stat_cross() +
  scale_size_area(max_size = 20)
```
Note that the **weight** aesthetic is taken into account by `stat_cross()`.
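
As a rough illustration of what weighting means here, the base R sketch below checks that weighted counts from the aggregated data match plain counts from a data frame expanded to one row per passenger. The names `d_long`, `weighted` and `unweighted` are hypothetical, and the sketch only mirrors the idea, not the code of `stat_cross()`.

```{r}
d <- as.data.frame(Titanic)

# Expand to one row per passenger by repeating each row Freq times
d_long <- d[rep(seq_len(nrow(d)), times = d$Freq), ]

# Weighted counts from the aggregated data
weighted <- xtabs(Freq ~ Survived + Class, data = d)

# Plain counts from the expanded data
unweighted <- table(d_long$Survived, d_long$Class)

# Both approaches give the same cell counts
all.equal(as.vector(weighted), as.vector(unweighted))
```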
We can go further by using a custom shape and filling the points according to their standardized residuals, to visually identify cells that are over- or underrepresented.
```{r fig.height=6, fig.width=6}
ggplot(d) +
  aes(
    x = Class, y = Survived, weight = Freq,
    size = after_stat(observed), fill = after_stat(std.resid)
  ) +
  stat_cross(shape = 22) +
  scale_fill_steps2(breaks = c(-3, -2, 2, 3), show.limits = TRUE) +
  scale_size_area(max_size = 20)
```
We can easily recreate a cross-tabulated table.
```{r}
ggplot(d) +
  aes(x = Class, y = Survived, weight = Freq) +
  geom_tile(fill = "white", colour = "black") +
  geom_text(stat = "cross", mapping = aes(label = after_stat(observed))) +
  theme_minimal()
```
As a more elaborate example, we can produce a table showing column proportions, with cells filled according to their standardized residuals. Note that `stat_cross()` can be used with facets. In that case, the computation is done separately within each facet.
```{r}
ggplot(d) +
  aes(
    x = Class, y = Survived, weight = Freq,
    label = scales::percent(after_stat(col.prop), accuracy = .1),
    fill = after_stat(std.resid)
  ) +
  stat_cross(shape = 22, size = 30) +
  geom_text(stat = "cross") +
  scale_fill_steps2(breaks = c(-3, -2, 2, 3), show.limits = TRUE) +
  facet_grid(rows = vars(Sex)) +
  labs(fill = "Standardized residuals") +
  theme_minimal()
```
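
To see what "computed separately in each facet" implies, the following base R sketch splits the data by `Sex` and tabulates each subset on its own. It is an approximation of the idea rather than the code run by `stat_cross()`. The name `by_sex` is hypothetical, and rows are again assumed to correspond to **y** and columns to **x**.

```{r}
d <- as.data.frame(Titanic)

# Split by the faceting variable, then analyse each subset independently
by_sex <- lapply(split(d, d$Sex), function(sub) {
  tab <- xtabs(Freq ~ Survived + Class, data = sub)
  list(
    col.prop = prop.table(tab, margin = 2),  # proportions within each Class
    std.resid = chisq.test(tab)$stdres       # residuals from a per-subset test
  )
})

by_sex$Female$col.prop
by_sex$Male$std.resid
```

Because each subset gets its own chi-squared test, the residuals in one facet describe departures from independence among that group only, not within the full table.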