Getting Started with realestatebr

Introduction

This vignette provides a minimal introduction to the realestatebr package, showing how to use its core functions. Since realestatebr returns tibble as default values, we recommend using it together with the dplyr package, though conversion do data.table is trivial.

library(realestatebr)
library(dplyr)

The code below defines a common theme for all plots in this vignette and is required to fully replicate the code in this document. Despite this, this code is entirely optional and can be omitted.

library(ggplot2)

color_palette <- c(
  "#1E3A5F",
  "#DD6B20",
  "#2C7A7B",
  "#D69E2E",
  "#805AD5",
  "#C53030"
)

theme_series <- function() {
  theme_minimal(
    # swap for other font if needed
    base_family = "Avenir",
    base_size = 10
  ) +
    theme(
      plot.title = element_text(size = 16),
      panel.grid.minor = element_blank(),
      panel.grid.major.x = element_blank(),
      axis.line.x = element_line(color = "gray10", linewidth = 0.5),
      axis.ticks.x = element_line(color = "gray10", linewidth = 0.5),
      axis.title.x = element_blank(),
      legend.position = "bottom",
      palette.color.discrete = color_palette
    )
}

realestatebr provides a unified interface to Brazilian real estate data from multiple public sources. All datasets are returned as tidy tibble objects.

Core Interface

The goal of realestatebr is to provide a unified interface to Brazilian real estate data from multiple public sources. All datasets are returned as tidy tibble objects. The package is centered around a key function: get_dataset(name, table) which retrieves any dataset by name. Without a table argument it returns the default table; use table to select a specific sub-table.

# Default table
abecip <- get_dataset("abecip")

# Specific table
sbpe <- get_dataset("abecip", table = "units")

In order to explore which datasets are available, use list_datasets() and get_dataset_info().

ds <- list_datasets()
name title source available_tables frequency
abecip ABECIP Housing Credit Indicators ABECIP - Associação Brasileira das Entidades de Crédito Imobiliário sbpe, units, cgi monthly
abrainc ABRAINC-FIPE Primary Market Indicators ABRAINC/FIPE indicator, radar, leading quarterly
bcb_realestate BCB Real Estate Market Data Banco Central do Brasil accounting, application, indices, sources, units monthly
bcb_series BCB Economic Series Banco Central do Brasil - SGS core, primary, secondary, tertiary, full varies (daily/monthly/quarterly)
fgv_ibre FGV IBRE Real Estate Indicators FGV IBRE (single table) monthly
rppi Brazilian Residential Property Price Indices Multiple (FIPE/ZAP, IVGR, IGMI, IQA, IQAIW, IVAR, SECOVI-SP) fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp, sale, rent, all monthly
rppi_bis BIS Residential Property Price Indices Bank for International Settlements selected, detailed_monthly, detailed_quarterly, detailed_annual, detailed_halfyearly quarterly
secovi SECOVI-SP Real Estate Market Data SECOVI-SP - Sindicato da Habitação condo, rent, launch, sale monthly
info <- get_dataset_info("abecip")
names(info$categories)
#> [1] "sbpe"  "units"  "cgi"

The source Argument

The source argument from get_dataset() controls where data comes from. The default ("auto") checks the local cache first, then falls back to the GitHub release. Typically, the best option is to use the default or "github". Choosing "fresh" will download the data from the original source: while this guarantees the most recent data, it is slower.

get_dataset("abecip", source = "cache") # local cache (instant, works offline)
get_dataset("abecip", source = "github") # GitHub release
get_dataset("abecip", source = "fresh") # direct from the original source

Cache files are stored in the user data directory and can be inspected with list_cached_files() or cleared with clear_user_cache().

Example: Housing Credit Cycle

SBPE (Sistema Brasileiro de Poupança e Empréstimo) is the primary funding mechanism for residential mortgages in Brazil. The table sbpe fromabecip` tracks the deposits and withdrawals from saving accounts, that help finance real estate construction and acquisition.

sbpe <- get_dataset("abecip", table = "sbpe")

glimpse(sbpe)
#> Rows: 540
#> Columns: 15
#> $ date              <date> 1982-01-01, 1982-02-01, 1982-03-01, 1982-04-01, 198…
#> $ sbpe_inflow       <dbl> 238234.1, 224080.0, 247218.8, 264925.0, 227636.3, 31…
#> $ sbpe_outflow      <dbl> 261523.1, 161176.0, 118662.8, 378395.0, 137201.3, 15…
#> $ sbpe_netflow      <dbl> -23289, 62904, 128556, -113470, 90435, 164739, -9934…
#> $ sbpe_netflow_pct  <dbl> -0.009387130, 0.021881448, 0.043761242, -0.037006429…
#> $ sbpe_yield        <dbl> 417103, 0, 0, 485995, 0, 0, 642432, 0, 0, 957944, 0,…
#> $ sbpe_stock        <dbl> 2874764, 2937668, 3066224, 3438749, 3529184, 3693923…
#> $ rural_inflow      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ rural_outflow     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ rural_netflow     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ rural_netflow_pct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ rural_yield       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ rural_stock       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ total_stock       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ total_netflow     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

The plot below shows the annual net savings flow in recent years.

# Annual net credit flow
sbpe_annual <- sbpe |>
  filter(date >= as.Date("2019-01-01")) |>
  mutate(year = lubridate::year(date)) |>
  summarise(net_flow = sum(sbpe_netflow, na.rm = TRUE) / 1e3, .by = year) |>
  mutate(
    label_num = format(round(net_flow, 1)),
    ypos = if_else(net_flow > 0, net_flow + 10, net_flow - 10)
  )

ggplot(sbpe_annual, aes(year, net_flow)) +
  geom_col(fill = color_palette[1], alpha = 0.9, width = 0.8) +
  geom_text(aes(y = ypos, label = label_num), size = 3) +
  geom_hline(yintercept = 0) +
  scale_x_continuous(breaks = 2019:2026) +
  labs(
    title = "Annual Net Savings Flow (SBPE)",
    x = NULL,
    y = "R$ billions"
  ) +
  theme_series()

The companion table "units" contains monthly counts of financed units.

units <- get_dataset("abecip", table = "units")

glimpse(units)
#> Rows: 291
#> Columns: 7
#> $ date                  <date> 2002-01-01, 2002-02-01, 2002-03-01, 2002-04-01,…
#> $ units_construction    <dbl> 200, 483, 1049, 684, 571, 1109, 216, 506, 1698, …
#> $ units_acquisition     <dbl> 1455, 1456, 1522, 1723, 1536, 1536, 1706, 1838, …
#> $ units_total           <dbl> 1655, 1939, 2571, 2407, 2107, 2645, 1922, 2344, …
#> $ currency_construction <dbl> 13.540470, 32.117295, 62.592800, 44.422429, 23.4…
#> $ currency_acquisition  <dbl> 83.95237, 96.12279, 101.71222, 108.14803, 98.281…
#> $ currency_total        <dbl> 97.49284, 128.24008, 164.30502, 152.57046, 121.7…

The plot shows the amount of units financed per month together with a LOESS trend line.

# SBPE units financed per year
units_recent <- units |>
  filter(date >= as.Date("2019-01-01"))

ggplot(units_recent, aes(date, units_total)) +
  geom_point(alpha = 0.5, size = 0.8, color = color_palette[1]) +
  geom_smooth(
    color = color_palette[1],
    lwd = 0.7,
    se = FALSE,
    method = stats::loess,
    method.args = list(span = 0.4)
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(
    title = "Monthly Financed Units",
    y = "Units"
  ) +
  theme_series()

Example: Real Estate Credit Portfolio

The bcb_realestate dataset imports all real estate statistics from the Brazilian Central Bank. This is a relatively large dataset and exploring can be cumbersome. Each series is uniquely identified by date and series_info. Helper functions v1, v2, …, v5, abbrev_state, category, and type are provided to simplify the use of the dataset.

The code below shows how to access a specific series and also how to fetch a group of related series.

bcb <- get_dataset("bcb_realestate")

# Get a specific series
sfh_pf <- bcb |>
  filter(series_info == "credito_estoque_carteira_credito_pf_sfh_br")

# Get the all the related series for 'estoque_carteira_credito_pf'
credit_stock <- bcb |>
  filter(
    category == "credito",
    type == "estoque",
    v1 == "carteira",
    v2 == "credito",
    v3 == "pf",
    # since v4 is left blank, we get all credit lines
    v5 == "br"
  )

# The helper columns essentially separate the 'series_info' column allowing
# for easier filtering. It's equivalent to filtering by regex
credit_stock <- bcb |>
  filter(grepl(
    "(?<=credito_estoque_carteira_credito_pf_).+_br$",
    series_info,
    perl = TRUE
  ))

The single series shows only the values from SFH (specific credit line).

ggplot(sfh_pf, aes(date, value / 1e9)) +
  geom_line(lwd = 0.7, color = color_palette[1]) +
  labs(title = "SFH", y = "R$ (billions)") +
  theme_series()

The grouped series show the entire household credit stock by credit line.

credit_labels <- c(
  "Home Equity" = "home-equity",
  "Comercial" = "comercial",
  "Livre" = "livre",
  "FGTS" = "fgts",
  "SFH" = "sfh"
)

credit_stock <- credit_stock |>
  mutate(
    credit_line_label = factor(
      v4,
      levels = credit_labels,
      labels = names(credit_labels)
    )
  )

ggplot(credit_stock, aes(date, value / 1e9)) +
  geom_area(aes(fill = credit_line_label), alpha = 0.9) +
  scale_fill_manual(values = rev(color_palette[1:5])) +
  scale_x_date(expand = expansion(mult = c(0.01))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Real Estate Credit Stock",
    subtitle = "Household real estate credit stock (total debt) by credit line",
    y = "R$ (billions)",
    fill = NULL
  ) +
  theme_series()

As a final warning, note that the bcb_realestate dataset follows the YYYY-MM-DD format using the last day of the month as default value (e.g. 2023-01-31). This can cause issues when merging with other datasets, since the first day of the month is the more common date format (e.g. 2023-01-01).

To avoid this, use lubridate::floor_date(date, 'month'). Future versions of realestatebr might provide this as a default behavior.

Reference (all datasets)

The available datasets are listed below.

Dataset Source Tables Status
abecip ABECIP sbpe, units, cgi Active
abrainc ABRAINC / FIPE indicator, radar, leading Active
bcb_realestate Banco Central do Brasil accounting, application, indices, sources, units Active
bcb_series Banco Central do Brasil core, primary, secondary, tertiary, full Active
fgv_ibre FGV IBRE Active
rppi FIPE/ZAP, IVGR, IGMI, IQA, IVAR, SECOVI-SP sale, rent, fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp Active
rppi_bis Bank for International Settlements selected, detailed_monthly, detailed_quarterly, detailed_annual, detailed_halfyearly Active
secovi SECOVI-SP condo, rent, launch, sale Active