| Title: | Access 'Wharton Research Data Services' ('WRDS') |
| Version: | 0.0.1 |
| Description: | Provides simple functions for accessing data from 'Wharton Research Data Services' ('WRDS'), a widely used financial database in academic research. Includes credential management via the system keyring, database tools, and functions for downloading generic tables, 'Compustat' fundamentals, and linking tables. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| URL: | https://github.com/statzhero/wrds |
| BugReports: | https://github.com/statzhero/wrds/issues |
| Suggests: | testthat (≥ 3.0.0), withr |
| Config/testthat/edition: | 3 |
| Imports: | cli, DBI, dbplyr, dplyr, keyring, rlang, RPostgres, tidylog |
| NeedsCompilation: | no |
| Packaged: | 2026-01-14 02:23:14 UTC; rico |
| Author: | Ulrich Atz |
| Maintainer: | Ulrich Atz <ulrich.atz@unibocconi.it> |
| Depends: | R (≥ 4.1.0) |
| Repository: | CRAN |
| Date/Publication: | 2026-01-19 18:20:02 UTC |
wrds: Access 'Wharton Research Data Services' ('WRDS')
Description
Provides simple functions for accessing data from 'Wharton Research Data Services' ('WRDS'), a widely used financial database in academic research. Includes credential management via the system keyring, database tools, and functions for downloading generic tables, 'Compustat' fundamentals, and linking tables.
Author(s)
Maintainer: Ulrich Atz ulrich.atz@unibocconi.it (ORCID) [copyright holder]
See Also
Useful links:
Check connection validity
Description
Internal function to validate a database connection object.
Usage
check_connection(wrds)
Arguments
wrds |
Object to check. |
Value
Invisibly returns TRUE if valid; aborts otherwise.
Get company table configuration
Description
Internal function returning table names and default columns for company queries.
Usage
company_config(region)
Arguments
region |
One of |
Value
A list with table and columns.
Get Compustat configuration
Description
Internal function returning table names and filters for Compustat queries.
Usage
compustat_config(frequency, region)
Arguments
frequency |
One of |
region |
One of |
Value
A list with table, datafmt, popsrc, and columns.
Describe a table
Description
Displays a glimpse-like summary of a WRDS table showing column names
and types, similar to dplyr::glimpse().
Usage
describe_table(wrds, library, table, n = 20, max_cols = 25)
Arguments
wrds |
A |
library |
Character. The name of the library (schema). |
table |
Character. The name of the table. |
n |
Integer. Number of sample rows to fetch for value preview. Default is 20. |
max_cols |
Integer. Maximum number of columns to display. Default is 25. |
Value
Invisibly returns a list with components:
- columns
A data frame with
column_nameanddata_type- nrow
Row count
- sample
A data frame with sample rows (if
n > 0)
Examples
## Not run:
wrds <- wrds_connect()
describe_table(wrds, "comp", "funda")
wrds_disconnect(wrds)
## End(Not run)
Fill missing SIC codes from company header
Description
Internal function that joins fundamentals data with company header to fill missing historical SIC codes using coalesce.
Usage
fill_sic_codes(tbl, wrds)
Arguments
tbl |
A lazy table from funda/fundq. |
wrds |
Database connection. |
Value
A collected tibble with sic column (character, coalesced from
sich and header sic).
Download Compustat company header data
Description
Downloads company-level static data from Compustat including header SIC codes, NAICS codes, state of incorporation, and other identifying information.
Usage
get_company(
wrds,
region = c("na", "global"),
columns = NULL,
n = Inf,
lazy = FALSE
)
Arguments
wrds |
A |
region |
One of |
columns |
Character vector of columns to return. Defaults to key identifiers and classification codes. |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Details
The sic column contains the "header" SIC code, which is the company's
most recent SIC classification stored as a character. For historical SIC
codes that change over time, use get_compustat() with fill_sic = TRUE,
which coalesces the historical sich (integer) with the header sic.
Value
A tibble with company header data. Default columns vary by region:
North America (from comp.company):
-
gvkey,conm: Identifiers -
sic,naics: Industry classifications (character) -
state,fic,loc: Geographic information
Global (from comp.g_company):
-
gvkey,conm: Identifiers -
sic,naics: Industry classifications -
fic,loc: Geographic information
See Also
get_compustat() for fundamentals data with optional SIC filling
Examples
## Not run:
wrds <- wrds_connect()
# Get company header data
company <- get_company(wrds)
# Get global companies
g_company <- get_company(wrds, region = "global")
# Lazy query
get_company(wrds, lazy = TRUE) |>
dplyr::filter(sic == "7370") |>
dplyr::collect()
wrds_disconnect(wrds)
## End(Not run)
Download Compustat fundamentals
Description
Downloads financial statement data from Compustat with standard filters for clean, analysis-ready data.
Usage
get_compustat(
wrds,
frequency = c("annual", "quarterly"),
region = c("na", "global"),
start_date = NULL,
end_date = NULL,
columns = NULL,
add_columns = NULL,
indfmt = "INDL",
consol = "C",
fill_sic = FALSE,
n = Inf,
lazy = FALSE
)
Arguments
wrds |
A |
frequency |
One of |
region |
One of |
start_date |
Start date for filtering. Character string in
|
end_date |
End date for filtering. Character string in
|
columns |
Character vector of columns to return, replacing the defaults.
Use |
add_columns |
Character vector of additional columns to include beyond
the defaults. Ignored if |
indfmt |
Industry format filter. Defaults to |
consol |
Consolidation level. Defaults to |
fill_sic |
If |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Details
Default filters follow standard practice for most research applications.
Region-specific filters are applied automatically based on region:
-
datafmt:"STD"for North America,"HIST_STD"for Global -
popsrc:"D"(domestic) for North America,"I"(international) for Global
North America and Global data have different structures and should not be combined without careful column harmonization.
Value
A tibble with Compustat fundamentals. Default columns vary by region:
North America (from comp.funda / comp.fundq):
Identifiers:
gvkey,cusip,tic,conm,datadateTime:
fyear/fyearq,fyr/fqtrIncome:
ni/niq,ib/ibq,oiadp/oiadpq,revt/revtqBalance sheet:
at/atq,lt/ltq,seq/seqq,ceq/ceqqMarket:
csho/cshoq,prcc_f/prccqOther:
sale/saleq,capx/capxy,che/cheq,dlc/dlcq,dltt/dlttqIndustry:
sich(historical SIC);sic(whenfill_sic = TRUE, coalesced fromsichand header SIC)
Global (from comp.g_funda / comp.g_fundq):
Identifiers:
gvkey,isin,conm,datadateGeography:
loc,fic,exchgSimilar financial variables (with some differences, e.g.,
nit/nitqinstead ofni/niq)
See Also
link_ccm() for CRSP-Compustat linking, get_company() for
company header data
Examples
## Not run:
wrds <- wrds_connect()
# Annual North America fundamentals
funda <- get_compustat(wrds)
# Quarterly with date filter
fundq <- get_compustat(wrds,
frequency = "quarterly",
start_date = "2020-01-01",
end_date = "2023-12-31"
)
# Global annual
g_funda <- get_compustat(wrds, region = "global")
# Lazy query for further filtering
get_compustat(wrds, lazy = TRUE) |>
dplyr::filter(fyear >= 2020) |>
dplyr::select(gvkey, datadate, at, lt) |>
dplyr::collect()
# Fill missing SIC codes with header SIC from comp.company
funda_sic <- get_compustat(wrds, fill_sic = TRUE)
# Preview first 100 rows before full download
preview <- get_compustat(wrds, n = 100)
wrds_disconnect(wrds)
## End(Not run)
Download data from any WRDS table
Description
Generic function to download data from any table in the WRDS database. Returns a lazy table by default, allowing you to build queries with dplyr before collecting.
Usage
get_table(wrds, library, table, columns = NULL, n = Inf, lazy = TRUE)
Arguments
wrds |
A |
library |
Character. The name of the library (schema), e.g., |
table |
Character. The name of the table within the library. |
columns |
Character vector of columns to return. If |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Details
This function provides generic access to any WRDS table. For commonly-used tables with standard research filters, prefer the specialized functions:
-
get_compustat()for Compustat fundamentals with standard filters -
get_company()for company header data -
link_ccm()for CRSP-Compustat linking
The lazy table can be filtered, selected, and mutated using dplyr verbs, which are translated to SQL and executed on the server:
get_table(wrds, "crsp", "msf") |> filter(date >= "2025-01-01") |> select(permno, date, ret, prc) |> collect()
Value
A tbl_lazy object (if lazy = TRUE) or a tibble (if lazy = FALSE).
See Also
describe_table() to explore table structure,
list_tables() to list available tables in a library
Examples
## Not run:
wrds <- wrds_connect()
# Preview table structure first
describe_table(wrds, "crsp", "msf")
# Get a lazy table and build your query
get_table(wrds, "crsp", "msf") |>
dplyr::filter(date >= "2025-01-01") |>
dplyr::select(permno, date, ret, prc, vol) |>
dplyr::collect()
# Collect immediately with specific columns
get_table(wrds, "crsp", "dsf",
columns = c("permno", "date", "ret", "prc"),
lazy = FALSE,
n = 1000
)
# Access any table in any library
get_table(wrds, "ibes", "statsum_epsus") |>
dplyr::filter(fpedats >= "2025-01-01") |>
dplyr::collect()
wrds_disconnect(wrds)
## End(Not run)
Get CRSP-Compustat linking table
Description
Downloads the CCM (CRSP-Compustat Merged) linking table that maps CRSP PERMNOs to Compustat GVKEYs with valid date ranges.
Usage
link_ccm(
wrds,
linktype = c("LC", "LU", "LS"),
linkprim = c("P", "C"),
n = Inf,
lazy = FALSE
)
Arguments
wrds |
A |
linktype |
Character vector. Types of links to include.
Defaults to
|
linkprim |
Character vector. Link primacy filters.
Defaults to
|
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Details
The linking table comes from crsp.ccmxpf_lnkhist. Missing linkenddt
values indicate ongoing links and are replaced with the maximum date in the
table for easier date-range joins.
To use the link, join on gvkey and ensure your observation date falls
within the linkdt to linkenddt range.
Value
A tibble with columns:
- gvkey
Compustat company identifier
- permno
CRSP permanent security identifier
- linkdt
Start date of the link
- linkenddt
End date of the link (missing values replaced with max date)
- linktype
Type of link
- linkprim
Link primacy
References
Ian Gow, Financial Accounting Research, Chapter on Identifiers: https://iangow.github.io/far_book/identifiers.html
See Also
Examples
## Not run:
wrds <- wrds_connect()
ccm <- link_ccm(wrds)
# Join with Compustat data
compustat <- get_compustat(wrds)
compustat |>
dplyr::inner_join(ccm, by = dplyr::join_by(gvkey)) |>
dplyr::filter(datadate >= linkdt, datadate <= linkenddt)
wrds_disconnect(wrds)
## End(Not run)
Get IBES-CRSP linking table
Description
Downloads the WRDS-provided linking table that maps IBES tickers to CRSP PERMNOs with valid date ranges and match quality scores.
Usage
link_ibes_crsp(wrds, max_score = 5L, n = Inf, lazy = FALSE)
Arguments
wrds |
A |
max_score |
Maximum match quality score to include. Defaults to better matches:
|
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Details
The linking table comes from wrdsapps_link_crsp_ibes.ibcrsphist.
To use the link, join on ticker and ensure your observation date falls
within the sdate to edate range.
Value
A tibble with columns:
- ticker
IBES ticker
- permno
CRSP permanent security identifier
- sdate
Start date of the link
- edate
End date of the link
- score
Match quality score (1 = best, 6 = worst)
References
WRDS IBES-CRSP Linking Table Documentation: https://wrds-www.wharton.upenn.edu/documents/796/IBES_CRSP_Linking_Table_by_WRDS.pdf
See Also
Examples
## Not run:
wrds <- wrds_connect()
ibes_link <- link_ibes_crsp(wrds)
# Join with IBES data on ticker and date range
ibes_data |>
dplyr::inner_join(ibes_link, by = dplyr::join_by(ticker)) |>
dplyr::filter(date >= sdate, date <= edate)
wrds_disconnect(wrds)
## End(Not run)
List available libraries
Description
Returns a character vector of available schema names (libraries) on WRDS.
Usage
list_libraries(wrds)
Arguments
wrds |
A |
Value
A character vector of library names.
Examples
## Not run:
wrds <- wrds_connect()
list_libraries(wrds)
wrds_disconnect(wrds)
## End(Not run)
List tables in a library
Description
Returns a character vector of table names within a WRDS library (schema).
Usage
list_tables(wrds, library)
Arguments
wrds |
A |
library |
Character. The name of the library (schema) to query. |
Value
A character vector of table names.
Examples
## Not run:
wrds <- wrds_connect()
list_tables(wrds, "comp")
wrds_disconnect(wrds)
## End(Not run)
Convert SIC codes to 2-digit industry codes
Description
Extracts the first two characters from SIC codes to create broader industry classifications.
Usage
sic_2digit(sic)
Arguments
sic |
A numeric or character vector of SIC codes. |
Details
SIC codes are hierarchical: the first two digits represent major industry groups (e.g., "54" = Retail-Food Stores), while the full 4-digit code provides more specific classifications (e.g., "5412" = Retail-Convenience Stores).
Value
A character vector of 2-digit SIC codes.
Examples
# Convenience Stores (SIC 5412) -> Retail-Food Stores (54)
sic_2digit(5412)
# [1] "54"
sic_2digit(c(5412, 5400))
# [1] "54" "54"
Smart collect with size awareness
Description
Internal function that collects a lazy table with warnings for large queries.
Usage
smart_collect(tbl, wrds, lazy = FALSE)
Arguments
tbl |
A |
wrds |
Database connection for row counting. |
lazy |
If |
Value
A tibble if collecting, or the lazy table if lazy = TRUE.
Connect to WRDS
Description
Establishes a connection to the WRDS PostgreSQL server using credentials stored securely in the system keyring.
Usage
wrds_connect(user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL)
Arguments
user_key |
Name of the keyring entry storing the WRDS username.
Defaults to |
password_key |
Name of the keyring entry storing the WRDS password.
Defaults to |
keyring |
Optional keyring name. If |
Details
Credentials must be set up before first use with wrds_set_credentials().
The connection uses bigint = "integer" for compatibility with R's
integer type.
Value
A DBIConnection object for the WRDS PostgreSQL database.
See Also
wrds_disconnect(), wrds_set_credentials()
Examples
## Not run:
wrds <- wrds_connect()
list_libraries(wrds)
wrds_disconnect(wrds)
## End(Not run)
Disconnect from WRDS
Description
Closes a WRDS database connection.
Usage
wrds_disconnect(wrds)
Arguments
wrds |
A |
Value
Invisibly returns TRUE if disconnection was successful.
Examples
## Not run:
wrds <- wrds_connect()
wrds_disconnect(wrds)
## End(Not run)
Set WRDS credentials
Description
Interactively stores WRDS username and password in the system keyring for secure, persistent storage.
Usage
wrds_set_credentials(
user_key = "wrds_user",
password_key = "wrds_pw",
keyring = NULL
)
Arguments
user_key |
Name for the username keyring entry. Defaults to |
password_key |
Name for the password keyring entry. Defaults to |
keyring |
Optional keyring name. If |
Details
This function prompts for username and password interactively. Credentials are stored securely using the operating system's keyring (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux).
Value
Invisibly returns TRUE on success.
Examples
## Not run:
wrds_set_credentials()
## End(Not run)