calculate_standings() has been deprecated. Please use nflseedR::nfl_standings() in nflseedR v2.0 instead. (#510)
nflfastR now requires an R version that supports the native pipe |> operator. This follows the Tidyverse R version support rules. (#511)
Fixed a bug where calculate_stats() incorrectly counted receiving_air_yards. (#500)
Fixed a bug where vegas_wp variables were broken when spread_line data was missing. (#503)
Fixed a bug where calculate_stats() incorrectly calculated target_share and air_yards_share when summary_level = "season". (#505)
Fixed a bug where calculate_stats() incorrectly counted fumbles. (#514)
Thank you to @ak47twq, @isaactpetersen, @jacobakaye, @johnpholden, @marvin3FF, @mrcaseb, and @tanho63 for their questions, feedback, and contributions towards this release.
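
The native pipe operator |> referenced in the notes above passes the value on its left as the first argument of the call on its right. A tiny base R sketch, assuming R 4.1 or later; the numbers are arbitrary and the variable names are chosen here for illustration:

```r
# The native pipe passes the left-hand side as the first argument
# of the function call on the right-hand side.
piped <- c(1, 4, 9) |> sqrt() |> sum()

# Equivalent nested call without the pipe.
nested <- sum(sqrt(c(1, 4, 9)))

# Both forms evaluate the same expression.
identical(piped, nested)
```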
New function calculate_stats() that combines the output of all calculate_player_stats*() functions with a more robust and faster approach. The calculate_player_stats*() functions will be deprecated in a future release. (#470)
nfl_stats_variables lists and explains all variables returned by calculate_stats(). A searchable table is available at https://www.nflfastr.com/articles/stats_variables.html. (#470)
{crayon}, {DT}, {httr}, {jsonlite}, {qs} dependencies. (#453)
calculate_player_stats_def() now returns season_type if the argument weekly is set to TRUE, for consistency with the other player stats functions. (#455)
missing_raw_pbp() now allows filtering by season. (#457)
decode_player_ids(). (#458)
Fixed a bug where the yrdln variable didn’t equal "MID 50" at midfield. (#459)
Fixed a bug where drive_start_yard_line missed the blank space between team name and yard line number. (#459)
Fixed a bug where the goal_to_go variable was FALSE in actual goal to go situations. (#460)
Fixed a bug in fixed_drive and fixed_drive_result where the second weather delay in 2023_13_ARI_PIT wasn’t identified correctly. (#461)
punter_player_id, and punter_player_name are filled for blocked punt attempts. (#463)
stat_ids with some IDs that were previously missing. (#470)
Fixed a bug where clean_pbp() returned pass = 1 in actual rush plays in very rare cases. (#479)
fixed_drive (#482)
penalty_type now correctly lists the penalty “Kickoff Short of Landing Zone” introduced in the 2024 season. (#486)
Fixed a bug where ep was incorrect on PAT attempts preceded by a timeout and then a penalty (extremely rare). This bug also caused the variables total_home_epa and total_away_epa to be incorrect for all subsequent plays in the same game. (#493)
Thank you to @ahmed-cheema, @andrewtek, @guga31bb, @isaactpetersen, @JoeMarino2021, @john-b-edwards, @marcusSasser, @mlounsberry, @morganandrew, @mrcaseb, @mscoop16, @parsnipz, @rjthompson2, and @Useight for their questions, feedback, and contributions towards this release.
calculate_series_conversion_rates() now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
Thank you to @andrewtek, @gregalvi86, @Ic4ru5Wing, @JoeMarino2021, @jreddy1990, @marvin3FF, @mrcaseb, @RicShern, @SPNE, and @trivialfis for their questions, feedback, and contributions towards this release.
The local raw play-by-play directory can be set with options("nflfastR.raw_directory" = {"your/local/directory"}). Alternatively, both build_nflfastR_pbp() and fast_scraper() support the argument dir which defaults to the above option. (#423)
New function save_raw_pbp() which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to set up the system for faster play-by-play parsing via the above functionality. (#423)
New function missing_raw_pbp() that computes a vector of game IDs missing in the local raw play-by-play directory. (#423)
get_pbp_nfl() now uses ifelse() instead of dplyr::if_else() to handle some null-checking. This fixes a bug found in the 2022_21_CIN_KC match.
calculate_player_stats() now summarises target share and air yards share correctly when called with argument weekly = FALSE. (#413)
calculate_player_stats() now returns the opponent team when called with argument weekly = TRUE. (#414)
calculate_player_stats_def() no longer errors when small subsets of pbp data are missing stats. (#415)
calculate_series_conversion_rates() no longer returns NA values if a small subset of pbp data is missing series on offense or defense. (#417)
fixed_drive now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
kick_distance on all punts and kickoffs. (#422)
calculate_player_stats() no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
passer, receiver, and rusher no longer return NA on “abnormal” plays (such as direct snaps, aborted snaps, and laterals) that resulted in a penalty. (#435)
Thank you to @903124, @ak47twq, @andrewtek, @darkhark, @dennisbrookner, @marvin3FF, @mistakia, @mrcaseb, @nicholasmendoza22, @rickstarblazer, @RileyJohnson22, and @tanho63 for their questions, feedback, and contributions towards this release.
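
The nflfastR.raw_directory option described in the notes above is an ordinary R option, so it can be set and inspected with base R alone. A minimal sketch: the directory path is a placeholder under tempdir() and raw_dir is a name chosen here for illustration, not something the package defines.

```r
# Placeholder directory (an assumption for illustration);
# replace it with a real location on your machine.
raw_dir <- file.path(tempdir(), "nflfastR_raw")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)

# Set the option named in the release notes for the current session.
options("nflfastR.raw_directory" = raw_dir)

# Read it back. Per the notes, the dir argument of build_nflfastR_pbp()
# and fast_scraper() defaults to this option.
getOption("nflfastR.raw_directory")
```

Placing the options() call in a startup file such as .Rprofile would make the setting persist across sessions.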
calculate_standings() no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
try() to avoid CRAN problems. (#404)
Fixed a bug where calculate_standings() wasn’t able to handle nflverse pbp data. (#404)
New function calculate_player_stats_def() that aggregates defensive player stats either at game level or overall. (#288)
New function nflverse_sitrep which is an alias of the already available report().
New function calculate_player_stats_kicking() that aggregates player stats for field goals and extra points at game level or overall. (#381)
New function calculate_series_conversion_rates() that computes series conversion and series result rates at a game level or season level. (#393)
calculate_player_stats() that reflects new nflverse data infrastructure.
calculate_player_stats() now unifies player names and joins the following player information via nflreadr::load_players():
player_display_name - Full name of the player
position - Position of the player
position_group - Position group of the player
headshot_url - URL to a player headshot image
clean_pbp().
Fixed a bug where calculate_player_stats_def() failed in situations where play-by-play data is missing certain stats. (#382)
calculate_player_stats() for NA names.
New function calculate_standings() that computes regular season division standings and playoff seeds from nflverse data.
update_db() now supports the option “nflfastR.dbdirectory” which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
?teams_colors_logos has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo URLs for the conference and league logos. (#290)
?teams_colors_logos has been updated with the Washington Commanders. (#312)
The argument qs in the functions load_pbp() and load_player_stats() has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely.
calculate_player_stats() in very rare cases caused by plays with laterals. (#289)
Fixed a bug where add_xpass() failed when called with an empty data frame. (#296)
Fixed a bug where play_type showed no_play on plays with penalties that don’t result in a replay of the down. (#277, #281)
total_home_score and total_away_score. (#300)
fast_scraper_rosters() and
fast_scraper_schedules() now call nflreadr::load_rosters() and nflreadr::load_schedules() under the hood. (#304)
update_db() now uses a default play to predefine column types for all db drivers. (#324)
xyac_mean_yardage on 4th downs (#327)
xyac information for plays involving J.O’Shaughnessy (#329)
epa on the last play of some games involving NE and BUF (#331)
fast_scraper() and build_nflfastR_pbp() now return data frames of class nflverse_data to be consistent with nflreadr.
load_pbp() and load_player_stats() now call nflreadr::load_pbp() and nflreadr::load_player_stats() respectively. Therefore the argument qs has been deprecated in both functions. It will be removed in a future release. Running load_player_stats() without any argument will now return player stats of the current season only (the default in nflreadr).
The arguments source and pp in the functions fast_scraper_*() and build_nflfastR_pbp() have been removed.
Added racr (“Receiver Air Conversion Ratio”), target_share, air_yards_share, wopr (“Weighted Opportunity Rating”) and pacr (“Passing Air Conversion Ratio”) to the output of calculate_player_stats().
New function report() which will be used by the
maintainers to help users debug their problems (#274).
update_db()
receiver names (#270)
return_team on interception return touchdowns (#275)
wpa variables are NA on end game line
wp variables are 0, 0.5, 1, or NA on end game line
decode_player_ids() now really decodes the new variable fantasy_id (#229)
wp values depending on the first game in the data set (#183)
Added sack_yards, sack_fumbles, rushing_fumbles and receiving_fumbles to the output of the function calculate_player_stats(), thanks to Mike Filicicchia (@TheMathNinja). (#239)
Fixed a bug where calculate_player_stats() falsely counted lost fumbles on aborted snaps (#238)
Added season_type to the output of calculate_player_stats() and load_player_stats() in preparation for the extended Regular Season starting in 2021 (#240)
season_type definitions in preparation for the extended Regular Season starting in 2021 (#242)
Fixed a bug in fixed_drive where it wasn’t incrementing when there was a muffed punt followed by timeout (#244)
Fixed a bug in fixed_drive where it wasn’t incrementing following an interception with the intercepting player then losing a fumble (#247)
Added safety_player_name and safety_player_id to the play-by-play data (#252)
usethis
New function calculate_player_stats() that
aggregates official passing, rushing, and receiving stats either at game level or overall
New function load_player_stats() that loads weekly player stats from 1999 to the most recent season
add_xyac() and clean_pbp() has been significantly improved
Added td_player_name and td_player_id to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
calculate_player_stats() now adds the variable dakota, the epa + cpoe composite, for players with a minimum of 5 pass attempts.
Added home_opening_kickoff to clean_pbp()
Added the variables sack_player_id, sack_player_name, half_sack_1_player_id, half_sack_1_player_name, half_sack_2_player_id and half_sack_2_player_name which identify players that recorded sacks (or half sacks). Also updated the description of the variables qb_hit_1_player_id, qb_hit_1_player_name, qb_hit_2_player_id and qb_hit_2_player_name to make it clearer that they did not record a sack. (#180)
qb_scramble was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go). Plays nullified by penalty are not included.
name, id, rusher, and rusher_id to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
clean_pbp() now standardizes the team name
columns tackle_with_assist_*_team
drive that was causing incorrect overtime win probabilities (#194)
Fixed a bug where posteam was not NA on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for fixed_drive, fixed_drive_result, series and series_result
Fixed a bug where fixed_drive and series were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
Fixed a bug where fixed_drive and series were falsely incrementing on muffed punts recovered by the punting team for a touchdown
Fixed a bug where add_xpass() crashed when run with data already including xpass variables.
epa when a safety is scored by the team beginning the play in possession of the ball (#186)
Fixed a bug where calculate_player_stats() forgot to clean player names by using their IDs
calculate_player_stats() (#203)
update_db() no longer falsely closes a database connection provided by the argument db_connection (#210)
Fixed a bug where yards_gained was missing yardage on plays with laterals. (#216)
fixed_drive now increments properly on onside kick recoveries (#215)
fixed_drive no longer counts a muffed kickoff as a one-play drive on its own (#217)
fixed_drive now properly increments after a safety (#219)
penalty_type and updated the description of the variable to make it clearer that it’s the first penalty that happened on a play. (#223)
Deprecated the arguments source and pp all across the package. Using them will cause a warning. Parallel processing has to be activated by choosing an appropriate future::plan() before calling the relevant functions. For more information please see the package documentation.
build_nflfastR_pbp() will now run
decode_player_ids() by default (can be deactivated with the argument decode = FALSE).
build_nflfastR_pbp() will now run add_xpass() by default and add the new variables xpass and pass_oe.
fast_scraper() and build_nflfastR_pbp() now allow the output of fast_scraper_schedules() directly as input so it’s no longer necessary to pull the game_id first.
New function load_pbp() that loads complete seasons into memory for fast access of the play-by-play data.
Added rushing_yards, lateral_rushing_yards, passing_yards, receiving_yards, lateral_receiving_yards to fix an old bug where yards_gained gets overwritten on plays with laterals (#115).
vegas_wpa and vegas_home_wpa which contain Win Probability Added from the spread-adjusted WP model
out_of_bounds
fantasy, fantasy_id, fantasy_player_name, and fantasy_player_id that indicate the rusher or receiver on the play
tackle_with_assist, tackle_with_assist_1_player_id, tackle_with_assist_1_player_name, tackle_with_assist_1_team, tackle_with_assist_2_player_id, tackle_with_assist_2_player_name, tackle_with_assist_2_team
calculate_win_probability()
vignette("field_descriptions") with a searchable list of all nflfastR variables
Added ?field_descriptions and ?stat_ids to the package
Fixed a bug where fixed_drive and series weren’t updating after muffed punt (#144)
defteam instead of the posteam (#152)
update_db() (added qs
and curl to dependencies)
Fixed a bug where calculate_expected_points() and calculate_win_probability() duplicated some existing variables instead of replacing them (#170)
Fixed a bug where penalty_type wasn’t "no_play" although it should have been (#172)
Fixed a bug where penalty_team could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
epa on plays before a failed pass interference challenge in a few 2019 games (#175)
NA on offsetting penalties (#44)
epa when possession team changes at end of 1st or 3rd quarter (#182)
vegas_wp is now NA on final line since there is no possession team
vegas_wp)
yardline_100 as an input to both win probability models (not having it included was an oversight)
Fixed a bug where series was increased on PATs
Added team_wordmark, which contains URLs to the team’s wordmarks, to the included data frame ?teams_colors_logos
update_db()
The argument force_rebuild of the function update_db() is now of hybrid type. It can rebuild the play by play data table either for the whole nflfastR era (with force_rebuild = TRUE) or just for specified seasons (e.g. force_rebuild = 2019:2020). The latter is intended to be used for running seasons because the NFL fixes bugs in the play by play data during the week and we recommend rebuilding the current season every Thursday.
Fixed a bug where update_db() disconnected the connection to a database provided by the argument db_connection (#102)
Fixed a bug where update_db() didn’t build a fresh database without providing the argument force_rebuild
update_db() no longer removes the complete data table when a numeric argument force_rebuild is passed but only removes the rows within the table (#109)
New function build_nflfastR_pbp(), a
convenient wrapper around multiple nflfastR functions for an easy creation of the nflfastR play-by-play data set
New function add_xpass() that creates the columns xpass and pass_oe
Fixed a bug in fixed_drive which was not incrementing properly on drives that began following a timeout
usethis
pass = 1)
Added the argument fast (either TRUE or FALSE) to the function decode_player_ids() to activate the highly efficient C++ decoder of the package gsisdecoder
fast_scraper_roster() is finally back! It loads the NFL roster of a given season.
New function decode_player_ids() to decode all player IDs to the commonly known GSIS ID format (00-00xxxxx)
Added source = "old" to fast_scraper() to enable scraping of old source. This is mostly useless as it doesn’t work for 2020 and provides less info
Added the argument db_connection to update_db() to allow advanced users to use other DBI drivers, such as RMariaDB::MariaDB(), RPostgres::Postgres() or odbc::odbc() (please see dbplyr for more information)
clean_pbp() now fixes some bugs in jersey numbers
clean_pbp(), add_qb_epa() and add_xyac() can now handle empty data frames
fast_scraper() to fail (affects multiple games of the 2020 season)
Fixed a bug in fixed_drive that counted PAT after defensive TD as its own drive
add_xyac() breaking with some old packages
add_xyac() and add_qb_epa() calculations being wrong for some failed 4th downs
vignette("examples") with the new add_xyac() function
vignette("nflfastR-models")
Added fixed_drive and fixed_drive_result to the output of fast_scraper() because the NFL-provided drive info is extremely buggy
series_result
clean_pbp() now adds 4 new variables passer_jersey_number, rusher_jersey_number, receiver_jersey_number and jersey_number. These can be used to join rosters.
timeout_team, return_team, fumble_recovery_1_team for JAX games from 2011-2015
fixed_drive and corrections
to timeout_team
New function add_xyac() which adds the following columns associated with expected yards after the catch (xYAC): xyac_epa, xyac_success, xyac_fd, xyac_mean_yardage, xyac_median_yardage
series_success caused by bad drive information provided by NFL
special_teams_play, st_play_type, time_of_day, and order_sequence
old_game_id column (useful for merging to external data that still uses this ID: format is YYYYMMDDxx)
clean_pbp() function now adds an aborted_play column
play_type = no_play rather than pass
teams_colors_logos for the interim name of the ‘Washington Football Team’ and the corresponding logo URLs.
tidyselect version to be >= 1.1.0
clean_pbp() now standardizes player IDs across the old (1999-2010) and new (2011+) data sources. Player IDs once again uniquely identify players, and each unique player has one unique ID (as they did before the NFL data source change).
clean_pbp() now removes all variables it is about to create to make sure nothing unexpected can happen
Added minimum version requirements to some package dependencies because installation broke for some users with outdated packages
Made a minor bug fix to catch more out-of-order plays and fixed a bug where some plays were being incorrectly dropped in older seasons
Standardized team names (e.g. SD –>
LAC) in some columns we had missed
week from Expected Points models along with an update of vignette("nflfastR-models") and vignette("examples")
New function update_db() which adds all completed games to a SQLite database
calculate_win_probability()
vignette("examples") demonstrating the usage of the above mentioned functions
drive_real_start_time pre and post 2011
Fixed a bug where game_ids were overwritten during the play by play parsing
fast_scraper() now loads the raw game data from a separate raw data repo
.data from the rlang package (this is a major code change that takes some getting used to but we need it in preparation for a future release)
yards_gained more precisely defined
vignette("examples") to demonstrate Expected Points calculator calculate_expected_points()
clean_pbp()
first_down_rush and return_touchdown
fast_scraper() for not yet played games
xgboost (>= 1.1) as the recent xgboost update caused a breaking change leading to failure in adding model results to data
Added new models for Expected Points, Win Probability and Completion Probability and removed nflscrapR dependency. This is a major change as we are stepping away from the well established nflscrapR models. But we believe it is a good step forward. See data-raw/MODEL-README.md for detailed model information.
Added internal functions for EPA and
WPA to helper_add_ep_wp.R.
Added the new function calculate_expected_points(),
usable by the end user.
Completely overhauled fast_scraper() to make it work
with the NFL’s new server backend. The option source is
still available but will be deprecated since there is only one source
now. There are some changes in the output as well (please see
below).
fast_scraper() now adds game data to the play by
play data set courtesy of Lee Sharpe. Game data include: away_score,
home_score, location, result, total, spread_line, total_line, div_game,
roof, surface, temp, wind, home_coach, away_coach, stadium, stadium_id,
gameday
fast_scraper_schedules() now incorporates Lee Sharpe’s
games.rds.
The functions fast_scraper_clips() and
fast_scraper_roster() are deactivated due to the missing
data source. They might be reactivated or completely dropped in future
versions.
The function fix_fumbles() has been renamed to
add_qb_epa() as the new name much better describes what the
function is actually doing.
Added progress information using the
progressr package and removed the furrr
progress bars.
clean_pbp() now adds the column id
which is the id of the player in the column name. Because
we have to piece together different data to cover the full span of
years, player IDs are not consistent between the early
(1999-2010) and recent (2011 onward) periods.
Added a NEWS.md file to track changes to the
package.
Fixed several bugs inherited from nflscrapR,
including one where EPA was missing when a play was followed by two
timeouts (for example, a two-minute warning followed by a timeout), and
another where play_type was incorrect on plays with
declined penalties.
Fixed a bug where receiver_player_name and
receiver didn’t name the correct players on plays with
lateral passes.
The output has changed a little bit.

| Dropped Variables | Description |
|---|---|
| game_key | RS feed game identifier. |
| game_time_local | Kickoff time in local time zone. |
| iso_time | Kickoff time according to ISO 8601. |
| game_type | One of ‘REG’, ‘WC’, ‘DIV’, ‘CON’, ‘SB’ indicating if a game was a regular season game or one of the playoff rounds. |
| site_id | RS feed id for game site. |
| site_city | Game site city. |
| site_state | Game site state. |
| drive_possession_team_abbr | Abbreviation of the possession team in a given drive. |
| scoring_team_abbr | Abbreviation of the scoring team if the play was a scoring play. |
| scoring_type | String indicating the scoring type. One of ‘FG’, ‘TD’, ‘PAT’, ‘SFTY’, ‘PAT2’. |
| alert_play_type | String describing the play type of a play the NFL has listed as alert play. For most of those plays there are highlight clips available through fast_scraper_clips. |
| time_of_day | Local time at the beginning of the play. |
| yards | Analogous to yards_gained but with the kicking team being the possession team (which means that there are many yards gained through kickoffs and punts). |
| end_yardline_number | Yardline number within the above given side at the end of the given play. |
| end_yardline_side | String indicating the side of the field at the end of the given play. |

| Renamed Variables | Description |
|---|---|
| game_time_eastern -> start_time | Kickoff time in eastern time zone. |
| site_fullname -> stadium | Game site name. |
| drive_how_started -> drive_start_transition | String indicating how the offense got the ball. |
| drive_how_ended -> drive_end_transition | String indicating how the offense lost the ball. |
| drive_start_time -> drive_game_clock_start | Game time at the beginning of a given drive. |
| drive_end_time -> drive_game_clock_end | Game time at the end of a given drive. |
| drive_start_yardline -> drive_start_yard_line | String indicating where a given drive started consisting of team half and yard line number. |
| drive_end_yardline -> drive_end_yard_line | String indicating where a given drive ended consisting of team half and yard line number. |
| roof_type -> roof | One of ‘dome’, ‘outdoors’, ‘closed’, ‘open’ indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |

| Added Variables | Description |
|---|---|
| vegas_wp | Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line. |
| vegas_home_wp | Estimated win probability for the home team incorporating pre-game Vegas line. |
| weather | String describing the weather including temperature, humidity and wind (direction and speed). Doesn’t change during the game! |
| nfl_api_id | UUID of the game in the new NFL API. |
| play_clock | Time on the playclock when the ball was snapped. |
| play_deleted | Binary indicator for deleted plays. |
| end_clock_time | Game time at the end of a given play. |
| end_yard_line | String indicating the yardline at the end of the given play consisting of team half and yard line number. |
| drive_real_start_time | Local day time when the drive started (currently not used by the NFL and therefore mostly ‘NA’). |
| drive_ended_with_score | Binary indicator the drive ended with a score. |
| drive_quarter_start | Numeric value indicating in which quarter the given drive has started. |
| drive_quarter_end | Numeric value indicating in which quarter the given drive has ended. |
| drive_play_id_started | Play_id of the first play in the given drive. |
| drive_play_id_ended | Play_id of the last play in the given drive. |
| away_score | Total points scored by the away team. |
| home_score | Total points scored by the home team. |
| location | Either ‘Home’ or ‘Neutral’ indicating if the home team played at home or at a neutral site. |
| result | Equals home_score - away_score and means the game outcome from the perspective of the home team. |
| total | Equals home_score + away_score and means the total points scored in the given game. |
| spread_line | The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference) |
| total_line | The closing total line for the game. (Source: Pro-Football-Reference) |
| div_game | Binary indicator for if the given game was a division game. |
| roof | One of ‘dome’, ‘outdoors’, ‘closed’, ‘open’ indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |
| surface | What type of ground the game was played on. (Source: Pro-Football-Reference) |
| temp | The temperature at the stadium only for ‘roof’ = ‘outdoors’ or ‘open’. (Source: Pro-Football-Reference) |
| wind | The speed of the wind in miles/hour only for ‘roof’ = ‘outdoors’ or ‘open’. (Source: Pro-Football-Reference) |
| home_coach | First and last name of the home team coach. (Source: Pro-Football-Reference) |
| away_coach | First and last name of the away team coach. (Source: Pro-Football-Reference) |
| stadium_id | ID of the stadium the game was played in. (Source: Pro-Football-Reference) |
| game_stadium | Name of the stadium the game was played in. (Source: Pro-Football-Reference) |
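
The definitions of result, total, and spread_line in the table above amount to simple arithmetic. A small base R sketch with made-up scores: the games data frame and the home_covered column are illustrative assumptions, not package output.

```r
# Made-up scores and spread lines, for illustration only.
games <- data.frame(
  home_score  = c(27, 17, 20),
  away_score  = c(20, 24, 20),
  spread_line = c(3.5, -2.5, 1.0)
)

# result: game outcome from the perspective of the home team.
games$result <- games$home_score - games$away_score

# total: combined points scored in the game.
games$total <- games$home_score + games$away_score

# Hypothetical helper column: a positive spread_line means the home team
# was favored by that many points, so the home team covers when its
# margin of victory exceeds the line.
games$home_covered <- games$result > games$spread_line

games
```

The sign convention matters here: a negative spread_line marks the away team as the favorite, so a home loss by more than that margin does not cover.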