The basepenguins package provides tools to convert R scripts and R Markdown/Quarto documents (or other specified file types) that use the palmerpenguins package, so that they instead use the versions of penguins and penguins_raw from datasets (R ≥ 4.5.0).
With R ≥ 4.5.0, the popular Palmer Penguins datasets are now directly available without loading the palmerpenguins package. This makes them more accessible, especially for new R users and for teaching purposes. However, there are some differences between the variable names in the palmerpenguins package and those in R’s datasets package:
palmerpenguins | datasets |
---|---|
bill_length_mm | bill_len |
bill_depth_mm | bill_dep |
flipper_length_mm | flipper_len |
body_mass_g | body_mass |
These shorter variable names in the base R version were chosen for more compact code and data display. It does mean, however, that switching to R’s version of penguins isn’t simply a case of removing the call to library(palmerpenguins), or replacing palmerpenguins with datasets in data("penguins", package = "palmerpenguins"): a script that refers to the old variable names will no longer run.
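To make the mismatch concrete, here is a minimal sketch in base R. It uses a small made-up data frame with the datasets-style column names rather than the real penguins data, so it does not depend on the R version; old_to_new and toy are illustrative names, not part of either package.

```r
# Mapping from palmerpenguins column names to the datasets (R >= 4.5.0) names,
# taken from the table above
old_to_new <- c(
  bill_length_mm    = "bill_len",
  bill_depth_mm     = "bill_dep",
  flipper_length_mm = "flipper_len",
  body_mass_g       = "body_mass"
)

# Toy stand-in for the datasets version of penguins (made-up values)
toy <- data.frame(
  bill_len    = c(39.1, 46.5),
  bill_dep    = c(18.7, 17.9),
  flipper_len = c(181L, 192L),
  body_mass   = c(3750L, 3500L)
)

# Code written for palmerpenguins looks for columns that are not there
old_names_present <- names(old_to_new) %in% names(toy)

# Translating the old names through the mapping finds every column again
new_names_present <- unname(old_to_new) %in% names(toy)
```

Here old_names_present is FALSE for every name and new_names_present is TRUE for every name, which is why the variable names in a script have to be rewritten as well.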
The basepenguins package takes care of converting files by removing the call to palmerpenguins and making the necessary conversions to variable names, ensuring that the resulting scripts still run using the datasets (R ≥ 4.5.0) versions of penguins and penguins_raw.
The basepenguins package provides four functions to convert files:

- convert_files(): Convert specified files to new output locations
- convert_files_inplace(): Convert files in-place
- convert_dir(): Convert files in a specified directory and its subdirectories to a new output directory, preserving nesting structure
- convert_dir_inplace(): Convert files in a directory in-place

If using convert_files_inplace() or convert_dir_inplace(), we recommend doing so in conjunction with a version-control system such as git, so that any changes can be easily checked.
Additionally, there are helper functions:

- example_files() and example_dir(): Access example files included in the package
- output_paths(): Generate modified file paths
- files_to_convert(): List files in a directory with specified extensions

When a file is ‘convertible’, i.e. contains a call to library(palmerpenguins) or data("penguins", package = "palmerpenguins") and has one of the specified extensions (by default "R", "r", "qmd", "rmd", "Rmd"), the conversion makes these changes:

- Replaces library(palmerpenguins) (or the same with palmerpenguins in quotes) with the empty string ""
- Replaces data("penguins", package = "palmerpenguins") (with any style of quotes) with data("penguins", package = "datasets")
- Replaces the variable names:
  - bill_length_mm → bill_len
  - bill_depth_mm → bill_dep
  - flipper_length_mm → flipper_len
  - body_mass_g → body_mass
- Replaces ends_with("_mm") with starts_with("flipper_"), starts_with("bill_")

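The changes listed above can be approximated with a few text substitutions. The following is only a rough sketch in base R, not the basepenguins implementation: sketch_convert() is a hypothetical helper, and it simplifies by rewriting any package = "palmerpenguins" argument rather than only those inside data().

```r
# Illustrative only: a hypothetical helper, not the basepenguins implementation
sketch_convert <- function(lines) {
  # library(palmerpenguins), with or without quotes around the name -> ""
  lines <- gsub("library\\(([\"']?)palmerpenguins\\1\\)", "", lines, perl = TRUE)

  # package = "palmerpenguins" (either quote style) -> package = "datasets"
  # Simplification: this matches the argument wherever it appears
  lines <- gsub("package\\s*=\\s*([\"'])palmerpenguins\\1",
                "package = \"datasets\"", lines, perl = TRUE)

  # Longer palmerpenguins variable names -> shorter datasets names
  renames <- c(
    bill_length_mm    = "bill_len",
    bill_depth_mm     = "bill_dep",
    flipper_length_mm = "flipper_len",
    body_mass_g       = "body_mass"
  )
  for (old in names(renames)) {
    lines <- gsub(old, renames[[old]], lines, fixed = TRUE)
  }

  # No datasets column ends in "_mm", so select by prefix instead
  gsub('ends_with("_mm")',
       'starts_with("flipper_"), starts_with("bill_")',
       lines, fixed = TRUE)
}

script <- c(
  "library(palmerpenguins)",
  "penguins |>",
  '  select(body_mass_g, ends_with("_mm")) |>',
  "  ggplot(aes(x = flipper_length_mm, y = body_mass_g))"
)
converted <- sketch_convert(script)
```

A real converter also has to decide which files to touch at all, which this sketch ignores.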
The package includes an example directory with four example files to demonstrate how the conversion works. These are accessible through example_files() and example_dir().
# List all example files
example_files()
#> [1] "nested/not_a_script.md" "nested/penguins.qmd" "no_penguins.Rmd"
#> [4] "penguins.R"
These example files include:

- penguins.R: An R script using the palmerpenguins package
- no_penguins.Rmd: An R Markdown file that includes ends_with("_mm") but not in the context of the palmerpenguins package
- nested/penguins.qmd: A Quarto document using the palmerpenguins package
- nested/not_a_script.md: Contains library(palmerpenguins), but is not a script type that is converted by default

You can examine the content of any of these files, e.g.:
penguins_script <- example_files("penguins.R")
cat(readLines(penguins_script), sep = "\n")
#> library(palmerpenguins)
#> library(ggplot2)
#> library(dplyr)
#>
#> # exploring scatterplots
#> penguins |>
#> select(body_mass_g, ends_with("_mm")) |>
#> ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
#> geom_point(aes(color = species, shape = species), size = 2) +
#> scale_color_manual(values = c("darkorange", "darkorchid", "cyan4"))
The example_dir()
function returns the path to the
directory containing all example files. It also has a
copy.dir
argument that allows you to copy all the example
files to a new directory. This is especially useful for testing the
conversion functions that modify files in-place without affecting the
original example files distributed with the package:
# Copy all example files to a new subdirectory of the working directory
example_dir("examples")
# List the files in the copied directory
list.files("examples", recursive = TRUE)
#> [1] "nested/not_a_script.md" "nested/penguins.qmd" "no_penguins.Rmd"
#> [4] "penguins.R"
Note that for the purposes of this vignette (and to adhere to CRAN
policies), the working directory has been set to a tempdir
and all new directories and files are written there, using relative
paths.
The package offers two main approaches to converting files: creating new converted versions with convert_files() or modifying files in place with convert_files_inplace().
Let’s start by converting a single file to see how it works:
# Convert a single file to a new output file
convert_files(penguins_script, "converted_penguins.R")
#> - ends_with("_mm") replaced on line 7 in converted_penguins.R
#> - Please check the changed output files.
# Look at the converted file
cat(readLines("converted_penguins.R"), sep = "\n")
#>
#> library(ggplot2)
#> library(dplyr)
#>
#> # exploring scatterplots
#> penguins |>
#> select(body_mass, starts_with("flipper_"), starts_with("bill_")) |>
#> ggplot(aes(x = flipper_len, y = body_mass)) +
#> geom_point(aes(color = species, shape = species), size = 2) +
#> scale_color_manual(values = c("darkorange", "darkorchid", "cyan4"))
Notice how the function has:

- removed the library(palmerpenguins) line
- converted the variable names to their shorter versions (e.g. body_mass_g to body_mass)
- changed ends_with("_mm") to use starts_with() patterns instead

Both the input and output parameters of convert_files() take a vector of file paths, allowing you to convert multiple files at once.
If you want to overwrite the original files rather than creating new ones, you can use convert_files_inplace(), which works exactly the same as convert_files(), except that it doesn’t take an output argument: it is simply a convenience wrapper around convert_files(input, input, extensions).
All the convert_*() functions invisibly return a list with two components:

- changed: Files that were modified
- not_changed: Files that were not modified (either they don’t have the specified extensions or they don’t use the palmerpenguins package)

If the output paths are different from the input paths, the values in the changed and not_changed vectors will be subsets of output, and they will be named with the corresponding input paths. If files are overwritten, then the values in changed and not_changed will be subsets of input and the vectors will not be named.
This list is returned invisibly for two reasons:
The convert_*() functions generate messages in the following circumstances:

- If ends_with("_mm") substitutions are made, a message with the output file paths and line numbers of those changes
- If any files are changed, a reminder to check the changed output files
- If any R Markdown or Quarto documents are changed, a reminder to re-knit or re-render them

To convert all convertible files in a directory (and its subdirectories), use convert_dir(). We’ll use the "examples" directory that we created above with the call to example_dir("examples").
result <- convert_dir("examples", "converted_examples")
#> - ends_with("_mm") replaced on line 7 in converted_examples/penguins.R
#> - Please check the changed output files.
#> - Remember to re-knit or re-render and changed Rmarkdown or Quarto documents.
result
#> $changed
#> examples/nested/penguins.qmd
#> "converted_examples/nested/penguins.qmd"
#> examples/penguins.R
#> "converted_examples/penguins.R"
#>
#> $not_changed
#> examples/no_penguins.Rmd
#> "converted_examples/no_penguins.Rmd"
#> examples/nested/not_a_script.md
#> "converted_examples/nested/not_a_script.md"
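Because the input paths are stored as names, the returned list is easy to work with after a conversion. The sketch below rebuilds a list of the same shape by hand, with the paths copied from the output above, so that it runs without the package; the variable names are illustrative.

```r
# A list shaped like the result printed above (paths copied from that output)
result <- list(
  changed = c(
    "examples/nested/penguins.qmd" = "converted_examples/nested/penguins.qmd",
    "examples/penguins.R"          = "converted_examples/penguins.R"
  ),
  not_changed = c(
    "examples/no_penguins.Rmd"        = "converted_examples/no_penguins.Rmd",
    "examples/nested/not_a_script.md" = "converted_examples/nested/not_a_script.md"
  )
)

# Output files that were modified and are worth reviewing
changed_outputs <- unname(result$changed)

# The input files they came from are stored as the names
changed_inputs <- names(result$changed)

# Look up the output path for one particular input file
penguins_output <- result$changed[["examples/penguins.R"]]
```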
To convert all files in a directory in place, use convert_dir_inplace(). A useful call is convert_dir_inplace(".") to overwrite all convertible files in the working directory, though we don’t run that here, demonstrating on a fresh copy of the example directory instead.
When working with large directories, the
files_to_convert()
function helps you find files with
specific extensions that might be candidates for conversion:
# List all files with convertible extensions in a directory
potential_files <- files_to_convert("examples")
potential_files
#> [1] "nested/penguins.qmd" "no_penguins.Rmd" "penguins.R"
It’s important to note that files_to_convert() only filters files by their extensions and does not look for palmerpenguins in their content.
By default, this function looks for files with extensions "R", "r", "qmd", "rmd", or "Rmd". You can specify different extensions if needed, or return absolute file paths. See the files_to_convert() documentation for further details.
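The distinction between filtering by extension and inspecting content can be illustrated with base R alone. This is a sketch under stated assumptions, not how files_to_convert() is implemented: it builds a throwaway directory that mimics the example files, then applies list.files() with an extension pattern, followed by a separate content check.

```r
# Build a throwaway directory that mimics the example files
dir_path <- file.path(tempdir(), "sketch_examples")
dir.create(file.path(dir_path, "nested"), recursive = TRUE, showWarnings = FALSE)
writeLines("library(palmerpenguins)", file.path(dir_path, "penguins.R"))
writeLines("dat <- data.frame(length_mm = 1:3)", file.path(dir_path, "no_penguins.Rmd"))
writeLines("library(palmerpenguins)", file.path(dir_path, "nested", "penguins.qmd"))
writeLines("library(palmerpenguins)", file.path(dir_path, "nested", "not_a_script.md"))

# Step 1: filter by extension only (file content is not inspected)
extensions <- c("R", "r", "qmd", "rmd", "Rmd")
pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
by_extension <- list.files(dir_path, pattern = pattern, recursive = TRUE)

# Step 2 (a separate check): which of those files mention palmerpenguins?
mentions_pkg <- vapply(
  file.path(dir_path, by_extension),
  function(f) any(grepl("palmerpenguins", readLines(f), fixed = TRUE)),
  logical(1)
)
candidates <- by_extension[mentions_pkg]
```

Step 1 keeps no_penguins.Rmd even though it never mentions palmerpenguins, and drops not_a_script.md even though it does, which mirrors the behaviour described above.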
When converting files to new locations, the output_paths() function helps generate appropriate output paths based on the input paths (which are preserved as names). These can then be passed to the output argument of convert_files(). By default, output_paths() adds a "_new" suffix to the file name, but other suffixes or prefixes can be specified, and a different output directory can also be given.
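The default behaviour described above (a "_new" suffix, with the input paths kept as names) can be mimicked in a few lines of base R. The helper add_suffix() below is hypothetical and only illustrates the idea; it is not output_paths() itself and does not cover prefixes or output directories.

```r
# Hypothetical helper illustrating the "_new" suffix idea (not output_paths())
add_suffix <- function(paths, suffix = "_new") {
  ext  <- tools::file_ext(paths)            # e.g. "R", "qmd"
  stem <- tools::file_path_sans_ext(paths)  # path without the extension
  out  <- ifelse(nzchar(ext),
                 paste0(stem, suffix, ".", ext),
                 paste0(stem, suffix))
  # Keep the input paths as names so inputs and outputs stay paired
  stats::setNames(out, paths)
}

new_paths <- add_suffix(c("penguins.R", "nested/penguins.qmd"))
```

A named vector like new_paths pairs each input with its output, which is the form that a vectorised input/output interface needs.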
ends_with("_mm") substitution
The palmerpenguins Get started vignette has examples of using ends_with("_mm") within calls to dplyr::select(), as a convenient way to select the flipper_length_mm, bill_length_mm and bill_depth_mm columns.
This pattern presents a design challenge for basepenguins, because none of the datasets column names end in "_mm". We need another way to select the flipper_len, bill_len and bill_dep columns.
The most obvious substitution for ends_with("_mm") is therefore flipper_len, starts_with("bill_"), which preserves the use of a tidyselect function. However, suppose an earlier call to dplyr::select() has already dropped flipper_len, and the file has been converted with the substitution above. The converted selection will then generate an error, because flipper_len is no longer available to be selected.
Although this example is contrived, we don’t want to break anyone’s code, so instead we replace ends_with("_mm") with starts_with("flipper_"), starts_with("bill_").
This won’t error, even if there are no column names starting with "flipper_" or "bill_". However, starts_with("flipper_") should rarely be needed, as only one column in penguins meets that criterion. We therefore suggest manually checking this substitution and either replacing starts_with("flipper_") with flipper_len if flipper_len is still a column in the data frame, or removing starts_with("flipper_") entirely if not.
To facilitate this, the convert_*()
functions all print
a message indicating where these substitutions were made, to help you
manually review and potentially refine these changes if desired.
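The reasoning behind the prefix-based substitution can be shown with a base R analogue. This sketch deliberately avoids dplyr and the real penguins data: it uses a made-up data frame, exact-name indexing in place of selecting flipper_len by name, and startsWith() in place of starts_with().

```r
# Toy data frame with datasets-style measurement columns (made-up values)
toy <- data.frame(
  species     = c("Adelie", "Gentoo"),
  bill_len    = c(39.1, 46.5),
  bill_dep    = c(18.7, 17.9),
  flipper_len = c(181L, 192L)
)

# An earlier step drops the flipper column
reduced <- toy[, setdiff(names(toy), "flipper_len")]

# Selecting by exact name now fails; capture the error message instead of stopping
by_name <- tryCatch(
  reduced[, c("flipper_len", "bill_len", "bill_dep")],
  error = function(e) conditionMessage(e)
)

# Selecting by prefix simply returns whichever columns still match
keep <- startsWith(names(reduced), "flipper_") | startsWith(names(reduced), "bill_")
by_prefix <- reduced[, keep, drop = FALSE]
```

Here by_name holds an error message, while by_prefix quietly contains just the two bill columns. That robustness is the trade-off being made, at the cost of a slightly redundant selector that is worth reviewing by hand.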
The use of the ends_with("_mm")
pattern with the
penguins
dataset is also the reason why we only convert
files if library(palmerpenguins)
or
data("penguins", package = "palmerpenguins")
is found in
the file. It is possible to imagine different data frames for which this
selector could be used, and we don’t want to inadvertently alter those.
We provide an example file to demonstrate this:
# Get the path to the example file and look at its content
no_penguins_file <- example_files("no_penguins.Rmd")
cat(readLines(no_penguins_file), sep = "\n")
#> ---
#> title: No penguins
#> ---
#>
#> A file to make sure we're not changing `ends_with("_mm")`
#> if the script doesn't load the palmerpenguins package.
#>
#> ```{r}
#> dat <- data.frame(length_mm = 1:3, depth_mm = 4:6)
#>
#> dat |>
#> dplyr::select(ends_with("_mm"))
#> ```
# Pass it to a convert function
convert_files(no_penguins_file, "no_penguins_converted.Rmd")
# The content doesn't change
cat(readLines("no_penguins_converted.Rmd"), sep = "\n")
#> ---
#> title: No penguins
#> ---
#>
#> A file to make sure we're not changing `ends_with("_mm")`
#> if the script doesn't load the palmerpenguins package.
#>
#> ```{r}
#> dat <- data.frame(length_mm = 1:3, depth_mm = 4:6)
#>
#> dat |>
#> dplyr::select(ends_with("_mm"))
#> ```
Even though this file contains ends_with("_mm"), and is an R Markdown file, it doesn’t use the palmerpenguins package, so no substitutions are made. Notice also that there were no messages generated when convert_files() was called, indicating that none of the input files changed.
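The guard described here, converting only when the file loads palmerpenguins in one of the two recognised ways, might be sketched as follows. The function uses_palmerpenguins() is hypothetical and the regular expressions are an assumption about what counts as a match; the package's own check may differ.

```r
# Hypothetical check, not the package's own code: does the file load the
# palmerpenguins data via library() or data(..., package = ...)?
uses_palmerpenguins <- function(lines) {
  loads_pkg <- grepl("library\\(([\"']?)palmerpenguins\\1\\)", lines, perl = TRUE)
  loads_data <- grepl("data\\(.*package\\s*=\\s*([\"'])palmerpenguins\\1",
                      lines, perl = TRUE)
  any(loads_pkg | loads_data)
}

# Mentions the package in prose and uses ends_with("_mm"), but never loads it
no_penguins <- c(
  "if the script doesn't load the palmerpenguins package.",
  "dat <- data.frame(length_mm = 1:3, depth_mm = 4:6)",
  'dat |> dplyr::select(ends_with("_mm"))'
)

# Loads the package, so the selector refers to the penguins columns
with_penguins <- c(
  "library(palmerpenguins)",
  'penguins |> dplyr::select(ends_with("_mm"))'
)

skip_file    <- !uses_palmerpenguins(no_penguins)
convert_file <- uses_palmerpenguins(with_penguins)
```

Both skip_file and convert_file are TRUE: the first file would be left alone, the second converted.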
The versions of penguins and penguins_raw in R ≥ 4.5.0’s datasets package will always have class data.frame only. In contrast, the palmerpenguins versions will have classes tbl_df, tbl and data.frame if the tibble package is installed on your computer (and just class data.frame if not).
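If a script's behaviour depends on which kind of object it receives, the class can be checked directly. The sketch below uses made-up stand-ins and sets the tibble class vector by hand purely for illustration, so that it runs without tibble or palmerpenguins installed.

```r
# Stand-ins for the two versions (made-up values; no packages required)
base_style <- data.frame(species = "Adelie", bill_len = 39.1)
tibble_style <- base_style
class(tibble_style) <- c("tbl_df", "tbl", "data.frame")  # classes a tibble carries

# Both count as data frames ...
both_data_frames <- inherits(base_style, "data.frame") &&
  inherits(tibble_style, "data.frame")

# ... but only the second is also a tibble
base_is_tibble   <- inherits(base_style, "tbl_df")
tibble_is_tibble <- inherits(tibble_style, "tbl_df")
```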
penguins_raw
The versions of penguins_raw in palmerpenguins and datasets are identical, except potentially for their class, as described above. No specific changes are made to penguins_raw by the convert_*() functions in basepenguins, but because the call to library(palmerpenguins) is removed, converted scripts will use the datasets version, which is always a data.frame (never a tbl_df).
Note that the palmerpenguins package provides features that are not in R, such as vignettes and articles on the package website. The package also contains the data in two csv files and provides a function to access them. And, of course, Allison Horst’s wonderful penguins artwork! The palmerpenguins package will remain on CRAN and keep its package website.
We are extremely grateful to the authors of palmerpenguins, Allison Horst, Alison Hill and Kristen Gorman, for their support for adding the Palmer Penguins data to datasets, and their enthusiasm about basepenguins.