Introduction
This tutorial covers material from Chapter 28 Quarto and Chapter 29 Quarto formats in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, although much of the information from Chapter 28 has already appeared in the “RStudio and GitHub” tutorials in this package. You will again practice creating Quarto websites, but at a quicker pace. You will learn some new skills, such as how to add visualizations, text, and new pages to your website. You will also learn how to save the results of your data-manipulation code in an RDS file and how to save your plots as PNG files.
Website 1
Let’s create and publish a Quarto website as we previously did.
Exercise 1
Create a public GitHub repo (called “website-1”). Make sure to click the “Add a README file” check box. Copy/paste the URL for its GitHub location.
Your answer should look something like:
https://github.com/davidkane9/website-1
Always start a new data science project with a new GitHub repo.
Exercise 2
Connect the website-1 GitHub repo to an R project on your computer. Name the R project website-1 also. Keeping the names of repos/projects aligned makes organization simpler.
Remember to “Terminate Jobs” this tutorial.
From the Terminal, run ls. CP/CR.
Your answer should look something like this:
Davids-MBP:website-1 dkane$ ls
README.md website-1.Rproj
Davids-MBP:website-1 dkane$
Exercise 3
In the Terminal, run:
quarto create project website .
Don’t forget the . at the end of the command.
Depending on your computer setup, this command will require you to answer some questions. Do so as best you can. If given a choice, choose “rstudio” rather than “vscode.”
This command will usually result in a restart of your RStudio instance. Or it might create a second R instance, also located in the website-1 directory. If so, close one of the instances. It does not matter which.
CP/CR.
Your answer should look like this:
Davids-MacBook-Pro:website-1 dkane$ quarto create project website .
Creating project at /Users/dkane/Desktop/projects/website-1:
- Created _quarto.yml
- Created .gitignore
- Created index.qmd
- Created about.qmd
- Created styles.css
Davids-MacBook-Pro:website-1 dkane$
Quarto is telling you what it did: it created five new files and placed them in the current directory. Recall that, on the command line, a . refers to the current directory.
Exercise 4
In the Console, run list.files(). CP/CR.
Your answer should look like:
> list.files()
[1] "_quarto.yml" "about.qmd" "website-1.Rproj" "index.qmd"
[5] "README.md" "styles.css"
The basic set of files necessary for making a Quarto website has been added to the current directory. Note how we can work with the current directory either from the Terminal, with commands like ls, or from the Console, with commands like list.files(). It is important for you to be comfortable with both approaches, so we intermix them in this tutorial.
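As a small sketch of the Console side of this comparison, the base R code below builds a throwaway directory and lists it twice. The directory and file names are placeholders, not your real project. By default, list.files() skips hidden files whose names begin with a dot, much like plain ls does. Adding all.files = TRUE behaves more like ls -a.

```r
# Throwaway directory with placeholder files; nothing here touches your project.
demo_dir <- file.path(tempdir(), "ls-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(".gitignore", "index.qmd", "about.qmd"))))

# Like `ls`: hidden files (names starting with a dot) are left out.
visible <- list.files(demo_dir)

# Like `ls -a`: hidden files are included; no.. = TRUE drops "." and "..".
everything <- list.files(demo_dir, all.files = TRUE, no.. = TRUE)

print(visible)
print(everything)
```

This is why a file like .gitignore can exist in the project without appearing in the default list.files() output.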
Exercise 5
In the Console, run:
library(tutorial.helpers)
show_file("_quarto.yml")
CP/CR.
The _quarto.yml file controls the creation of the project, in this case a website.
Exercise 6
In the Source pane, open the _quarto.yml file. Change the title to title: "Website 1". Save the file.
The indentation before title should match that of navbar directly below it.
In the Console, run:
show_file("_quarto.yml", start = 4, end = 10)
CP/CR.
Exercise 7
Open the index.qmd file in the Source pane. Replace the title with title: "Homepage". Save the file.
In the Console, run:
show_file("index.qmd")
CP/CR.
Note that the title ("Website 1") in _quarto.yml has no necessary connection to the title ("Homepage") in index.qmd. The former is the title for the entire website. The latter is the title for just the (yet-to-be-created) index.html page.
Exercise 8
Add this line to the .gitignore file:
*Rproj
Ensure that there is a blank line at the bottom of the file. Click Save.
In the Console, run:
show_file(".gitignore")
CP/CR.
*Rproj tells Git to ignore the website-1.Rproj file, keeping it off GitHub, because, in general, we don’t put “settings” files on GitHub.
Exercise 9
Commit and push all the files in the project. Run git log in the Terminal. CP/CR.
The output from the git log command is too complex for us to fully parse. If you want to learn more about how to work with Git/GitHub and R, check out Happy Git with R, a very useful resource.
Exercise 10
From the Terminal, run quarto render. CP/CR.
Your answer should look something like:
Davids-MBP:website-1 dkane$ quarto render
[1/2] index.qmd
[2/2] about.qmd
Output created: _site/index.html
Davids-MBP:website-1 dkane$
Exercise 11
From the Console, run list.files("_site"). CP/CR.
Your answer will probably look something like:
> list.files("_site")
[1] "about.html" "index.html" "search.json" "site_libs" "styles.css"
>
Exercise 12
Add /_site to the .gitignore. Don’t forget that the last line of a .gitignore should always be blank. Save the file.
In the Console, run:
show_file(".gitignore")
CP/CR.
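The initial forward slash in /_site anchors the pattern to the directory which contains the .gitignore file. Git will therefore ignore only the _site at the top level of the project, not something named _site deeper in the directory tree. The underscore is not a special character in a .gitignore pattern, so it does not need to be escaped. Note that .gitignore patterns are shell-style wildcards, a simpler relative of the patterns covered in the “Regular expressions” tutorial in this package. _site is a directory, so we could end the pattern with a forward slash, as in /_site/, which restricts the match to directories. Without the trailing slash, the pattern matches both files and directories named _site, which works just as well here.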
Exercise 13
From the Terminal, run quarto preview. CP/CR.
Your answer should look something like this:
Davids-MBP:website-1 dkane$ quarto preview
Preparing to preview
[1/1] index.qmd
Watching files for changes
Browse at http://localhost:7412/
GET: /
The quarto preview command does two things. First, it renders the files, just like quarto render does. Second, it sets up a viewer (probably your default web browser but possibly the “Viewer” tab in the Output pane) so that you can see what the website looks like. Click around and check it out!
The message at the end indicates that Quarto is now “watching” your files to see if you make any changes. If you do, it will automatically update the website.
Exercise 14
Ensure that all the files in the project have been committed and pushed.
At the Terminal, type:
quarto publish gh-pages
Copy/paste the URL for your website.
Exercise 15
Commit and push all the files in the project. Run git log -n 2 in the Terminal. CP/CR.
The -n 2 option causes Git to just provide the information on the last two commits.
Website 2
Let’s create a new website with a plot on its home page.
Exercise 1
Create a public GitHub repo (called “website-2”). Make sure to click the “Add a README file” check box.
Connect the website-2 GitHub repo to an R project on your computer. Name the R project website-2 also.
Remember to “Terminate Jobs” this tutorial, if asked. You can always restart it afterwards.
From the Terminal, run ls. CP/CR.
Recall that the index.qmd file will always create our “home” page because it is rendered to index.html. Browsers, by default, show index.html, if one exists, when they visit a new directory.
Exercise 2
In the Terminal, run:
quarto create project website .
From the Console, run usethis::use_github_pages().
In the Terminal, run ls -a.
CP/CR.
The . in the quarto create command refers to the current working directory. That is, we are telling Quarto to put all the necessary files right here.
We are moving faster in the creation of Website 2 than we did when making Website 1 because you should already be familiar with the steps. We will go even faster when we make Website 3.
Exercise 3
Add *Rproj and /_site to the .gitignore file. Save it.
In the Console, run:
show_file(".gitignore")
CP/CR.
Keeping track of all the different files and understanding how they work together is a central skill in creating larger projects.
Exercise 4
Delete everything in index.qmd except for the title, which should be changed to "My Cool Plot", surrounded by the three dashes, above and below.
In the Console, run:
show_file("index.qmd")
CP/CR.
(Did you remember to run library(tutorial.helpers) first? Without this, R can’t find the show_file() command.)
The material bracketed by the three dashes at the very top is called a YAML header. YAML is very sensitive to subtle formatting issues, such as indentation and spacing.
Exercise 5
Create a new code chunk in index.qmd and load the tidyverse package. Save the file.
In the Console, run:
show_file("index.qmd", chunk = "Last")
CP/CR.
There are three ways to do most things associated with project/website creation.
First, we can use RStudio. This is our preferred approach, since it is the easiest for beginning students. Unfortunately, the RStudio interface is somewhat behind in using Quarto effectively.
Second, we can use an R package like quarto directly. RStudio buttons are, generally, connected to R package commands.
Third, we can work with Quarto directly at the Terminal. This is often referred to as working from the command line. R packages often call Terminal commands behind the scenes. Current limitations of the quarto package force us to use this option.
Exercise 6
From the Terminal, run quarto preview.
Copy and paste the home page, leaving out the menu items.
Your answer should look like this:
My Cool Plot
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Because we have not yet added any code chunk options, the code is echoed and the package startup message is displayed.
Exercise 7
Create another code chunk. Add this code to it:
diamonds |>
  filter(z != 0) |>
  filter(color == "D") |>
  ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point() +
    scale_y_continuous(labels = scales::dollar) +
    labs(title = "Prices and Sizes for Diamonds of Color D",
         subtitle = "'Round' carat values like 1, 1.5 and 2 are more common.",
         x = "Carat",
         y = "Price")
Save the file.
In the Console, run:
show_file("index.qmd", chunk = "Last")
CP/CR.
Instead of running quarto preview from the Terminal, we can also simply render the index.qmd page by using Command/Ctrl + Shift + K. This does the same thing as hitting the “Render” button.
You will see a nice plot, along with a lot of junk code and messages.
Note that RStudio might only redirect you to the plot the first time you render index.qmd. Going forward, you may need to affirmatively click on the correct tab in your browser or on the Viewer tab in the Output pane to see the newly rendered page.
Exercise 8
Add these code chunk options to the top of the first code chunk, the one which loads the library.
#| message: false
#| echo: false
#| label: setup
In the Console, run:
show_file("index.qmd", pattern = "#")
CP/CR.
Code that works during your R session will not work when your website is rendered if you have not loaded the required libraries into the document. But our readers don’t want to see that code, much less the messages which it produces. So, we hide them.
You are generally free to label your chunks however you like, but there is one chunk name that imbues special behavior in RStudio: setup. When you first open a Quarto document, the chunk named setup, if there is one, will be run automatically once, before any other code is run.
Exercise 9
In the second code chunk, add the following options:
#| echo: false
#| label: diamonds-plot
In the Console, run:
show_file("index.qmd", pattern = "#")
CP/CR.
Assigning labels to code chunks helps keep you organized and helps others understand your code. You cannot assign the same label to multiple code chunks.
Your chunk labels should be short but evocative and should not contain spaces. We recommend using dashes (-) to separate words (instead of underscores, _) and avoiding other special characters in chunk labels.
Exercise 10
Render the index.qmd with the “Render” button. It should show the plot. As always, you could render with Command/Ctrl + Shift + K. Shortcut keys are lit!
In the Console, run:
show_file("index.qmd")
CP/CR.
There are two common modes for working with larger projects. First, as here, we work with an individual file, like index.qmd, and ensure that it is working as we want. Second, as before, we work with the project as a whole, using commands like quarto render, which render all the files in the project. The former approach is quickest and is how we spend most of our time. But, at the end of the day, we need to build the entire project.
Exercise 11
It is annoying to use the same code chunk option in every code chunk. In index.qmd, both code chunks contain #| echo: false. Delete both of them. Edit the YAML header so that it includes:
execute:
  echo: false
In the Console, run:
show_file("index.qmd", end = 5)
CP/CR.
Your answer should look like:
> show_file("index.qmd", end = 5)
---
title: "My Cool Plot"
execute:
  echo: false
---
If you added echo: false as an execute argument in _quarto.yml, this would affect all the QMD files in the entire project. In other words, you can set an option like echo in three places: individual code chunk, entire QMD file, or entire project. Options set at higher levels can be overruled by changing the setting at a lower level.
Exercise 12
Go to the Git tab. Commit and push your new changes to GitHub. Include a descriptive commit message, e.g. “Added plot to home page”.
In the Terminal, run quarto publish gh-pages. You may need to hit “enter/return” once.
Copy/paste the URL below:
Your answer should look like this, but with your user name instead of mine.
https://davidkane9.github.io/website-2/
Keep track of the three different versions of the website.
First, we have the QMD files in the project directory which are used to create the website. This is the original source, the place where changes should be made.
Second, we have the rendered version of the files which reside in the _site directory on our computer. We do not edit these files directly, but we do look at them when we are “previewing” the website.
Third, we have the website itself, the files which other people can see. These reside on GitHub, in this case, but we could put them on a different hosting service.
Adding text
Quarto allows for a combination of plain text and R code. This same feature allows you to add plain text to your website.
Exercise 1
Open about.qmd. Delete all the current text, except for the YAML header. Ensure that the title is "About". Add ## About Me as the first (and only) line of text. Render the QMD.
In the Console, run:
show_file("about.qmd")
CP/CR.
Hash signs in front of text make the text into a header. The organization of headers is determined by the number of hash signs. Fewer hash signs make higher level headers, i.e. bigger text. More hash signs make lower level headers, i.e. smaller text.
Keep in mind that your browser now has two tabs which show your website. First, the GitHub Pages version, which has not changed since you last ran quarto publish. The only way to affect the public website is to publish to it. Second, the localhost version, which is what you have been changing every time that you render a QMD file.
Exercise 2
Write a one sentence description of yourself. Write it below the header. Include your name and school, either the one you attend or one from which you have graduated. Write until the end of the line, i.e. do not use the Enter key to wrap text. When you are finished, render the QMD. Example:
My name is David Kane, and I graduated from Williams College.
In the Console, run:
show_file("about.qmd")
CP/CR.
You can have a Quarto document without a YAML header, if you like. You can also have an empty YAML header. But, it will often be the case that we want all the files in a project to have similar YAML headers in order to create a consistent look-and-feel.
Exercise 3
Let’s bold your name and italicize your school’s name. In the description you have already written, surround your name with two asterisks, **, on both sides. Let’s also put a single asterisk, *, around both sides of your school. Example:
My name is **David Kane**, and I graduated from *Williams College*.
Command/Ctrl + Shift + K
In the Console, run:
show_file("about.qmd")
CP/CR.
Quarto, which is based on the Markdown language, has a variety of formatting options.
Examples of Quarto formatting options include:
**bold** *italic* ~~strikeout~~ `code`
superscript^2^ subscript~2~
[underline]{.underline} [small caps]{.smallcaps}
These look like:
bold italic strikeout
code
superscript2 subscript2
underline small caps
Exercise 4
At the bottom of about.qmd, let’s create text which is also a hyperlink to a website. Enclose the text “Kane’s Free Data Science Bootcamp” in brackets []. Then, without a space intervening, add parentheses () which enclose a link to the relevant website: https://bootcamp.davidkane.info.
It should look like:
[Kane's Free Data Science Bootcamp](https://bootcamp.davidkane.info)
Command/Ctrl + Shift + K. (Recall that rendering automatically saves the file first.)
In the Console, run:
show_file("about.qmd", pattern = "Kane")
CP/CR.
You only need a few formatting tricks for working with Quarto files, but you will use those tricks repeatedly.
Exercise 5
Commit and push all the changes to GitHub, and issue the quarto publish gh-pages command in the Terminal. You need to hit “Enter” at least once. CP/CR.
Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks. Additionally, the error could be due to issues with the Quarto document itself or due to the R code in the Quarto document.
If the errors are due to the R code in the document, the first thing you should always try is to recreate the problem in an interactive session. Render the QMD. If you’re lucky, that will recreate the problem, and you can figure out what’s going on interactively.
Adding new pages
Here you will learn how to create new pages for your website, accessible via the navigation bar at the top.
Exercise 1
Go to File -> New File -> Quarto Document... Title it “Sources”. Select “Create Empty Document” (in the bottom left corner). Save the file, changing the file name from Untitled.qmd to sources.qmd.
In the Terminal, run ls *qmd. CP/CR.
You should see index.qmd, about.qmd, and sources.qmd in the output. If not, make sure you are in the website-2 directory. The *qmd wildcard pattern (a “glob,” not a true regular expression) selects only files with names which end with the letters qmd.
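The same selection can be made from the Console. This is a minimal sketch in base R, run in a temporary directory with placeholder files rather than in your project. Sys.glob() accepts a wildcard pattern like the one given to ls, while list.files() expects a regular expression in its pattern argument.

```r
# Temporary directory with placeholder files standing in for a website project.
demo_dir <- file.path(tempdir(), "glob-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("index.qmd", "about.qmd", "sources.qmd", "README.md"))
))

# Wildcard (glob) pattern, as in `ls *qmd`: * stands for any run of characters.
by_glob <- basename(Sys.glob(file.path(demo_dir, "*qmd")))

# Regular expression: "qmd$" means "ends with the letters qmd".
by_regex <- list.files(demo_dir, pattern = "qmd$")

print(sort(by_glob))
print(sort(by_regex))
```

Both calls pick out the three QMD files and skip README.md, but the two pattern languages are different: the * in a wildcard means "anything," while the $ in a regular expression means "end of the name."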
Exercise 2
In the sources.qmd file, below the YAML header, add the text An overview of sources used in my project. Save the file.
In the Console, run:
show_file("sources.qmd", pattern = "overview")
CP/CR.
Note how newly created Quarto documents like sources.qmd differ from those we have previously created in two ways. First, they do not have a format: html YAML line in the header. Second, they feature different default text.
This is an RStudio feature. It knows that this directory is a website project and that, therefore, it is unnecessary to even mention the format, which is best controlled from _quarto.yml.
Exercise 3
Add the following sentence:
These are my sources. Here are some challenges. Quarto websites are awesome.
In the Console, run:
show_file("sources.qmd")
CP/CR.
A website, when you first look closely, may appear overwhelming in its complexity. But it is usually created from a simple collection of text files, each of them quite simple.
You may be directed to the localhost version of your website, showing the “Sources” page. You have rendered the QMD into HTML. But notice that there is no way to get to this page via the website menu . . .
Exercise 4
Open the _quarto.yml file. You should see a line that says navbar. Below this line you should see the line left. Below this is the code that creates the header links for the navigation bar on your site. It should look like:
website:
  title: "Website 2"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd
To add sources.qmd as a new page which will appear on the menu, we need to edit this as follows:
website:
  title: "Website 2"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd
      - href: sources.qmd
        text: Sources
Note the two additional lines.
The href line tells Quarto to create an HTML page by rendering sources.qmd and to include that new page in the website.
The text line tells Quarto to format the link in the navigation toolbar as Sources. This is the same process by which the home page, created from index.qmd, has a link labeled Home.
Since - about.qmd appears without either the href or text argument, Quarto assumes that you want the link text to be the same as the title of the QMD.
In the Console, run:
show_file("_quarto.yml")
CP/CR.
YAML files are very tricky! Tabs and white spaces matter. The left key before the navbar items places those items on the left side of the navigation bar. This can be changed to right.
Exercise 5
Run quarto preview from the Terminal. You will need to hit “Enter” once. Check out your new project! See how “Sources” now appears on the navigation toolbar. Click on “Sources” and see how the text you added to sources.qmd now appears on the web.
Run ls _site/*html in the Terminal. CP/CR.
You should see several HTML pages, including sources.html. Quarto websites, like many similar frameworks, place all the pages which appear on the web into a single directory, like _site, and then move that directory onto a hosting service like GitHub Pages.
Exercise 6
In the _quarto.yml file, delete the two lines of code that create the link and header for the “Sources” page.
In the Terminal, stop the preview by hitting the stop sign button. Then, use rm to remove sources.qmd. Run ls *qmd in the Terminal. CP/CR.
There should not be a sources.qmd file. We are practicing these skills to make you more comfortable building up and tearing down the parts of a project. You need to build more than one house to get good at house-building!
Exercise 7
Make two more empty Quarto documents, in the same way as you did above, with titles "Source 1" and "Source 2".
You should save them both (so they’re not untitled1 and untitled2). Name them source-1.qmd and source-2.qmd.
In the Terminal, run the command head source-1.qmd source-2.qmd. CP/CR.
In many Terminal commands, you can provide more than one file as the target of the command. In this case, head will be applied to source-1.qmd and then source-2.qmd sequentially.
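For comparison, a rough Console equivalent in base R is sketched below. It writes two placeholder files to a temporary directory (their contents are assumed, not copied from your project) and then reads up to ten lines of each, which is the default number of lines that head shows.

```r
# Placeholder files in a temporary directory; the contents are illustrative only.
files <- file.path(tempdir(), c("source-1.qmd", "source-2.qmd"))
writeLines(c("---", 'title: "Source 1"', "---"), files[1])
writeLines(c("---", 'title: "Source 2"', "---"), files[2])

# Apply the same operation to each file in turn, as head does with several targets.
first_lines <- lapply(files, readLines, n = 10)
names(first_lines) <- basename(files)

# Print each file's lines under a banner naming the file.
for (name in names(first_lines)) {
  cat("==>", name, "<==\n")
  cat(first_lines[[name]], sep = "\n")
  cat("\n")
}
```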
Exercise 8
Let’s create a drop-down menu called “Sources” through which one could visit either the page for “Source 1” or “Source 2” from the top menu.
We will be messing around with YAML settings to make this, so pay very close attention.
The Quarto documentation provides an excellent overview of approaches to website navigation.
Change _quarto.yml so that the navbar section looks like this:
  navbar:
    left:
      - href: index.qmd
        text: Home
      - text: Sources
        menu:
          - href: source-1.qmd
            text: Source 1
          - href: source-2.qmd
            text: Source 2
      - about.qmd
In the Console, run:
show_file("_quarto.yml")
CP/CR.
Run quarto preview to see the changes.
Exercise 9
Go to the Git tab, then commit and push your new changes to GitHub. Include a descriptive commit message, e.g. “Added source pages”.
From the Terminal, run git log -n 1. CP/CR.
(You will probably need to stop the preview. But, if you want to be tricky, you can start a new Terminal by clicking on the small triangle next to the Terminal triangle — below the Terminal tab in the Console pane — and selecting “New Terminal”.)
Type quarto publish in the Terminal and hit “Enter.”
Explore the use of the “Sources” menu on your website.
Website 3
Let’s make one more website using the same repo/project/quarto approach as above. We will also learn about storing R objects in RDS files.
Exercise 1
Create a public GitHub repo (called “website-3”). Make sure to click the “Add a README file” check box.
Connect the website-3 GitHub repo to an R project on your computer. Name the R project website-3 also.
In the Terminal, run:
quarto create project website .
From the Console, run usethis::use_github_pages().
In the Terminal, run ls -a.
CP/CR.
This is the fourth time you have created a GitHub Pages website in the two Quarto Websites tutorials in this package.
Exercise 2
Add *Rproj and /_site to the .gitignore file.
Edit the _quarto.yml file to set the title to "My Website".
Edit the index.qmd file so that only the title remains. Set the title equal to "My Home Page".
Save all files. Commit and push everything. From the Terminal, run git log. CP/CR.
Again, the title in _quarto.yml determines the title of the entire website. The title at the top of index.qmd specifies the title of just that page.
Exercise 3
From the Terminal, run quarto preview to examine the website. You should see the titles that you have specified. If you don’t, fix things.
From the Terminal, run quarto publish gh-pages to publish the website.
Copy/paste the URL for your website.
Behind the scenes, usethis::use_github_pages() has created a second branch for your repo, named gh-pages. “Branching means you diverge from the main line of development and continue to do work without messing with that main line.”
The details of branching are beyond the scope of this package.
Exercise 4
Delete all the text in index.qmd. Replace it with (mostly) the same text as from Website 2.
---
title: "My Home Page"
execute:
  echo: false
---
```{r}
#| message: false
#| label: setup
library(tidyverse)
```
```{r}
#| label: diamonds-plot
diamonds |>
  filter(z != 0) |>
  filter(color == "D") |>
  ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point() +
    scale_y_continuous(labels = scales::dollar) +
    labs(title = "Prices and Sizes for Diamonds of Color D",
         subtitle = "'Round' carat values like 1, 1.5 and 2 are more common.",
         x = "Carat",
         y = "Price")
```
From the Console, run
show_file("index.qmd", end = 5)
CP/CR.
One problem with this approach is that all this code is run every time the page is rendered. That is wasteful, especially if the code takes a while to run.
Using an RDS file
An RDS file allows us to save any R object in a file. When we read from an RDS, we avoid having to run the code which created the object again. Instead, we simply have access to the object. This is very convenient for any code which takes a long time to run, such as intensive data manipulation.
Restart your R session with Command/Ctrl + Shift + 0. As always, this will require you to “Terminate Jobs” your tutorial.
Exercise 1
Go to File -> New File -> R Script. Save the file. Name the file process_data.R. The .R suffix will be added automatically.
In the Terminal, run ls. CP/CR.
Make sure that process_data.R is one of these files. We often use scripts to record the exact steps we used in analyzing some data or in creating a graphic. We want our work to be reproducible. We will often need to perform the same work again, albeit with some changes. A script can make that much simpler.
Exercise 2
Though an R script will use the libraries you already have loaded if executed interactively in the Console, it is good practice to include these at the top of the script. Any script you create should run correctly, even if you have just re-started R.
In process_data.R, load the tidyverse package using the library() command.
In the Console, run:
show_file("process_data.R")
CP/CR.
Because this is a script, you don’t need to create a code chunk. Scripts are most useful for work which simply creates an object, often a new data set or a graphic.
Exercise 3
Skip a line after library(tidyverse). Again, this is a script, so there are no code chunks, but skipping a line in between substantively different parts of the code is a good idea. Add this code:
diamonds |>
  filter(z != 0) |>
  filter(color == "D")
In the Console, run:
show_file("process_data.R")
CP/CR.
The answer should look like this:
library(tidyverse)

diamonds |>
  filter(z != 0) |>
  filter(color == "D")
You should try to learn how to work in two ways. First is interactively, trying things at the Console. Seeing what happens. Going line-by-line. Second is writtenly — yes, I just made up this word — adding code to your script/QMD and then executing all the code at once.
Of course, the best approach is to use both methods during your analysis, choosing whichever is best for the next step you need to take.
Exercise 4
Place your cursor in process_data.R at filter(z != 0) |>. Hit Command/Ctrl + Enter. This will cause the pipe to be pasted to the Console and then executed there. (Note that this should generate an error.) CP/CR.
The cause of the error is the distinction between the two worlds: the world of the Console and the world of the Script. The Script includes library(tidyverse). So, when you source process_data.R in the next question, everything works fine.
The Console, however, knows nothing about the diamonds tibble or about commands like select() or filter(). Recall that we recently restarted R. So, the tidyverse packages, which supply diamonds (from ggplot2) and those commands (from dplyr), have not been loaded in the Console.
A line of code, like library(tidyverse), might exist in a script, but unless/until that line is executed in the Console, the Console won’t know anything about it.
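The same distinction can be seen with any package. In the sketch below, the tools package that ships with R stands in for tidyverse, an assumption made so that the example runs without extra installations. A function from a package is not visible by name until library() has actually been executed in the session.

```r
# The "tools" package ships with R; it stands in for tidyverse in this sketch.
# In a fresh session it is normally installed but not yet attached.
attached_before <- "package:tools" %in% search()
found_before <- exists("toTitleCase")

library(tools)  # executing this line is what makes the package's functions visible

attached_after <- "package:tools" %in% search()
found_after <- exists("toTitleCase")

cat("Before library(): attached =", attached_before,
    ", function found =", found_before, "\n")
cat("After library():  attached =", attached_after,
    ", function found =", found_after, "\n")
```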
Exercise 5
“Source” the entire script by pressing the “Source” button on the upper right. This takes all the code from the script and feeds it, line-by-line, to the R Console. CP/CR.
Your answer should look like this:
> source("~/Desktop/projects/website-3/process_data.R", echo=TRUE)
>
> library(tidyverse)
── Attaching core tidyverse packages ───────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
> diamonds |>
+ filter(z != 0) |>
+ filter(color == "D")
# A tibble: 6,774 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Very Good D VS2 60.5 61 357 3.96 3.97 2.4
2 0.23 Very Good D VS1 61.9 58 402 3.92 3.96 2.44
3 0.26 Very Good D VS2 60.8 59 403 4.13 4.16 2.52
4 0.26 Good D VS2 65.2 56 403 3.99 4.02 2.61
5 0.26 Good D VS1 58.4 63 403 4.19 4.24 2.46
6 0.22 Premium D VS2 59.3 62 404 3.91 3.88 2.31
7 0.3 Premium D SI1 62.6 59 552 4.23 4.27 2.66
8 0.3 Ideal D SI1 62.5 57 552 4.29 4.32 2.69
9 0.3 Ideal D SI1 62.1 56 552 4.3 4.33 2.68
10 0.24 Very Good D VVS1 61.5 60 553 3.97 4 2.45
# ℹ 6,764 more rows
# ℹ Use `print(n = ...)` to see more rows
>
The output of the pipe is our new tibble with 6,774 rows.
Exercise 6
Change the code so that the pipe, instead of just vomiting out its result for printing, assigns that result to a new object, named D_diamonds. Your script should look like:
library(tidyverse)

D_diamonds <- diamonds |>
  filter(z != 0) |>
  filter(color == "D")
Source the new script. CP/CR.
Your answer should look like this:
> source("~/Desktop/projects/website-3/process_data.R", echo=TRUE)
> library(tidyverse)
> D_diamonds <- diamonds |>
+ filter(z != 0) |>
+ filter(color == "D")
>
Notice how different this looks from the previous output.
First, none of the messages associated with the tidyverse package are displayed. R only shows these messages the first time you load the package in a session.
Second, the rows of the tibble are not shown because the pipe no longer just vomits them out at the end. Instead, the results of the pipe are assigned to a new object: D_diamonds. Assignment does not result in printed output.
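Base R's withVisible() function makes this behavior easy to check. In this sketch a small vector stands in for the tibble, an assumption to keep the example self-contained. The visible flag reports whether R would print the value at the Console.

```r
# A small vector stands in for the tibble produced by the pipe.
assigned <- withVisible(x <- c(2, 4, 6))  # an assignment
bare <- withVisible(x)                    # the object's name on its own

assigned$visible  # FALSE: assignment returns its value invisibly, so nothing prints
bare$visible      # TRUE: typing the name prints the object

# Wrapping an assignment in parentheses both assigns and prints.
(y <- x * 10)
```

The parentheses trick is handy when you want to create an object and inspect it in a single step.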
Exercise 7
From the Console, run `ls()`. CP/CR.
Your answer should look like this:
> ls()
[1] "D_diamonds"
>
We have created a new object in the environment called `D_diamonds`.
You can also see this object if you click on the Environment tab within
the Environment pane. The name `D_diamonds` is meant to remind us that
all the diamonds in this tibble are of color “D.”
Exercise 8
Restart your R session again. You can never restart your R session
too often! The best way to do this is with the shortcut command:
`Command/Ctrl + Shift + 0`. (This is the number zero, not the letter
“O.”)
From the Console, run `ls()`. CP/CR.
Your answer should look like:
> ls()
character(0)
>
`D_diamonds` has disappeared! Objects in the R environment vanish once
R restarts. And that is OK! We have the script, `process_data.R`, which
allows us to recreate the object whenever we want.
Exercise 9
Source `process_data.R`. From the Console, run `ls()`. CP/CR.
`D_diamonds` is back! If you type `D_diamonds` at the Console and hit
“Enter,” the contents of the tibble will appear.
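The idea that a script can always rebuild a lost object can be tried without restarting R. In this sketch, a fresh environment stands in for a restarted session and a one-line temporary script stands in for `process_data.R`. Both are assumptions made only for illustration, as is the object name `squares`.

```r
# A tiny temporary script stands in for process_data.R (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines("squares <- (1:5)^2", script)

# A new, empty environment stands in for a freshly restarted R session.
session <- new.env()
before <- exists("squares", envir = session, inherits = FALSE)

# Sourcing the script recreates the object inside that environment.
source(script, local = session)
after <- exists("squares", envir = session, inherits = FALSE)

c(before = before, after = after)
```

The object is absent before the script is sourced and present afterwards, which mirrors what `ls()` shows before and after sourcing the real script.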
Exercise 10
It is often useful to save a permanent copy of an R object like a
plot or a tibble. `write_rds()` from the readr package is the most
common approach for tibbles.
From the Console, run `write_rds(D_diamonds, file = "clean_data.rds")`.
This function does not produce any output. The first argument is the R
object which you want to save. The second argument is the name of the
file which will hold that object.
From the Console, run `list.files(pattern = "clean")`. CP/CR.
Your answer should look like:
> list.files(pattern = "clean")
[1] "clean_data.rds"
>
The `pattern` argument to `list.files()` allows us to only look for
files which match a regular expression.
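Because `pattern` is a regular expression, a plain string such as "clean" matches anywhere in a file name. The sketch below uses a temporary directory with made-up file names (only `clean_data.rds` and `process_data.R` come from this tutorial) to show how anchors narrow the match.

```r
# Create three empty files in a fresh temporary directory (illustrative names).
demo_dir <- tempfile("pattern_demo")
dir.create(demo_dir)
invisible(file.create(
  file.path(demo_dir, c("clean_data.rds", "unclean.txt", "process_data.R"))
))

# "clean" matches anywhere in the file name.
anywhere <- list.files(demo_dir, pattern = "clean")

# "^clean" anchors the match to the start of the name.
at_start <- list.files(demo_dir, pattern = "^clean")

# "\\.rds$" matches names which end in a literal ".rds".
rds_only <- list.files(demo_dir, pattern = "\\.rds$")
```

Here `anywhere` picks up both `clean_data.rds` and `unclean.txt`, while the two anchored patterns return only `clean_data.rds`.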
There are now three objects associated with our tibble:
- The script, `process_data.R`, which we used to create it.
- The R object, `D_diamonds`, which only exists in the R session in which it was created.
- The RDS object, `clean_data.rds`, which contains the data and which continues to exist even after the R session ends. Indeed, the RDS object can be committed and pushed to GitHub. (But be careful of pushing large objects (greater than 100 megabytes) to GitHub.)
Exercise 11
Restart your R session with `Command/Ctrl + Shift + 0`.
When R restarts, you should still be in the `website-3` project. At
the Console, run `library(tidyverse)`. We need to reload the tidyverse
every time we restart R.
At the Console, run `read_rds("clean_data.rds")`. CP/CR.
Your answer should look like:
> read_rds("clean_data.rds")
# A tibble: 6,774 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Very Good D VS2 60.5 61 357 3.96 3.97 2.4
2 0.23 Very Good D VS1 61.9 58 402 3.92 3.96 2.44
3 0.26 Very Good D VS2 60.8 59 403 4.13 4.16 2.52
4 0.26 Good D VS2 65.2 56 403 3.99 4.02 2.61
5 0.26 Good D VS1 58.4 63 403 4.19 4.24 2.46
6 0.22 Premium D VS2 59.3 62 404 3.91 3.88 2.31
7 0.3 Premium D SI1 62.6 59 552 4.23 4.27 2.66
8 0.3 Ideal D SI1 62.5 57 552 4.29 4.32 2.69
9 0.3 Ideal D SI1 62.1 56 552 4.3 4.33 2.68
10 0.24 Very Good D VVS1 61.5 60 553 3.97 4 2.45
# ℹ 6,764 more rows
# ℹ Use `print(n = ...)` to see more rows
>
The value returned by `read_rds()` is whatever is contained in the RDS
object. In this case, that is a tibble. Because we did not assign that
value to a name, R simply prints the contents of the tibble at the
Console, as here.
Exercise 12
Open the `process_data.R` script. At the bottom, skip a line after the
pipe and then add this line (the same one we just executed at the
Console):
write_rds(D_diamonds, file = "clean_data.rds")
It is separate from, and unconnected to, the pipe which creates the
`D_diamonds` tibble.
In the Console, run:
show_file("process_data.R")
CP/CR.
This is a common approach to scripts which are parts of larger projects. They finish by saving an object which is then used by other code in the project.
Exercise 13
Source `process_data.R`. The easiest way of doing so, as before, is to
open the file in the Source pane and then click the “Source” button.
In the Terminal, run `ls clean*`. CP/CR.
Your answer should look like:
Davids-MBP:website-3 dkane$ ls clean*
clean_data.rds
Davids-MBP:website-3 dkane$
`clean*` is a wildcard (glob) pattern, not a regular expression. The
shell expands it to every file name which begins with the letters
`clean`.
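Strictly speaking, the Terminal treats `clean*` as a wildcard (glob) rather than as the kind of regular expression which `list.files()` expects. The sketch below, which uses a temporary directory and one made-up file name (`unclean.txt`), shows the glob in action via `Sys.glob()` and converts it to the equivalent regular expression with `utils::glob2rx()`.

```r
# Two empty files in a fresh temporary directory (unclean.txt is illustrative).
demo_dir <- tempfile("glob_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("clean_data.rds", "unclean.txt"))))

# Expand the wildcard the way the shell would.
glob_hits <- basename(Sys.glob(file.path(demo_dir, "clean*")))

# Translate the wildcard into a regular expression for list.files().
rx <- utils::glob2rx("clean*")
regex_hits <- list.files(demo_dir, pattern = rx)
```

Both approaches find only `clean_data.rds`, because the translated pattern is anchored to the start of the file name.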
Think about the overall workflow.
- First, we worked interactively, just playing around with plots in `index.qmd`, trying to figure out what works.
- Second, once we had a good idea of what we were doing, we moved to a script, `process_data.R`. We don’t want that code to be re-run every time we update `index.qmd`. So, we move some of the work to a script, to be run interactively whenever we need to change something.
- Third, we changed the script so that its final output was a permanent object. Later parts of the process can just begin with `clean_data.rds`.
Using a PNG file
A PNG file allows us to save any R graphic in a file. When we use a PNG, we avoid having to re-run the code which created it. This is very convenient for graphics which take a long time to create.
Exercise 1
Go to `File -> New File -> R Script`. Press the save button at the top
left of the Source pane. Name the file `make_plot.R`. (If you leave off
the `.R` suffix, it will be added automatically.)
In the Terminal, run `ls`. CP/CR.
Make sure that `make_plot.R` is one of these files. We often use
scripts to record the exact steps we used in creating a graphic. We want
our work to be reproducible. We will often need to perform the same work
again, albeit with some small changes. A script can make that much
simpler.
Exercise 2
Though an R script will use the libraries you already have loaded if executed interactively in the Console, it is good practice to include these at the top of the script. Any script you create should run correctly, even if you have just re-started R.
In the R script, load the tidyverse package using the `library()`
command.
In the Console, run:
show_file("make_plot.R")
CP/CR.
Naming conventions are important. It is fine to have dashes in a code chunk label. However, any object which might be used from the Terminal, like the name of a script, should avoid dashes because the Terminal interprets a dash as a special character, not necessarily as part of the file name.
Exercise 3
Copy this code into `make_plot.R`, leaving a blank line after
`library(tidyverse)`. Save the R script.
x <- read_rds("clean_data.rds")
x |>
  ggplot(aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Prices and Sizes for Diamonds of Color D",
       subtitle = "'Round' carat values like 1, 1.5 and 2 are more common.",
       x = "Carat",
       y = "Price")
In the Console, run:
show_file("make_plot.R")
CP/CR.
The code does two things.
First, it reads the tibble from `clean_data.rds` and assigns that
tibble to `x`. (The fact that, in our `process_data.R` script, this data
was named `D_diamonds` is irrelevant. `write_rds()` just saves the
tibble, not the name of the tibble.)
Second, it starts a new pipe with `x` and then creates our plot.
Because the plot is not assigned to an object, it is just spat back out,
and then shows up in the Plots tab.
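The point that an RDS file stores the object but not its name can be checked with a small round trip. This is a minimal sketch: the two-row tibble and the temporary file path are stand-ins invented for the example, not the tutorial's real data.

```r
library(readr)
library(tibble)

# A tiny stand-in for the real D_diamonds tibble (values are illustrative).
D_diamonds_demo <- tibble(carat = c(0.23, 0.26), price = c(357L, 403L))

# Save the object. Only its contents go into the file, not its name.
rds_path <- tempfile(fileext = ".rds")
write_rds(D_diamonds_demo, file = rds_path)

# Read it back under a completely different name.
x <- read_rds(rds_path)
same_contents <- isTRUE(all.equal(x, D_diamonds_demo))
same_contents
```

Whatever name we choose on the left of `<-` when reading, the contents match what was saved.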
Exercise 4
Source the entire script by pressing the “Source” button. This takes all the code from the script and feeds it, line-by-line, to the R Console. CP/CR.
Your answer should look like:
> source("~/Desktop/projects/website-3/make_plot.R", echo=TRUE)
> library(tidyverse)
> x <- read_rds("clean_data.rds")
> x |>
+ ggplot(aes(x = carat, y = price, color = cut)) +
+ geom_point() +
+ scale_y_continuous(labels = scales::dollar) +
+ labs(titl .... [TRUNCATED]
>
At this point, we face the same problem as we faced when working on
`process_data.R`. The script creates the object, in this case a graphic,
which we want. But we need a way to save a permanent copy of that
object.
Exercise 5
Edit `make_plot.R` in two ways. First, assign the plot to an object
named `D_plot`. Second, insert, after the graphic pipe,
`ggsave("size_v_weight.png", D_plot)`.
In the Console, run:
show_file("make_plot.R")
CP/CR.
As with `process_data.R`, we have created a script which encapsulates,
from start-to-finish, a discrete part of our analysis. In this case, we
start from the tibble saved in `clean_data.rds` and finish by saving a
graphic in `size_v_weight.png`.
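Putting the pieces from the last few exercises together, the finished script might look roughly like the sketch below. To keep it runnable on its own, it works in a temporary directory and first builds a stand-in for `clean_data.rds`. The explicit `width` and `height` passed to `ggsave()` are also an assumption, added so that the saved size does not depend on the current graphics device.

```r
library(dplyr)
library(ggplot2)
library(readr)

# Assumption: a temporary directory replaces the website-3 project folder.
work_dir <- tempfile("make_plot_demo")
dir.create(work_dir)
rds_path <- file.path(work_dir, "clean_data.rds")
png_path <- file.path(work_dir, "size_v_weight.png")

# Stand-in for the work done by process_data.R.
diamonds |>
  filter(z != 0) |>
  filter(color == "D") |>
  write_rds(file = rds_path)

# The body of make_plot.R: read the data, build the plot, save it.
x <- read_rds(rds_path)

D_plot <- x |>
  ggplot(aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Prices and Sizes for Diamonds of Color D",
       subtitle = "'Round' carat values like 1, 1.5 and 2 are more common.",
       x = "Carat",
       y = "Price")

ggsave(png_path, D_plot, width = 6, height = 4)
```

Because the plot is now assigned to `D_plot`, nothing appears in the Plots tab when the script is sourced. The only visible result is the new PNG file.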
Exercise 6
Source `make_plot.R`. From the Console, run `list.files()`. CP/CR.
There are a lot of files! Note the new script `make_plot.R`, the object
with which it starts, `clean_data.rds`, and the object which it creates,
`size_v_weight.png`.
Exercise 7
Open the `index.qmd` file. In the `setup` code chunk, add
`library(knitr)`. In the second code chunk, delete all the plotting
code. Replace it all with
`knitr::include_graphics("size_v_weight.png")`. Save the file.
In the Console, run:
show_file("index.qmd")
CP/CR.
Your answer should look like this:
Davids-MBP:first-website dkane$ cat index.qmd
---
title: "My First Website"
execute:
echo: false
---
```{r}
#| message: false
#| label: setup
library(tidyverse)
library(knitr)
```
```{r}
#| label: diamonds-plot
knitr::include_graphics("size_v_weight.png")
```
Davids-MBP:first-website dkane$
There was nothing wrong, per se, with having the plotting code in
`index.qmd` rather than `make_plot.R`. The problem arises when that
plotting code takes a long time to run, as it often will. We don’t want
to re-run all the code each time we make a small change to the website.
It is much better to separate out the plotting code and only run it when
we need to.
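One way to decide when the plotting code "needs" to run is to compare file modification times. The helper `needs_rebuild()` below is hypothetical, not part of this tutorial or of any package, and the temporary files merely stand in for `clean_data.rds` and `size_v_weight.png`.

```r
# Hypothetical helper: TRUE when the target is missing or older than any input.
needs_rebuild <- function(target, inputs) {
  !file.exists(target) || any(file.mtime(inputs) > file.mtime(target))
}

# Temporary files stand in for clean_data.rds and size_v_weight.png.
demo_dir <- tempfile("rebuild_demo")
dir.create(demo_dir)
input <- file.path(demo_dir, "clean_data.rds")
target <- file.path(demo_dir, "size_v_weight.png")

# The input exists (dated one minute ago) but the target does not yet.
invisible(file.create(input))
Sys.setFileTime(input, Sys.time() - 60)
missing_target <- needs_rebuild(target, input)

# Once the target is created, it is newer than the input.
invisible(file.create(target))
up_to_date <- needs_rebuild(target, input)

# If the input changes later, the target is stale again.
Sys.setFileTime(input, Sys.time() + 60)
stale_target <- needs_rebuild(target, input)
```

In a real project, a check like `needs_rebuild("size_v_weight.png", c("make_plot.R", "clean_data.rds"))` could guard a call to `source("make_plot.R")`. For a small website, simply re-sourcing the script by hand, as we do here, is usually enough.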
Exercise 8
From the Terminal, run `quarto publish gh-pages`. You will need to hit
“Enter” at least once. CP/CR.
Note the workflow. At the start, we created a pipe in `index.qmd`
which does everything and then, later, we moved the code into separate
scripts so that we don’t need to run all the analysis each time we
render the website. This is a very common approach. First, get something
working. Second, organize and clean it all up.
In a complex project, we will often create directories, like `code`,
`data`, and `images`, into which we will place files like
`process_data.R`, `clean_data.rds`, and `size_v_weight.png`,
respectively.
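A directory layout like this can be set up from R itself. The sketch below is only an illustration: it builds the folders inside a temporary directory, which stands in for the project root, and the path variables are names chosen for the example.

```r
# A temporary folder stands in for the project root (hypothetical layout).
project <- tempfile("website_demo")
dir.create(project)

# Create one subdirectory per kind of file.
subdirs <- file.path(project, c("code", "data", "images"))
for (d in subdirs) dir.create(d)

# Scripts would then refer to each file by its path within the project.
script_path <- file.path(project, "code", "process_data.R")
rds_path    <- file.path(project, "data", "clean_data.rds")
png_path    <- file.path(project, "images", "size_v_weight.png")
```

With such a layout, calls like `read_rds()` and `knitr::include_graphics()` would need the longer relative paths, for example `"data/clean_data.rds"` rather than `"clean_data.rds"`.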
Exercise 9
Look up the help page for `theme_minimal()` by running
`?theme_minimal` at the Console. Copy-and-paste the Description.
Graphics on the web often appear differently than graphics on your
computer, so many people find it useful to increase the size of the text
in their graphics. A simple way to do that is to add
`theme_minimal(base_size = 18)` to your plot creation code. Fortunately,
doing so is as simple as editing and then re-running `make_plot.R`.
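As a rough sketch of where that call goes, the theme is simply one more layer added to the plot with `+`. The example rebuilds the data directly from `diamonds` so that it runs on its own. In `make_plot.R` itself the pipe would start from `x` instead.

```r
library(dplyr)
library(ggplot2)

# Same plot as before, with larger text for web display.
D_plot <- diamonds |>
  filter(z != 0) |>
  filter(color == "D") |>
  ggplot(aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar) +
  theme_minimal(base_size = 18)
```

Because `theme_minimal()` is a complete theme, it is safest to add it before any further `theme()` adjustments, which would otherwise be overwritten.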
Exercise 10
Look up the help page for `ggsave()`. Copy-and-paste the Description.
When saving graphics for web display, it is often useful to specify
their exact size with a command like
`ggsave("your_file.png", your_plot, width = 4, height = 3)`.
An earlier edition of ggplot2: Elegant Graphics for Data Analysis also
notes that:
For raster graphics (i.e. .png, .jpg), the dpi argument controls the resolution of the plot. It defaults to 300, which is appropriate for most printers, but you may want to use 600 for particularly high-resolution output, or 96 for on-screen (e.g., web) display.
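The size and resolution arguments can be combined in one call. In this sketch the plot, the object name `your_plot`, and the temporary file path are placeholders, and `dpi = 96` simply follows the on-screen suggestion quoted above.

```r
library(ggplot2)

# A placeholder plot built from a dataset that ships with ggplot2.
your_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Width and height are in inches by default, so at 96 dpi this
# requests an image of 4 * 96 = 384 by 3 * 96 = 288 pixels.
png_file <- tempfile(fileext = ".png")
ggsave(png_file, your_plot, width = 4, height = 3, dpi = 96)
```

A lower `dpi` gives a smaller file with fewer pixels, which is often fine for a web page but would look coarse in print.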
Summary
This tutorial covered material from Chapter 28 Quarto and Chapter 29 Quarto formats in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, although much of the information from Chapter 28 has already appeared in the “RStudio and GitHub” tutorials in this package. You have practiced again the process of creating Quarto websites but at a quicker pace. You learned some new skills, such as how to add visualizations, text, and new pages to your website. You also learned how to save the results of your data-manipulation code in an RDS file and how to save your plots as PNG files.
Download answers
- Click the button to download a file containing your answers.
- Save the file onto your computer in a convenient location.
(If no file seems to download, try right-clicking the download button and choosing "Save link as...")