R Markdown is a powerful and flexible tool for reproducible data analysis, collaboration, and publication (e.g. papers, blogs, websites, dashboards, CVs) in R or other languages (Python, SQL, and more).
Click HERE for the workshop intro slides.
Review basics of working in R Markdown for data analysis
Customize R Markdown outputs for fun and functionality
A few useful features for research, including:
spin()
and purl()
Note: the only packages we’ll use for this workshop is the tidyverse
package.
library(tidyverse)
We will consider three essential elements of R Markdown in this workshop:
The YAML (“YAML ain’t markup language” or “Yet another markup language”) is the document information stored atop the R Markdown document, bounded on each end by ---
. When you first create a new .Rmd, that will look something like this:
---
title: "Document title"
author: "Author"
date: "MM/DD/YYYY"
output: html_document
---
We’ll leave it as-is for now, but know that this is where we can quickly set a number of options for the document related to output format, themes, layouts, and more.
Most stuff that is outside of the YAML and code chunks in our .Rmd is our text in markdown. We won’t spend a lot of time customizing text, but here are a few commonly used formatting examples:
Use different numbers of pound signs (hashtags) to start a new line, followed by a space then the text, to create headers. If there is one #
, that is a level one (by default, the largest) header. As you increase the number of pound signs to start a line, the header decreases in size.
This written in R Markdown:
### Header 3
…produces a Level 3 (moderately sized) header when knitted.
And we use markdown similarly to update text in other ways, like:
You could manually format all of your text in markdown. OR (drum roll) you could use the shiny new visual editor in RMarkdown (formal release of RStudio 1.4 announced January 2021). If you are running the newest version of RStudio, there is a compass icon in the top right of the .Rmd tab that allows you to switch between the Visual Editor and plain markdown.
Code chunks are where we’ll add most code in an R Markdown document (though we’ll see later we can also add or refer to R code inline). There are a number of ways to add a code chunk, including:
Click on the green + c
button in the top bar for the .Rmd, then choose R
(or explore options for adding code in other languages)
In the menu, click Code > Insert Chunk
Use the shortcut (Cmd + Option + I) << my favorite
Just type in the start and end gates (```{r}
to start, ``` to end)
Within a new code chunk, let’s add some code:
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
geom_point()
When we knit, the code and graph appear in the document, demonstrating that we can have our text, code and outputs all in the same place - and any outputs will automatically update when we change our code & re-knit.
Chunk options are options designated in the code chunk header that determine what appears or does not appear for each chunk upon knitting (and a lot more, but we’ll start there…). For more information on code chunk options in R Markdown, see Chapter 11 in the R Markdown Cookbook.
Chunk options can be added to individual code chunk headers (within the {r}
atop the chunk), or applied globally by adding options to the {r setup}
code chunk at the top of the document.
Here are some common chunk options:
echo = FALSE
: do not show the code in the knitted documentinclude = FALSE
: do not include code or any outputs in knitted documentmessage = FALSE
: suppress messages when knitwarning = FALSE
: suppress warnings when kniteval = FALSE
: do not evaluate this codeFor today, we’ll make changes to the YAML in our .Rmd to fancy our output document. Our customization will be related to our knitted HTML output, using the following:
toc: true
: add a table of contents based on header hierarchiestoc_float: true
: make it a floating TOCnumber_sections: true
: add numbered sections based on header hierarchytheme: _____
: add a bootstrap themecode_folding: show
: code is default shown, but can be hidden if the reader clicks on the ‘Hide’ button created (alternative default is code_folding: hide
)code_download: true
: add a button atop the knitted document to download the .Rmd!For free bootstrap themes, visit: https://bootswatch.com/3/
To add options to the YAML, add them as “children” in the html_document
subsection. Beware of spacing here: generally add 2 spaces of indentation for each new sublevel in the YAML hierarchy.
---
title: "My title"
subtitle: "And subtitle"
author: "Allison Horst"
output:
html_document:
theme: flatly
toc: true
toc_float: true
number_sections: true
code_folding: show
code_download: true
---
Let’s try them out in our R Markdown document!
Often, we want to reference code outcomes within our text. For example, we may want to refer to a mean value, like:
"The mean nitrate concentration in Lake A is 11.2 mg/L."
The common option of copying and pasting values, however, is tedious and dangerous. A better way is to refer to code outputs directly, so that values are updated automatically when code or data change.
In R Markdown, we add code inline with a single back-tick (`) on either end, and a lowercase r between to start it.
Then we can insert code within our text, which under the hood contains this inline R code to calculate the mean gas mileage for all cars in the mtcars
dataset:
r round(mean(mtcars$mpg, na.rm = TRUE), 2)
…to produce this output: “The mean gas mileage of cars in the dataset is 20.09 mm.”
Then, if our data or code ever change, so will our text output.
Sometimes, you’ll want to incorporate something from an R script into an R Markdown document. This might be the case if you have long code, and you might want it to exist in a script instead of cluttering up your .Rmd.
You can script external R scripts like this (from Ch 16.1 of the fantastic R Markdown Cookbook by Yihui Xie, Christophe Dervieux, and Emily Riederer):
source("your-script.R", local = knitr::knit_global())
For example, I can read in my script my_function.R
, which contains a function fruit
, which looks like this:
fruit <- function(apples, bananas, oranges, berries) {
apples + bananas + oranges*berries
}
We can make that available to us in our R Markdown document with:
source("my_function.R", local = knitr::knit_global())
Which means then the function is now available for me to use…
fruit(apples = 2, bananas = 1, oranges = 5, berries = 10)
## [1] 53
spin()
and purl()
Sometimes it can be useful to pull out all code from code chunks in an R Markdown document and condense them into an R script. Or sometimes it’s useful to do the opposite - take information in an R script and add it to an R Markdown document. We can do these with knitr::purl()
and knitr::spin()
, respectively.
knitr::purl()
to aggregate code from code chunks (not inline code) into an R scriptknitr::spin()
to convert an R script to an R Markdown documentpurl()
: R Markdown > R scriptLet’s purl()
the document you’ve been working in to create an R script!
knitr::purl("your_file_name.Rmd")
.You should see a .R script created in the root, with the same name but ending in .R - check it out!
spin()
: R script > MarkdownSee more information on rendering reports from R scripts here. Let’s create a new little R script, and add some information that will be rendered as Markdown syntax.
#' Let's make a markdown document (.md) with `knitr::spin()`.
#'
#' This is normal text
#'
#' ## This is a level 2 header
#'
#' *This text is italicized*
#+
library(tidyverse)
#+
# An actual comment in the code chunk
ggplot(data = starwars, aes(x = height, y = mass)) +
geom_point()
#' Then some normal text outside
Save that .R script (e.g. as starwars_graph.R
).
In the Console, run:
knitr::spin("file_name.R")
Notice that the .md and the knitted HTML are saved! That’s how we can render a report straight from an R script.