R Markdown publishing options. Source: RStudio and Alison Hill R Markdown resources (illustration by Allison Horst).

1 Introduction

R Markdown is a powerful and flexible tool for reproducible data analysis, collaboration, and publication (e.g. papers, blogs, websites, dashboards, CVs) in R or other languages (Python, SQL, and more).

Click HERE for the workshop intro slides.

1.1 Workshop outline

  1. Review basics of working in R Markdown for data analysis

  2. Customize R Markdown outputs for fun and functionality

  3. Explore useful features for research, including:

    • Reference code outputs in text for reproducible and efficient reporting
    • Sourcing external scripts in an Rmd
    • Caching time consuming code chunks
    • Adding a bibliography
    • From R Markdown to R scripts and back with spin() and purl()

2 Demo

Note: the only packages we’ll use for this workshop are the tidyverse and palmerpenguins packages:

library(tidyverse)
library(palmerpenguins)

2.1 R Markdown essentials

We will consider three essential elements of R Markdown in this workshop:

  • YAML
  • Markdown text
  • Code chunks

2.1.1 YAML

The YAML (“YAML ain’t markup language” or “Yet another markup language”) is the document information stored atop the R Markdown document, bounded on each end by ---. When you first create a new .Rmd, that will look something like this:

---
title: "Document title"
author: "Author"
date: "MM/DD/YYYY"
output: html_document
---

We’ll leave it as-is for now, but know that this is where we can quickly set a number of options for the document related to output format, themes, layouts, and more.

2.1.2 Markdown text

Most stuff that is outside of the YAML and code chunks in our .Rmd is our text in markdown. We won’t spend a lot of time customizing text, but here are a few commonly used formatting examples:

Use different numbers of pound signs (hashtags) to start a new line, followed by a space then the text, to create headers. If there is one #, that is a level one (by default, the largest) header. As you increase the number of pound signs to start a line, the header decreases in size.

This written in R Markdown:

### Header 3

…produces a Level 3 (moderately sized) header when knitted.

And we use markdown similarly to update text in other ways, like:

  • A single asterisk on either side of text makes it italic
  • A double asterisk on either side of text makes it bold
  • Use a hat (^) on the end of text to make asuperscript
  • Or an tilde (~) on each end to make asubscript

You could manually format all of your text in markdown. OR (drum roll) you could use the shiny new visual editor in RMarkdown (formal release of RStudio 1.4 announced January 2021). If you are running the newest version of RStudio, there is a compass icon in the top right of the .Rmd tab that allows you to switch between the Visual Editor and plain markdown.

2.1.3 Code chunks

Code chunks are where we’ll add most code in an R Markdown document (though we’ll see later we can also add or refer to R code inline). There are a number of ways to add a code chunk, including:

  • Click on the green + c button in the top bar for the .Rmd, then choose R (or explore options for adding code in other languages)

  • In the menu, click Code > Insert Chunk

  • Use the shortcut (Cmd + Option + I) << my favorite

  • Just type in the start and end gates (```{r} to start, ``` to end)

Within a new code chunk, let’s add some code:

ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
  geom_jitter()

When we knit, the code and graph appear in the document, demonstrating that we can have our text, code and outputs all in the same place - and any outputs will automatically update when we change our code & re-knit.

2.1.4 Aside: chunk options

Chunk options are options designated in the code chunk header that determine what appears or does not appear for each chunk upon knitting (and a lot more, but we’ll start there…). For more information on code chunk options in R Markdown, see Chapter 11 in the R Markdown Cookbook.

Chunk options can be added to individual code chunk headers (within the {r} atop the chunk), or applied globally by adding options to the {r setup} code chunk at the top of the document.

Here are some common chunk options:

  • echo = FALSE: do not show the code in the knitted document
  • include = FALSE: do not include code or any outputs in knitted document
  • message = FALSE: suppress messages when knit
  • warning = FALSE: suppress warnings when knit
  • eval = FALSE: do not evaluate this code

2.2 R Markdown customization

For today, we’ll make changes to the YAML in our .Rmd to fancy our output document. Our customization will be related to our knitted HTML output, using the following:

  • toc: true: add a table of contents based on header hierarchies
  • toc_float: true: make it a floating TOC
  • number_sections: true: add numbered sections based on header hierarchy
  • theme: _____: add a bootstrap theme
  • code_folding: hide: code is default hidden, but available if the reader clicks on the ‘Code’ button created

For free bootstrap themes, visit: https://bootswatch.com/3/

To add options to the YAML, add them as “children” in the html_document subsection. Beware of spacing here: generally add 2 spaces of indentation for each new sublevel in the YAML hierarchy.

---
title: "Level up in R Markdown"
subtitle: "UCSB QMSS seminar series (February 2021)"
author: "Allison Horst"
output: 
  html_document:
    theme: flatly
    toc: true
    toc_float: true
    number_sections: true
    code_folding: hide
---

Let’s try them out in our R Markdown document!

2.3 R Markdown for research

2.3.1 Inline code

Often, we want to reference code outcomes within our text. For example, we may want to refer to a mean value, like:

  "The mean nitrate concentration in Lake A is 11.2 mg/L." 

The common option of copying and pasting values, however, is tedious and dangerous. A better way is to refer to code outputs directly, so that values are updated automatically when code or data change.

In R Markdown, we add code inline with a single back-tick (`) on either end, and a lowercase r between to start it.

Then we can insert code within our text, which under the hood contains this inline R code to calculate the mean flipper length for all penguins:

r round(mean(penguins$flipper_length_mm, na.rm = TRUE), 2)

…to produce this output: “The mean flipper length for all penguins in the dataset is 200.92 mm.”

Then, if our data or code ever change, so will our text output.

2.3.2 Sourcing scripts

Sometimes, you’ll want to incorporate something from an R script into an R Markdown document. This might be the case if you have long code, and you might want it to exist in a script instead of cluttering up your .Rmd.

You can script external R scripts like this (from Ch 16.1 of the fantastic R Markdown Cookbook by Yihui Xie, Christophe Dervieux, and Emily Riederer):

source("your-script.R", local = knitr::knit_global())

For example, I can read in my script my_function.R, which contains a function fruit, which looks like this:

fruit <- function(apples, bananas, oranges, berries) {
  apples + bananas + oranges*berries
}

We can make that available to us in our R Markdown document with:

source("my_function.R", local = knitr::knit_global())

Which means then the function is now available for me to use…

fruit(apples = 2, bananas = 1, oranges = 5, berries = 10)
## [1] 53

2.3.3 Caching code chunks

If you have a code chunk that is really time consuming (e.g. reading in big datasets, running simulations, etc.), then you may want to cache that code chunk. That means that when you knit your .Rmd, instead of running every code chunk from the top down, it will avoid running a cached code chunk if there are no changes from the previously knit version.

For the full story, see Ch. 11.4 in the (R Markdown Cookbook)[https://bookdown.org/yihui/rmarkdown-cookbook/cache.html].

For example, if I have some code in the following chunk that takes a while to run (this doesn’t, but let’s imagine it does), then cache = TRUE is added in the code chunk header to cache outputs if no changes are made since it was last knit.

```{r, cache = TRUE}
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()
```

2.3.4 Bibliographies

You can add citations and a bibliography to your document by adding the references file (a .bib document) to the R Markdown YAML (see Ch. 4.5 in the R Markdown Cookbook for more information and bibliography options).

The YAML can be updated to enable citations using bibliography:, for example if my bibliography file is saved as refs_file.bib in the project root:

title: "Doc title"
author: "My name"
output: html_document
  theme: darkly
  toc: true
bibliography: refs_file.bib

Then we can add citations within our .Rmd document, from the R Markdown Cookbook: “Items can be cited directly within the documentation using the syntax (key?) where key is the citation key in the first line of the entry, e.g., (R-base?). To put citations in parentheses, use (key?). To cite multiple entries, separate the keys by semicolons, e.g., (key-1?; key-2?; key-3?). To suppress the mention of the author, add a minus sign before @, e.g., (R-base?).”

For example, if in my refs_file.bib I have a citation stored like this:

@Book{rmarkdown_cookbook,
    title = {R Markdown Cookbook},
    author = {Yihui Xie and Christophe Dervieux and Emily Riederer},
    publisher = {Chapman and Hall/CRC},
    address = {Boca Raton, Florida},
    year = {2020},
    note = {ISBN 9780367563837},
    url = {https://bookdown.org/yihui/rmarkdown-cookbook},
  }

To cite that in my R Markdown text, here’s what different formats look like (all using that citation key):

@rmarkdown_cookbook

…makes a citation like this: Xie, Dervieux, and Riederer (2020)

[@rmarkdown_cookbook]

…makes a citation like this: (Xie, Dervieux, and Riederer 2020)

Or combine citations, separating them by a semi-colon:

[@rmarkdown_cookbook; @rmarkdown_guide; @tidyverse_2019]

Produces: (Xie, Dervieux, and Riederer 2020; Xie, Allaire, and Grolemund 2018; Wickham et al. 2019)

And there are a bunch of other options to explore re: citations and bibliographies (including changing the citation format automatically to match required styles for different journals!).

The citations in full appear at the end of the knitted document.

Note: If the journal you’re publishing to has a Citation Style Language (and .csl), you can add that to the YAML to have your bibliography updated accordingly! See more here.

2.3.5 From R Markdown to R scripts & back with spin() and purl()

Sometimes it can be useful to pull out all code from code chunks in an R Markdown document and condense them into an R script. Or sometimes it’s useful to do the opposite - take information in an R script and add it to an R Markdown document. We can do these with knitr::purl() and knitr::spin(), respectively.

  • Use knitr::purl() to aggregate code from code chunks (not inline code) into an R script
  • Use knitr::spin() to convert an R script to an R Markdown document

2.3.5.1 purl(): R Markdown > R script

Let’s purl() the document you’ve been working in to create an R script!

  • In the R Console, run knitr::purl("your_file_name.Rmd").

You should see a .R script created in the root, with the same name but ending in .R - check it out!

2.3.5.2 spin(): R script > Markdown

See more information on rendering reports from R scripts here. Let’s create a new little R script, and add some information that will be rendered as Markdown syntax.

  • Create a new empty .R script, then copy and paste the following code & annotation into it:
#' Let's make a markdown document (.md) with `knitr::spin()`.
#'
#' This is normal text
#'
#' ## This is a level 2 header
#'
#' *This text is italicized*

#+ 
library(tidyverse)

#+
# An actual comment in the code chunk
ggplot(data = starwars, aes(x = height, y = mass)) +
  geom_point()

#' Then some normal text outside
  • Save that .R script (e.g. as starwars_graph.R).

  • In the Console, run:

knitr::spin("file_name.R")

Notice that the .md and the knitted HTML are saved! That’s how we can render a report straight from an R script.

3 Resources

References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://allisonhorst.github.io/palmerpenguins/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.