Setup

Get the data

We will send you the file you’ll use, stl_lead.csv (a comma-separated values file), in Slack.

Before you move on, read more about the data here.

Create a version-controlled R Project

  • Create a new version-controlled R Project named stl-lead-yourinitials (for example, mine would be stl-lead-ah). Remember: there are multiple ways to set up a version-controlled project, either through RStudio or by starting with a new repo on GitHub and then cloning it.
  • Add three subfolders to your R project: data, docs and figs
  • Copy the data you downloaded above into the data folder of your project
  • Create and save a new Quarto document as stl_lead_inequity.qmd in the docs folder
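If you would rather create the subfolders from the R console than by clicking through the Files pane, one possible sketch is below. It assumes your working directory is the project root (the folder containing the .Rproj file) and uses only base R.

```r
# Run from the project root (assumed to be the current working directory).
subfolders <- c("data", "docs", "figs")

for (folder in subfolders) {
  # showWarnings = FALSE keeps this quiet if the folder already exists
  dir.create(folder, showWarnings = FALSE)
}

# Confirm that all three folders are present (one TRUE/FALSE per folder)
dir.exists(subfolders)
```

Creating the folders by hand in RStudio or your file browser works just as well; the result is the same.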

Read in & explore the data

In your .qmd:

  • Attach the tidyverse and janitor packages in a new code chunk

  • Read in the stl_lead.csv data as stl_lead and use janitor::clean_names to convert all variable names to lower snake case

  • Do some basic exploration of the dataset (e.g. using summary, data visualizations and summary statistics).

  • In a new code chunk, create a new data frame from stl_lead called stl_lead_prop with one additional column, prop_white, containing the percent of each census tract identifying as white (variable white divided by variable totalPop, times 100; note that after clean_names, totalPop becomes total_pop). You may need to do some Googling. Hint: dplyr::mutate(new_col = col_a / col_b) creates a new column new_col that contains the value of col_a / col_b.
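As a rough illustration of the clean_names and mutate steps above, here is a sketch on a tiny stand-in data frame. The object names stl_lead_toy and stl_lead_toy_prop and all the values are made up for illustration; only the column names (totalPop, white, pctElevated) come from the real dataset. The commented-out read_csv line shows one way the real file might be read, assuming the here package is installed.

```r
library(tidyverse)
library(janitor)

# Made-up stand-in values so this sketch runs without the real file.
# With the real data you might instead read the CSV, for example:
# stl_lead <- read_csv(here::here("data", "stl_lead.csv")) |> clean_names()
# (here::here is an assumption; it requires the here package)
stl_lead_toy <- tibble(
  totalPop = c(2500, 4100, 3300),
  white = c(500, 3280, 1650),
  pctElevated = c(12.5, 3.1, 7.8)
) |>
  clean_names() # totalPop -> total_pop, pctElevated -> pct_elevated

# Basic exploration: column names and per-column summaries
names(stl_lead_toy)
summary(stl_lead_toy)

# Add a percent column: white residents divided by total population, times 100
stl_lead_toy_prop <- stl_lead_toy |>
  mutate(prop_white = white / total_pop * 100)

stl_lead_toy_prop
```

Note that the column names used inside mutate are the cleaned, lower snake case versions, since clean_names runs first.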

Create a scatterplot

  • In a new code chunk, create a scatterplot of the percentage of children in each census tract with elevated blood lead levels (pctElevated, which is pct_elevated after clean_names) versus the percent of each census tract identifying as white.
  • Customize by updating several aesthetics (e.g. size, opacity (see alpha =), color, etc.)
  • Store the scatterplot as stl_lead_plot
  • Have the scatterplot appear in the rendered HTML, and customize the size at which it appears
  • Also save a .png of the scatterplot to figs, with dimensions of 6” x 5” (width x height)
  • In text above or below the scatterplot, write 1 - 2 sentences describing the overall trend that you observe from your graph
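The scatterplot steps above could look something like the following sketch. The data frame stl_lead_toy_prop holds made-up values, and the styling choices, axis labels, figs_dir variable, and file name are all illustrative assumptions rather than required answers. The two #| lines are Quarto chunk options that set the figure size in the rendered document; they only take effect at the top of a code chunk in a .qmd and are ordinary comments in plain R.

```r
#| fig-width: 6
#| fig-height: 5

library(tidyverse)

# Made-up stand-in values so this sketch runs without the real data
stl_lead_toy_prop <- tibble(
  prop_white = c(20, 80, 50, 35, 65),
  pct_elevated = c(12.5, 3.1, 7.8, 10.2, 5.4)
)

# Build and store the scatterplot with a few customized aesthetics
stl_lead_plot <- ggplot(stl_lead_toy_prop,
                        aes(x = prop_white, y = pct_elevated)) +
  geom_point(size = 3, alpha = 0.6, color = "purple") +
  labs(x = "Percent identifying as white",
       y = "Percent of children with elevated blood lead levels") +
  theme_minimal()

# Printing the stored object is what makes it appear in the rendered output
stl_lead_plot

# Hypothetical output folder, relative to the working directory. From a .qmd
# saved in docs, the path may need to be "../figs" or here::here("figs").
figs_dir <- "figs"
dir.create(figs_dir, showWarnings = FALSE)

# ggsave measures width and height in inches by default
ggsave(file.path(figs_dir, "stl_lead_plot.png"),
       plot = stl_lead_plot, width = 6, height = 5)
```

Storing the plot in an object first means the same object can be both printed in the document and passed to ggsave.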

Create a histogram

  • Create a histogram of only the pctElevated column in the data frame (remember, this will only take one variable - the frequency is calculated for you by geom_histogram)
  • Customize the fill, color, and size aesthetics (in ggplot2 3.4.0 and later, the outline width is set with linewidth rather than size) - test some stuff! Feel free to make it awful.
  • Once you’ve played around with customization, export the histogram as a .jpg to the figs folder
  • Make sure the histogram also shows up in your rendered HTML
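A minimal sketch of the histogram steps, again on made-up values, might look like this. The names stl_lead_toy, stl_lead_hist, and figs_dir, the file name, the colors, and the bin count are illustrative assumptions. The linewidth argument assumes ggplot2 3.4.0 or later; older versions use size for the bar outline.

```r
library(tidyverse)

# Made-up stand-in values so this sketch runs without the real data
stl_lead_toy <- tibble(
  pct_elevated = c(12.5, 3.1, 7.8, 10.2, 5.4, 6.6, 2.9, 14.0)
)

# Only x is mapped; geom_histogram computes the counts for each bin
stl_lead_hist <- ggplot(stl_lead_toy, aes(x = pct_elevated)) +
  geom_histogram(bins = 5, fill = "orange", color = "navy", linewidth = 1)

# Printing the stored object shows it in the rendered output
stl_lead_hist

# Hypothetical output folder; adjust the path to where your .qmd lives
figs_dir <- "figs"
dir.create(figs_dir, showWarnings = FALSE)

# The .jpg extension tells ggsave which graphics device to use
ggsave(file.path(figs_dir, "stl_lead_hist.jpg"), plot = stl_lead_hist)
```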

Render & push

  • Render your .qmd
  • Stage, commit, pull then push changes using the command line

Share & test

  • Share the link to your public GitHub repo with a neighbor
  • Fork the repo your neighbor shared, then clone to get set up locally in RStudio
  • Navigate to their .qmd
  • Render - does everything work? Cool, your neighbor made a reproducible project with file paths you can use too!

End activity