Day 1 Activities

Setup

We will send you the file you’ll use, stl_lead.csv (a comma-separated values file), in Slack.

Before you move on, read more about the data here.

Create a new version-controlled R Project named stl-lead-yourinitials (for example, mine would be stl-lead-ah). Remember: there are multiple ways to set up a version-controlled project, either through RStudio or by starting with a new repo on GitHub and then cloning it.
Add three subfolders to your R project: data, docs and figs
Copy the data you downloaded above into the data folder of your project
Create and save a new Quarto document as stl_lead_inequity.qmd in the docs folder
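If you prefer to create the subfolders from the R console rather than by clicking, something like the following could work. This is only a sketch: project_root is a hypothetical stand-in (a temporary directory) so the code runs anywhere; in your own project you would call dir.create() from the project's root folder instead.

```r
# Hypothetical project location so this sketch runs anywhere; in practice,
# the root folder of your R Project plays this role.
project_root <- file.path(tempdir(), "stl-lead-ah")

subfolders <- c("data", "docs", "figs")

# Create each subfolder (and the project folder itself, if missing)
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_root)
```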

In your .qmd:

Attach the tidyverse and janitor packages in a new code chunk
Read in the stl_lead.csv data as stl_lead and use janitor::clean_names to convert all variable names to lower snake case
Do some basic exploration of the dataset (e.g. using summary, data visualizations and summary statistics).
In a new code chunk, create a new data frame from stl_lead called stl_lead_prop that has one additional column, prop_white, containing the percent of each census tract identifying as white (variable white in the dataset divided by variable totalPop, times 100). Note that after janitor::clean_names, totalPop becomes total_pop. You may need to do some Googling. Hint: dplyr::mutate(new_col = col_a / col_b) will create a new column new_col that contains the value of col_a / col_b.
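The steps above might look roughly like the sketch below. To keep it runnable without the real file, it first writes a tiny made-up CSV (toy_path and its three rows are invented for illustration) and reads that back in; in your own document you would point read_csv at stl_lead.csv in your data folder. Because the .qmd lives in docs, a relative path would probably need to look like ../data/stl_lead.csv, or you could try here::here("data", "stl_lead.csv") if you have the here package installed.

```r
library(tidyverse)
library(janitor)

# Made-up rows standing in for the real file; only the column names
# mentioned in the activity (white, totalPop, pctElevated) are used.
toy_path <- file.path(tempdir(), "stl_lead_toy.csv")
write_csv(
  tibble(
    totalPop = c(1000, 2000, 500),
    white = c(250, 1500, 50),
    pctElevated = c(12.5, 3.1, 20.0)
  ),
  toy_path
)

# Read the data and convert column names to lower snake case
stl_lead <- read_csv(toy_path, show_col_types = FALSE) %>%
  clean_names()

# Basic exploration: check the cleaned names and a quick summary
names(stl_lead)
summary(stl_lead)

# Add the percent of each tract identifying as white
stl_lead_prop <- stl_lead %>%
  mutate(prop_white = white / total_pop * 100)

stl_lead_prop
```

Notice that clean_names turns totalPop into total_pop and pctElevated into pct_elevated, so those are the names to use in later chunks.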

In a new code chunk, create a scatterplot of the percentage of children in each census tract with elevated blood lead levels (pctElevated, which becomes pct_elevated after clean_names) versus the percent of each census tract identifying as white (prop_white).
Customize by updating several aesthetics (e.g. size, opacity (see alpha =), color, etc.)
Store the scatterplot as stl_lead_plot
Have the scatterplot appear in the rendered html, and customize the size at which it appears when rendered
Also save a .png of the scatterplot to figs, with dimensions of (6” x 5”) (width x height)
In text above or below the scatterplot, write 1 - 2 sentences describing the overall trend that you observe from your graph
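A minimal sketch of the scatterplot steps is shown below, assuming a data frame shaped like stl_lead_prop. The data (toy_prop), the output folder (out_dir, a temporary directory), the file name, and the specific size, alpha, and color values are all made up for illustration; in your project you would use your real data and save into figs. The two `#|` lines are Quarto chunk options that control the figure size in the rendered document; outside a .qmd chunk, R simply treats them as comments.

```r
#| fig-width: 6
#| fig-height: 5

library(ggplot2)

# Made-up values standing in for stl_lead_prop
toy_prop <- data.frame(
  prop_white = c(5, 20, 45, 70, 90),
  pct_elevated = c(18, 14, 9, 5, 2)
)

# Build and store the scatterplot, customizing a few aesthetics
stl_lead_plot <- ggplot(toy_prop, aes(x = prop_white, y = pct_elevated)) +
  geom_point(size = 3, alpha = 0.6, color = "purple") +
  labs(
    x = "Percent identifying as white",
    y = "Percent of children with elevated blood lead levels"
  )

# Calling the stored object makes the plot appear in the rendered output
stl_lead_plot

# Hypothetical output folder so the sketch runs anywhere;
# in your project this would be the figs folder.
out_dir <- file.path(tempdir(), "figs")
dir.create(out_dir, showWarnings = FALSE)
png_path <- file.path(out_dir, "stl_lead_plot.png")

# width and height are in inches by default
ggsave(png_path, plot = stl_lead_plot, width = 6, height = 5)
```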

Create a histogram of only the pctElevated column in the data frame (remember, this will only take one variable - the frequency is calculated for you by geom_histogram)
Customize the fill, color, and size aesthetics (in recent ggplot2 versions, the outline thickness of bars is set with linewidth rather than size) - test some stuff! Feel free to make it awful.
Once you’ve played around with customization, export the histogram as a .jpg to the figs folder
Make sure the histogram also shows up in your rendered html
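The histogram steps could be sketched as follows. As before, the data (toy_lead), the object name stl_lead_hist, the output location, and the styling values are assumptions for illustration only. The example uses linewidth for the bar outlines, which is what recent ggplot2 versions expect; older versions use size for the same purpose.

```r
library(ggplot2)

# Made-up values standing in for the pct_elevated column
toy_lead <- data.frame(
  pct_elevated = c(2, 3, 3, 5, 6, 8, 9, 12, 14, 18, 21)
)

# Only x is mapped; geom_histogram counts the observations in each bin
stl_lead_hist <- ggplot(toy_lead, aes(x = pct_elevated)) +
  geom_histogram(bins = 5, fill = "orange", color = "navy", linewidth = 1.2)

# Calling the stored object makes the histogram appear in the rendered output
stl_lead_hist

# Hypothetical output folder so the sketch runs anywhere;
# in your project this would be the figs folder.
out_dir <- file.path(tempdir(), "figs")
dir.create(out_dir, showWarnings = FALSE)
jpg_path <- file.path(out_dir, "stl_lead_hist.jpg")

# ggsave picks the image format from the file extension
ggsave(jpg_path, plot = stl_lead_hist)
```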