Setup

  • Create a new repo on GitHub called eds221-day7-activities
  • Clone the repo to create a version-controlled R Project
  • Add subfolders data, R and figs
  • Familiarize yourself with the contents, data files, and variables from this data package on EDI
  • Download the entire Zip Archive for the package
  • Copy all 4 files to your data folder

Task 1: Joins on birds

In this section, you’ll test and explore a number of different joins.

  • Create a new .qmd in your R folder saved as bird_joins.qmd
  • Read in the data sets and store the data frames as bird_observations, sites, surveys, and taxalist (it should be clear from the raw file names which is which)
  • Create a subset of bird_observations called birds_subset that only contains observations for birds with species ID “BHCO” or “RWBL”, recorded at sites with site ID “LI-W” or “NU-C”
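
If the subsetting step is unfamiliar, here is a minimal sketch on a small made-up table (not the real data). The column names species_id and site_id are assumptions, so check them against your own data frame with names() before adapting this.

```r
library(dplyr)

# Toy stand-in for bird_observations (hypothetical rows).
# The column names species_id and site_id are assumptions.
bird_observations_toy <- tibble(
  species_id = c("BHCO", "RWBL", "BHCO", "MODO", "RWBL"),
  site_id    = c("LI-W", "NU-C", "AB-C", "LI-W", "LI-W"),
  bird_count = c(2, 1, 4, 3, 5)
)

# Keep a row only if its species is in the first set AND its site is in the second
birds_subset_toy <- bird_observations_toy %>%
  filter(
    species_id %in% c("BHCO", "RWBL"),
    site_id %in% c("LI-W", "NU-C")
  )

birds_subset_toy
```

Using %in% rather than == matters here: == would recycle the comparison vector along the column instead of testing membership.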

Left join practice

  • Use left join(s) to update birds_subset so that it also includes sites and taxalist information. For each join, include an explicit argument saying which variable you are joining by (even if the function would pick the correct one by default). Store the updated data frame as birds_left. Make sure to look at the output: is what it contains consistent with what you expected it to contain?
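
As a rough illustration of chaining left joins with an explicit by argument, here is a sketch on made-up tables. The key column names (site_id, species_id) and the park names are assumptions for the example only.

```r
library(dplyr)

# Toy tables (hypothetical values); key column names are assumptions
birds_toy <- tibble(
  species_id = c("BHCO", "RWBL"),
  site_id    = c("LI-W", "NU-C"),
  bird_count = c(2, 1)
)
sites_toy <- tibble(
  site_id   = c("LI-W", "NU-C", "AB-C"),
  park_name = c("Park A", "Park B", "Park C")
)
taxa_toy <- tibble(
  species_id  = c("BHCO", "RWBL", "MODO"),
  common_name = c("Brown-headed Cowbird", "Red-winged Blackbird", "Mourning Dove")
)

# left_join() keeps every row of the left table and adds matching columns
# from the right table; rows that exist only in the right table are dropped
birds_left_toy <- birds_toy %>%
  left_join(sites_toy, by = "site_id") %>%
  left_join(taxa_toy, by = "species_id")

birds_left_toy
```

The result should have the same number of rows as birds_toy, with park_name and common_name added as new columns.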

Full join practice

  • First, answer: what do you expect a full_join() between birds_subset and sites to contain?

  • Write code to full_join the birds_subset and sites data into a new object called birds_full. Explicitly include the variable you’re joining by. Look at the output. Is it what you expected?
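
To check your expectation on something small first, here is a sketch of full_join() on made-up tables. The key column site_id and all values are assumptions for illustration.

```r
library(dplyr)

# Toy tables (hypothetical values); site_id as the key is an assumption
birds_toy <- tibble(
  species_id = c("BHCO", "RWBL"),
  site_id    = c("LI-W", "NU-C"),
  bird_count = c(2, 1)
)
sites_toy <- tibble(
  site_id   = c("LI-W", "NU-C", "AB-C"),
  park_name = c("Park A", "Park B", "Park C")
)

# full_join() keeps all rows from BOTH tables; where one side has no match,
# its columns are filled with NA
birds_full_toy <- birds_toy %>%
  full_join(sites_toy, by = "site_id")

birds_full_toy
```

Here the site "AB-C" has no bird observations, so it still appears in the output, with NA in species_id and bird_count.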

Task 2: Data wrangling and visualization with birds

Continue in the same .qmd that you created for Task 1.

  • Starting with your bird_observations data frame, rename the notes column to bird_obs_notes (so this doesn’t conflict with notes in the surveys dataset)

  • Then, create a subset that contains all observations in bird_observations, joins the taxonomic, site, and survey information to it, and is finally limited to only the columns survey_date, common_name, park_name, and bird_count. You can decide the order in which you do this (e.g. limit the columns first and then join, or the other way around).

  • Use lubridate::month() to add a new column called survey_month, containing only the month number. Then, convert the month number to a factor (again within mutate())

  • Learn a new function on your own! Use dplyr::relocate() to move the new survey_month column to immediately after the survey_date column. You can do this in a separate code chunk, or pipe straight into it from your existing code.

  • Find the total number of birds observed by park and month (i.e., you’ll group_by(park_name, survey_month))

  • Filter to only include parks “Lindo”, “Orme”, “Palomino” and “Sonrisa”
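
The month, relocate, and summary steps above could be sketched as follows. This uses a small made-up table in place of your joined data; the dates, counts, "Other Park", and the output column name total_birds are all hypothetical.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the joined, column-limited data (hypothetical rows)
birds_joined_toy <- tibble(
  survey_date = as.Date(c("2001-03-05", "2001-03-20", "2001-07-11", "2001-07-12")),
  common_name = c("Mourning Dove", "House Finch", "Mourning Dove", "House Finch"),
  park_name   = c("Lindo", "Lindo", "Orme", "Other Park"),
  bird_count  = c(3, 2, 5, 1)
)

# Add the month number, convert it to a factor, then move the new column
# so it sits right after survey_date
birds_months_toy <- birds_joined_toy %>%
  mutate(survey_month = month(survey_date)) %>%
  mutate(survey_month = as.factor(survey_month)) %>%
  relocate(survey_month, .after = survey_date)

# Total birds by park and month, then keep only the four parks of interest
birds_by_park_month <- birds_months_toy %>%
  group_by(park_name, survey_month) %>%
  summarize(total_birds = sum(bird_count, na.rm = TRUE), .groups = "drop") %>%
  filter(park_name %in% c("Lindo", "Orme", "Palomino", "Sonrisa"))

birds_by_park_month
```

Filtering after summarizing gives the same totals as filtering first; filtering first just means fewer rows to group.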

Task 3: Practice with strings

  • Create a new .qmd in your R folder called string_practice.qmd
  • Copy all contents of the html table below to your clipboard:
    date        building           alarm_message
    2020-03-14  Engineering-North  10:02am – HVAC system down, facilities management alerted
    2020-03-15  Bren Hall          8:24am – Elevator North out of service
    2020-04-10  Engineering-South  12:41am – Fire alarm, UCSB fire responded and cleared
    2020-04-18  Engr-North         9:58pm – Campus point emergency siren, UCPD responded
  • Back in your string_practice.qmd, create a new code chunk

  • With your cursor in your code chunk, go up to Addins in the top bar of RStudio. From the drop-down menu, choose ‘Paste as data frame’. Make sure to add code to store the data frame as alarm_report

  • Practice working with strings by writing code to update alarm_report as follows (these can be separate, or all as part of a piped sequence):

    • Replace the “Engr” with “Engineering” in the building column
    • Separate the building column into two columns, building and wing, split at the dash
    • Only keep observations with the word “responded” in the alarm_message column
    • Separate the message time from the rest of the message by separating at the dash that follows the time (“--”; note that it appears as an en dash, “–”, in the table above)
    • Convert the date column to a Date class using lubridate
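
One possible shape for this sequence, sketched on a made-up two-row table rather than the real alarm_report. The rows and the new column names message_time and message are hypothetical. The dash pattern accepts either "--" or an en dash, since the pasted text may contain either.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Toy stand-in for alarm_report (hypothetical rows)
alarm_toy <- tibble(
  date          = c("2021-01-05", "2021-01-06"),
  building      = c("Engr-East", "Library"),
  alarm_message = c("7:15am -- Door alarm, security responded",
                    "3:30pm -- Printer jam")
)

alarm_clean <- alarm_toy %>%
  # Expand the abbreviation
  mutate(building = str_replace(building, pattern = "Engr", replacement = "Engineering")) %>%
  # Split building at the dash; buildings without a dash get NA for wing
  separate(building, into = c("building", "wing"), sep = "-", fill = "right") %>%
  # Keep only messages that mention "responded"
  filter(str_detect(alarm_message, "responded")) %>%
  # Split time from message at "--" or an en dash (\u2013), trimming nearby spaces
  separate(alarm_message, into = c("message_time", "message"),
           sep = "\\s*(--|\u2013)\\s*") %>%
  # Convert the character date to Date class
  mutate(date = ymd(date))

alarm_clean
```

The fill = "right" argument is only there to silence the warning for buildings that have no wing; leaving it out still works but prints a message about missing pieces.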

End