Task 1: Read Broman & Woo’s Data organization in spreadsheets

Take ~15 minutes to read Broman & Woo’s evergreen paper Data Organization in Spreadsheets. As you read, think about data that you have created or had to work with that did not follow these guidelines. Make notes of a few examples to share: how did you input data previously? How would you change the way you input data?

Questions:

  • What are major / most common ways you have seen these guidelines ignored?

  • What is your experience working with or creating data in spreadsheets that don’t follow these guidelines?

Task 2: SBC Lobsters

Data source: Santa Barbara Coastal LTER, D. Reed, and R. Miller. 2021. SBC LTER: Reef: Abundance, size and fishing effort for California Spiny Lobster (Panulirus interruptus), ongoing since 2012 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/0bcdc7e8b22b8f2c1801085e8ca24d59

Getting started

  • Create a new GitHub repo called eds221-day6-activities
  • Clone to create a version controlled R project
  • Add subfolders data and docs
  • Download the California Spiny lobster abundance data from this SBC LTER data package. Familiarize yourself with the metadata. Save the CSV containing lobster abundance data in your data subfolder.
  • In docs, create a new .Rmd or .qmd saved with file prefix lobster_exploration
  • Within your notebook, write organized and well-annotated code to do the following:
    • Read in and take a look at the data in the data/Lobster_Abundance_All_Years_20210412.csv file. Take note of values that can be considered NA (see metadata) and update your import line to convert those to NA values
    • Convert column names to lower snake case
    • Convert the data from frequency to case format using dplyr::uncount() on the existing count column. What did this do? Add annotation in your code explaining dplyr::uncount()

Here’s code to read in your data, just to get you started:

library(tidyverse)  # includes readr (read_csv) and tidyr (uncount)
library(janitor)    # clean_names()
library(here)

lobsters <- read_csv(here("data", "Lobster_Abundance_All_Years_20210412.csv"), na = c("-99999", "")) %>% 
  clean_names() %>%  # convert column names to lower snake case
  uncount(count)     # expand from frequency format to case format

Find counts and mean sizes by site & year

  • Create a summary table that finds the total counts (see n()) and mean carapace lengths of lobsters observed in the dataset by site and year.
  • Create a ggplot graph of the number of total lobsters observed (y-axis) by year (x-axis) in the study, grouped (either aesthetically or by faceting) by site
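One possible sketch of these two steps, assuming `lobsters` is the case-format data frame from the starter code above and that the cleaned column names are `site`, `year`, and `size_mm` (check the metadata to confirm):

```r
library(dplyr)
library(ggplot2)

# Total counts and mean carapace length by site and year
lobster_summary <- lobsters %>% 
  group_by(site, year) %>% 
  summarize(total_count = n(),
            mean_carapace_mm = mean(size_mm, na.rm = TRUE))

# Total lobsters observed per year, grouped aesthetically by site
ggplot(lobster_summary, aes(x = year, y = total_count, color = site)) +
  geom_line() +
  labs(x = "Year", y = "Total lobsters observed")
```

Faceting by site (`facet_wrap(~ site)`) is an equally valid alternative to mapping site to color.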

Task 3: Random lobster wrangling

Starting with the original lobsters data that you read in as lobsters, complete the following tasks independently (they are not meant to be done in sequence). You can store each of the outputs as ex_a, ex_b, etc. for the purposes of this task.

filter() practice

  1. Create and store a subset that only contains lobsters from sites “IVEE”, “CARP” and “NAPL”. Check your output data frame to ensure that only those three sites exist.

  2. Create a subset that only contains lobsters observed in August.

  3. Create a subset with lobsters at Arroyo Quemado (AQUE) OR with a carapace length greater than 70 mm.

  4. Create a subset that does NOT include observations from Naples Reef (NAPL).
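The four subsets above could look something like this sketch (assuming columns named `site`, `month`, and `size_mm`; `month` is assumed to be numeric, so August is 8):

```r
library(dplyr)

# 1. Only sites IVEE, CARP, and NAPL
ex_a <- lobsters %>% filter(site %in% c("IVEE", "CARP", "NAPL"))
unique(ex_a$site)  # check that only the three sites remain

# 2. Only lobsters observed in August
ex_b <- lobsters %>% filter(month == 8)

# 3. Arroyo Quemado (AQUE) OR carapace length > 70 mm
ex_c <- lobsters %>% filter(site == "AQUE" | size_mm > 70)

# 4. Everything except Naples Reef (NAPL)
ex_d <- lobsters %>% filter(site != "NAPL")
```

Note the difference between `%in%` (matches any value in a vector) and `==` (matches a single value, recycled element-wise), which is a common source of silent filtering bugs.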

group_by() %>% summarize() practice

  1. Find the mean and standard deviation of lobster carapace length, grouped by site.

  2. Find the maximum carapace length by site and month.
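A minimal sketch of both summaries, again assuming `site`, `month`, and `size_mm` column names and using `na.rm = TRUE` to ignore missing lengths:

```r
library(dplyr)

# 1. Mean and standard deviation of carapace length by site
ex_e <- lobsters %>% 
  group_by(site) %>% 
  summarize(mean_length_mm = mean(size_mm, na.rm = TRUE),
            sd_length_mm = sd(size_mm, na.rm = TRUE))

# 2. Maximum carapace length by site and month
ex_f <- lobsters %>% 
  group_by(site, month) %>% 
  summarize(max_length_mm = max(size_mm, na.rm = TRUE))
```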

mutate() practice

  1. Add a new column that contains lobster carapace length converted to centimeters. Check output.

  2. Update the site column to all lowercase. Check output.

  3. Convert the area column to a character (not sure why you’d want to do this, but try it anyway). Check output.
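These three mutations might be sketched as follows (assuming `size_mm`, `site`, and `area` columns; note that `mutate()` with an existing column name overwrites that column in place):

```r
library(dplyr)

# 1. Carapace length in centimeters (10 mm per cm)
ex_g <- lobsters %>% mutate(size_cm = size_mm / 10)

# 2. Site codes to lowercase
ex_h <- lobsters %>% mutate(site = tolower(site))

# 3. Area as character
ex_i <- lobsters %>% mutate(area = as.character(area))
```

After each step, `head()` or `str()` on the result is a quick way to confirm the new or updated column looks right.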

case_when() practice

  1. Use case_when() to add a new column called size_bin that contains “small” if carapace size is <= 70 mm, or “large” if it is greater than 70 mm. Check output.

  2. Use case_when() to add a new column called designation that contains “MPA” if the site is “IVEE” or “NAPL”, and “not MPA” for all other outcomes.
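One way to sketch both (assuming `size_mm` and `site` columns; `case_when()` evaluates conditions in order and `TRUE ~ ...` serves as the catch-all for everything not matched above it):

```r
library(dplyr)

# 1. Bin carapace size at 70 mm
ex_j <- lobsters %>% 
  mutate(size_bin = case_when(
    size_mm <= 70 ~ "small",
    size_mm > 70 ~ "large"
  ))

# 2. Flag MPA sites
ex_k <- lobsters %>% 
  mutate(designation = case_when(
    site %in% c("IVEE", "NAPL") ~ "MPA",
    TRUE ~ "not MPA"
  ))
```

In the first example, rows where `size_mm` is NA match neither condition, so `size_bin` will be NA for them, which is usually the desired behavior.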