eds212-day5-comp
(with a ReadMe)
tidyverse package
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
What does the output look like? A mess. We might not want all of the messages R reports here to show up in our rendered document. Not a problem: we get to define exactly what shows up in our knitted HTML, either across the entire document or for each specific code chunk.
We can update Quarto execution options to control what shows up (and doesn’t) in our rendered document.
For example:
echo: false will hide source code in the rendered doc
warning: false will hide warnings, and message: false will hide messages
Setting these execute options in the YAML will make them the default
for all code chunks. You can also override the default using the
hashpipe #| within a code chunk, followed by the execution
option (e.g. #| echo: false ).
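As a minimal sketch, here is what a chunk that overrides the document-level defaults might look like. The body is a made-up example, not part of the exercise. It was chosen because as.numeric() on a non-numeric string emits a warning, and #| warning: false keeps that warning out of the rendered document. When you run the same lines in the Console, the #| lines are just ordinary R comments.

```r
#| echo: false
#| warning: false

# Hypothetical example: "three" cannot be converted to a number, so R warns
# "NAs introduced by coercion". In the rendered document, echo: false hides
# this source code and warning: false hides the warning. The printed result
# still appears.
values <- as.numeric(c("1", "2", "three"))
values
```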
starwars dataset (in dplyr)
Use several of the tools we learned yesterday
(e.g. head(), tail(), summary(),
dim(), etc.) to explore the data. Consider this “exploratory” work that you
would not want to show up at all in a final knitted document. Update the
code chunk header with an option so that nothing from this code chunk is
included in the rendered document (include = FALSE in the chunk header,
or #| include: false with the hashpipe syntax).
# Return the first 6 lines of `starwars`
head(starwars)
## # A tibble: 6 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
## 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
## 5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
## 6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# Check the dimensions
dim(starwars)
## [1] 87 14
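The exploratory chunk could be written with the hashpipe syntax as sketched below. With include: false, the code still runs, so any objects it creates are available to later chunks. However, neither the code nor its output appears in the rendered document. The library(dplyr) line is included only so the snippet runs on its own. The tail() and summary() calls are further examples of the exploration tools listed above.

```r
#| include: false

# Exploratory checks: this code runs, but nothing from this chunk is rendered
library(dplyr)   # provides the starwars dataset

sw_dims <- dim(starwars)   # number of rows and columns
tail(starwars)             # last 6 rows
summary(starwars$height)   # min, quartiles, mean, max, and count of NAs
```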
Update the setup code chunk options so the warnings and messages
are hidden, but the library(tidyverse) code still
shows up in your knitted document. Knit to check.
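One way the setup chunk could look, assuming the document-level YAML does not already set echo: false:

```r
#| warning: false
#| message: false

# echo is left at its default (true), so this line is shown in the rendered
# document. The package startup messages and conflict notes are hidden.
library(tidyverse)
```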
In a new code chunk, create a ggplot2 graph of
character mass (y-axis) versus height (x-axis). Update so that the
color of the points changes based on the value of mass
(this is unnecessary, but just for customization practice). Update axis
labels (with units). Remember, use ?starwars for more
information. Check the warnings that show up when you run your code.
What are they telling you? Then, update the code chunk option so that
only the graph appears in your knitted document (no code or
warnings / messages). Knit to check.
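A minimal sketch of the scatterplot with only the graph rendered. The label wording is a choice, not a requirement. The units follow ?starwars, which gives height in centimeters and mass in kilograms. When you run this interactively, the warning should report that rows were removed for missing values, because some characters have no recorded mass or height. The library(tidyverse) line is repeated only so the snippet runs on its own.

```r
#| echo: false
#| warning: false
#| message: false

library(tidyverse)

# Mass vs. height, with point color mapped to mass (purely for practice)
sw_mass_height_plot <- ggplot(data = starwars, aes(x = height, y = mass)) +
  geom_point(aes(color = mass)) +
  labs(x = "Height (cm)", y = "Mass (kg)", color = "Mass (kg)")

sw_mass_height_plot
```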
In a new code chunk, create a histogram of character heights. Update the fill color to purple, and the line color to red (this will look awful; do it anyway for practice). Update the x- and y-axis labels. Update your code chunk options so that only your code and the output graph show up in the knitted document (no messages or warnings). Knit to check.
ggplot(data = starwars, aes(x = height)) +
  geom_histogram(color = "red", fill = "purple")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 6 rows containing non-finite values (`stat_bin()`).
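The stat_bin() message appears because neither bins nor binwidth was set. Supplying one of them removes the message, while the warning about removed rows remains because some heights are NA. The sketch below hides any remaining messages and warnings and keeps the code visible. The binwidth of 10 cm and the axis labels are arbitrary choices for illustration.

```r
#| warning: false
#| message: false

library(tidyverse)

# binwidth = 10 makes each bar span 10 cm of height, replacing the default
# of 30 bins that triggered the stat_bin() message
sw_height_hist <- ggplot(data = starwars, aes(x = height)) +
  geom_histogram(binwidth = 10, color = "red", fill = "purple") +
  labs(x = "Height (cm)", y = "Number of characters")

sw_height_hist
```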
You can add a figure caption or alt-text to a graphic using
#| fig-cap: "caption text" and
#| fig-alt: "figure alt text"
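A sketch of where those options go: they sit at the top of the chunk that produces the figure, alongside any other hashpipe options. The caption and alt-text wording here are placeholders. Good alt text should also describe what the figure shows, not just how it is drawn.

```r
#| fig-cap: "Heights of Star Wars characters."
#| fig-alt: "Histogram of Star Wars character heights in centimeters, drawn with purple bars outlined in red."
#| warning: false
#| message: false

library(tidyverse)

sw_height_fig <- ggplot(data = starwars, aes(x = height)) +
  geom_histogram(color = "red", fill = "purple") +
  labs(x = "Height (cm)", y = "Count")

sw_height_fig
```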
Here, we’ll learn how to find some summary statistics (mean, standard
deviation, variance). You can refer to a single column in a data frame
using df$colname. For example, if we have a data frame
cats with a column mass, then we can refer to
the mass column in cats using
cats$mass.
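To make the cats example concrete, here is a tiny hypothetical data frame with made-up values. The $ operator pulls out one column as a plain vector, and any vector function can then be applied to it.

```r
# Hypothetical `cats` data frame (made-up values, for illustration only)
cats <- data.frame(
  name = c("Teddy", "Khora", "Waffle"),
  mass = c(4.2, 5.1, 3.8)   # kg
)

cats$mass        # the mass column as a numeric vector
mean(cats$mass)  # apply a function to that single column
```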
Let’s take a look. In the Console, try calling a couple columns
individually from starwars (you don’t need to store these).
E.g. starwars$name, starwars$birth_year,
etc.
We’ll learn a bunch of tools that help to automate finding
summary statistics across multiple columns
(e.g. dplyr::across()) or groups within the same column
(e.g. dplyr::group_by() %>% summarize()), but for now
let’s say we just want a single mean from a single column.
Use the mean() function applied to the column to return
the value.
sw_height_mean <- mean(starwars$height)
Call the value back to yourself (in the Console). What does it tell you the mean height is? Uh oh…
Check out the documentation for mean(). What is the
behavior (default) for dealing with NA (missing)
values?
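A quick check of that default, sketched with a made-up three-value vector. The sketch then counts how many starwars heights are missing. That count should match the number of rows the histogram warning reported as removed.

```r
library(dplyr)   # provides the starwars dataset

# mean() defaults to na.rm = FALSE, so a single NA makes the result NA
mean(c(170, 180, NA))

# How many character heights are missing?
n_missing_height <- sum(is.na(starwars$height))
n_missing_height
```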
Update your code so that NA values are
removed, by adding the argument na.rm = TRUE within the
mean() function. Does the value make sense given the
histogram you created above?
sw_height_mean <- mean(starwars$height, na.rm = TRUE)
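The same na.rm argument exists for median(), var(), and sd(). Here is a minimal sketch on a small made-up vector, not the starwars data, so the values can be checked by hand.

```r
# Made-up values with one missing entry
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)

median(x)                            # NA: missing values are kept by default
x_median <- median(x, na.rm = TRUE)  # 4.5, the average of the middle values 4 and 5
x_var    <- var(x, na.rm = TRUE)     # 32 / 7, about 4.571 (sample variance, n - 1 denominator)
x_sd     <- sd(x, na.rm = TRUE)      # square root of the variance, about 2.138
```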
Find the median (median()), variance (var()), and standard
deviation (sd()) for Star Wars character heights. Store
them using the same naming system as your
sw_height_mean object above. Check all outputs.
Let’s say you wanted to report the mean and standard deviation of a variable in text (remember: summary statistics hide things! Always consider accompanying summary statistics with visualizations or tables that show more).
Would you want to manually type the value you found from your code into your Quarto document? Why or why not?
We want our in-text outputs to be as reproducible and automatically updated as anything else in our work. That way, if anything changes, we aren’t manually (and treacherously) copying and pasting values and hoping we updated every one correctly.
Reference stored objects in text by adding inline R code. Type an opening
backtick, a lowercase r followed by a space, and the R expression whose value
you want to show up, then close it with a backtick (e.g. `r sw_height_mean`).
Warning: PAY ATTENTION TO SIGNIFICANT FIGURES. Are you presenting your outcomes at a reasonable level of resolution? Do you need to round your output to make it a responsible reflection of the measurements you have?
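Here is a sketch of preparing a value for inline reporting. It uses the six heights shown in the head(starwars) output above, purely as an illustration. round() controls the number of decimal places, and signif() controls the number of significant figures. Because heights are recorded in whole centimeters, reporting the mean to three significant figures is arguably more honest than printing every decimal. The comment at the end shows how the stored, rounded value would then be referenced inline.

```r
# The six heights (cm) printed by head(starwars) above, for illustration only
heights <- c(172, 167, 96, 202, 150, 178)

h_mean     <- mean(heights)       # 965 / 6 = 160.8333...
h_mean_1dp <- round(h_mean, 1)    # 160.8: one decimal place
h_mean_3sf <- signif(h_mean, 3)   # 161: three significant figures

# In the text of the Quarto document (outside any code chunk), write e.g.:
#   The mean height was `r h_mean_3sf` cm.
```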