eds212-day5-comp
(with a ReadMe)
tidyverse package
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
What does the output look like? A mess. We might not want everything that R is reporting to show up in our rendered document. Not a problem - we get to define exactly what shows up in our knitted html, either across the entire document or for each specific code chunk.
We can update Quarto execution options to control what shows up (and doesn’t) in our rendered document.
For example:
echo: FALSE will hide source code in the rendered doc
warning: FALSE will hide warnings / messages
Setting these execute options in the YAML will make them the default for all code chunks. You can also override the default using the hashpipe #| within a code chunk, followed by the execution option (e.g. #| echo: false).
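As a rough sketch of the hashpipe override described above, the body of a single code chunk might look like the following. The vector x and its values are made up for illustration; the #| lines are ordinary R comments when the code is run outside Quarto.

```r
#| echo: false
#| warning: false

# Coercing "three" to a number yields NA and raises a warning.
# With the options above, the rendered document should show only
# the printed value of x: no source code and no warning text.
x <- as.numeric(c("1", "2", "three"))
x
```

Options set this way apply only to the chunk they appear in; other chunks keep the defaults from the YAML.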
starwars dataset (in dplyr)
Use several of the tools we learned yesterday (e.g. head(), tail(), summary(), dim(), etc.). Consider this “exploratory” work that you would not want to show up at all in a final knitted document. Update the code chunk header with an option to not include anything from this code chunk (include = FALSE).
# Return the first 6 lines of `starwars`
head(starwars)
## # A tibble: 6 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sky… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 4 Darth Va… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Org… 150 49 brown light brown 19 fema… femin…
## 6 Owen Lars 178 120 brown, gr… light blue 52 male mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# Check the dimensions
dim(starwars)
## [1] 87 14
Update the setup code chunk options so the warnings and messages are hidden, but the library(tidyverse) code does still show up in your knitted document. Knit to check.
In a new code chunk, create a ggplot2 graph of character mass (y-axis) versus height (x-axis). Update so that the color of the points changes based on the value of mass (this is unnecessary, but just for customization practice). Update axis labels (with units). Remember, use ?starwars for more information. Check the warnings that show up when you run your code. What are they telling you? Then, update the code chunk option so that only the graph appears in your knitted document (no code or warnings / messages). Knit to check.
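One possible version of the mass versus height graph is sketched below. The object name sw_scatter and the legend title are illustrative choices, not part of the original activity; the units (cm and kg) come from ?starwars. The hashpipe options at the top are one way to leave only the graph in the knitted document.

```r
#| echo: false
#| warning: false
#| message: false

library(ggplot2)

# `starwars` ships with dplyr; height is in cm and mass is in kg (see ?starwars)
sw_scatter <- ggplot(data = dplyr::starwars, aes(x = height, y = mass)) +
  geom_point(aes(color = mass)) +  # point color mapped to mass
  labs(x = "Height (cm)", y = "Mass (kg)", color = "Mass (kg)")

# Printing the object draws the graph
sw_scatter
```

Run interactively, this code reports a warning about removed rows, because some characters have missing height or mass values.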
In a new code chunk, create a histogram of character heights. Update the fill color to purple, and the line color to red (this will look awful - do it anyway for practice). Update the x- and y-axis labels. Update your code chunk options so that only your code and the output graph show up in the knitted document (no messages or warnings). Knit to check.
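# Histogram of character heights (cm), with axis labels
ggplot(data = starwars, aes(x = height)) +
  geom_histogram(color = "red", fill = "purple") +
  labs(x = "Height (cm)", y = "Number of characters")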
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 6 rows containing non-finite values (`stat_bin()`).
Here, we’ll learn how to find some summary statistics (mean, standard deviation, variance). You can refer to a single column in a data frame using df$colname. For example, if we have a data frame cats with a column mass, then I can refer to the mass column in cats using cats$mass.
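To make the df$colname pattern concrete, here is a minimal sketch using a hypothetical cats data frame. The names and masses are invented for illustration.

```r
# Hypothetical data frame (values invented for illustration)
cats <- data.frame(
  name = c("Teddy", "Khora", "Waffle"),
  mass = c(4.2, 5.1, 3.8)  # kg
)

# `$` pulls out a single column as a vector
cats$mass
```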
Let’s take a look. In the Console, try calling a couple columns individually from starwars (you don’t need to store these). E.g. starwars$name, starwars$birth_year, etc.
We’ll learn a bunch of tools that help to automate finding summary statistics across multiple columns (e.g. dplyr::across()) or groups within the same column (e.g. dplyr::group_by() %>% summarize()), but for now let’s say we just want a single mean from a single column.
Use the mean() function applied to the column to return the value. Store it as sw_height_mean.
sw_height_mean <- mean(starwars$height)
Call the value back to yourself (in the Console). What does it tell you the mean height is? Uh oh…
Check out the documentation for mean(). What is the behavior (default) for dealing with NA (missing) values?
Update your code so that NA values are removed, by adding the argument na.rm = TRUE within the mean() function. Does the value make sense given the histogram you created above?
sw_height_mean <- mean(starwars$height, na.rm = TRUE)
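The effect of na.rm is easy to see on a small vector. The sketch below uses a made-up heights vector (not the starwars data) with one missing value.

```r
# Hypothetical heights (cm) with one missing value
heights <- c(150, 170, NA, 190)

# Default behavior: any NA in the input makes the result NA
mean_default <- mean(heights)
mean_default  # NA

# With na.rm = TRUE the NA is dropped before averaging
mean_no_na <- mean(heights, na.rm = TRUE)
mean_no_na  # 170
```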
Find the median (median()), variance (var()), and standard deviation (sd()) for Star Wars character heights. Store them using a naming system consistent with your sw_height_mean object above. Check all outputs.
Let’s say you wanted to report the mean and standard deviation of a variable in text (remember - summary statistics hide things! Always consider accompanying summary statistics with visualizations or tables that show more).
Would you want to manually type the value you found from your code into your Quarto document? Why or why not?
We want our outputs in text to be as reproducible and automatically updated as anything else in our work, so that if anything changes, we aren’t going to be manually (and treacherously) copying and pasting hoping we are updating everything correctly.
Reference stored objects in text by adding inline R code: a pair of single backticks containing a lowercase r, followed by whatever you want to have show up.
Warning: PAY ATTENTION TO SIGNIFICANT FIGURES. Are you presenting your outcomes at a reasonable level of resolution? Do you need to round your output to make it a responsible reflection of the measurements you have?
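A minimal sketch of rounding stored values before referencing them inline is shown below. The object names ending in _rounded and the choice of one decimal place are assumptions for illustration; pick a resolution that suits your own data. The commented line shows what the inline reference might look like in the body text of the document.

```r
# Summary statistics for character heights (cm), ignoring missing values
sw_height_mean <- mean(dplyr::starwars$height, na.rm = TRUE)
sw_height_sd <- sd(dplyr::starwars$height, na.rm = TRUE)

# Round before reporting; one decimal place is an assumed choice here
sw_height_mean_rounded <- round(sw_height_mean, 1)
sw_height_sd_rounded <- round(sw_height_sd, 1)

# In the body text of the .qmd (outside any code chunk) you could then write:
#   Mean height is `r sw_height_mean_rounded` cm (SD `r sw_height_sd_rounded` cm).
```

Because the text pulls its values from the stored objects, the reported numbers update automatically whenever the data or the code changes.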