Part 1: Setup

  • Create a repo on GitHub called eds221-day2-comp
  • Clone to make a version-controlled R Project
  • Create a new R Markdown, saved in the root as r-data-types
  • Create a new Jupyter Notebook, saved in the root as py-data-types

Part 2: Making & indexing data in R

Vectors!

Making vectors

A character vector

dogs <- c("teddy", "khora", "waffle", "banjo")

typeof(dogs)
## [1] "character"
class(dogs)
## [1] "character"

A numeric vector

weights <- c(50, 55, 25, 35)

typeof(weights) # Hmmm what is different about this and the line below?
## [1] "double"
class(weights)
## [1] "numeric"

An integer vector

dog_age <- c(5L, 6L, 1L, 7L)

typeof(dog_age)
## [1] "integer"
class(dog_age)
## [1] "integer"
# Check with a logical: 
is.numeric(dog_age)
## [1] TRUE

What if we combine classes?

A vector can only hold one type, so R coerces mixed elements to the most flexible type present (logical < integer < double < character). If any element is a character, the entire vector becomes character.

dog_info <- c("teddy", 50, 5L)
dog_info
## [1] "teddy" "50"    "5"
typeof(dog_info)
## [1] "character"
class(dog_info)
## [1] "character"
is.character(dog_info)
## [1] TRUE
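
For contrast with the coercion shown above, here is a minimal Python sketch. A plain Python list does not coerce its elements to a common type. The names `dog_info_py`, `element_types`, and `as_text` are illustrative and not part of the original lesson.

```python
# A Python list can hold mixed types without coercion.
dog_info_py = ["teddy", 50, 5]

# Each element keeps its own type.
element_types = [type(item).__name__ for item in dog_info_py]
print(element_types)  # ['str', 'int', 'int']

# To mimic R's coercion to character, convert every element explicitly.
as_text = [str(item) for item in dog_info_py]
print(as_text)  # ['teddy', '50', '5']
```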

Named elements

dog_food <- c(teddy = "purina", khora = "alpo", waffle = "fancy feast", banjo = "blue diamond")
dog_food
##          teddy          khora         waffle          banjo 
##       "purina"         "alpo"  "fancy feast" "blue diamond"
class(dog_food)
## [1] "character"
typeof(dog_food)
## [1] "character"

Accessing bits of vectors

Use [] with the position or name to access elements of a vector.

dog_food[2]
##  khora 
## "alpo"
dog_food["khora"]
##  khora 
## "alpo"

We can also select a range of elements using [start:end]. R starts counting at 1, so the first element of a vector is element 1. This is an important distinction: Python is zero-indexed, so its first element is element 0.

# Create a vector of car colors observed
cars <- c("red", "orange", "white", "blue", "green", "silver", "black")

# Access just the 5th element
cars[5]
## [1] "green"
# Access elements 2 through 4
cars[2:4]
## [1] "orange" "white"  "blue"
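
To make the indexing difference concrete, here is a sketch of the same two lookups in Python, assuming the same `cars` values stored in a plain list. The variable names `fifth` and `second_to_fourth` are illustrative.

```python
cars = ["red", "orange", "white", "blue", "green", "silver", "black"]

# Python counts from 0, so the 5th element sits at index 4.
fifth = cars[4]
print(fifth)  # green

# Slices include the start index but exclude the stop index,
# so R's cars[2:4] corresponds to cars[1:4] in Python.
second_to_fourth = cars[1:4]
print(second_to_fourth)  # ['orange', 'white', 'blue']
```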

A warm-up for for loops:

i <- 4
cars[i]
## [1] "blue"
i <- 1:3
cars[i]
## [1] "red"    "orange" "white"

And we can update elements of a vector directly (mutable):

cars[3] <- "BURRITOS!"
cars
## [1] "red"       "orange"    "BURRITOS!" "blue"      "green"     "silver"   
## [7] "black"

Matrices!

Creating matrices

(…we did some of this in EDS 212 too!)

fish_size <- matrix(c(0.8, 1.2, 0.4, 0.9), ncol = 2, nrow = 2, byrow = FALSE)

fish_size
##      [,1] [,2]
## [1,]  0.8  0.4
## [2,]  1.2  0.9
typeof(fish_size) # Returns the type of the values stored in the matrix
## [1] "double"
class(fish_size) # Returns matrix / array
## [1] "matrix" "array"

What happens if we try to combine multiple data types into a matrix?

dog_walk <- matrix(c("teddy", 5, "khora", 10), ncol = 2, nrow = 2, byrow = FALSE)

dog_walk
##      [,1]    [,2]   
## [1,] "teddy" "khora"
## [2,] "5"     "10"
class(dog_walk)
## [1] "matrix" "array"
typeof(dog_walk)
## [1] "character"
# Hmmmmmm once again back to the broadest category of data type in the hierarchy

Accessing pieces of matrices

Index using [row, column].

whale_travel <- matrix(data = c(31.8, 1348, 46.9, 1587), nrow = 2, ncol = 2, byrow = TRUE)

# Take a look
whale_travel
##      [,1] [,2]
## [1,] 31.8 1348
## [2,] 46.9 1587
# Access the value 1348
whale_travel[1,2] # Row 1, column 2
## [1] 1348
# Access the value 46.9
whale_travel[2,1]
## [1] 46.9

If you leave the row or column position blank (keeping the comma), R returns everything along that dimension. For example, to get everything in row 2:

whale_travel[2,]
## [1]   46.9 1587.0

Or, to access everything in column 1:

whale_travel[, 1]
## [1] 31.8 46.9

What happens if we give a matrix only one index? R treats the matrix as a single vector filled column by column and returns the element at that position. For example, position 3 is row 1, column 2:

whale_travel[3]
## [1] 1348
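
The column-by-column counting can be worked out by hand. The Python sketch below stores the same values as a list of rows and converts a 1-based position into a row and column. The helper `column_major_lookup` is hypothetical and written only for this illustration; it is not an R or Python built-in.

```python
# whale_travel as a list of rows (same values as the R matrix).
whale_travel = [[31.8, 1348], [46.9, 1587]]

def column_major_lookup(matrix, position):
    """Return the element at a 1-based position, counting down columns first."""
    n_rows = len(matrix)
    # Integer division gives the column, the remainder gives the row.
    col, row = divmod(position - 1, n_rows)
    return matrix[row][col]

print(column_major_lookup(whale_travel, 3))  # 1348
```

Positions 1 and 2 walk down the first column (31.8, then 46.9), so position 3 lands at the top of the second column.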

Lists

urchins <- list("blue", c(1, 2, 3), c("a cat", "a dog"), 5L)

urchins
## [[1]]
## [1] "blue"
## 
## [[2]]
## [1] 1 2 3
## 
## [[3]]
## [1] "a cat" "a dog"
## 
## [[4]]
## [1] 5

Accessing pieces of a list

Important: a single [] returns a list. [[]] returns the item STORED in the list.

urchins[[2]]
## [1] 1 2 3
# Compare that to: 
urchins[2]
## [[1]]
## [1] 1 2 3
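
A rough Python analogue of the [] versus [[]] distinction, as a sketch: indexing a list returns the stored item, while slicing returns a list that wraps it. The names `item` and `sub_list` are illustrative.

```python
urchins = ["blue", [1, 2, 3], ["a cat", "a dog"], 5]

# Indexing with a single position returns the stored item itself,
# similar to R's urchins[[2]] (Python index 1 is the second item).
item = urchins[1]
print(item)  # [1, 2, 3]

# Slicing returns a new list that wraps the item, similar to R's urchins[2].
sub_list = urchins[1:2]
print(sub_list)  # [[1, 2, 3]]
```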

Naming list items? Sure thing!

tacos <- list(topping = c("onion", "cilantro", "guacamole"), filling = c("beans", "meat", "veggie"), price = c(6.75, 8.25, 9.50))

# The whole thing
tacos
## $topping
## [1] "onion"     "cilantro"  "guacamole"
## 
## $filling
## [1] "beans"  "meat"   "veggie"
## 
## $price
## [1] 6.75 8.25 9.50
# Just get one piece of it: 
tacos[[2]]
## [1] "beans"  "meat"   "veggie"
#...or, the same thing:
tacos$filling
## [1] "beans"  "meat"   "veggie"

Data frames

A data frame is a list containing vectors of the same length, where each column is a variable stored in a vector. Let’s make one:

fruit <- data.frame(type = c("apple", "banana", "peach"), 
                    mass = c(130, 195, 150))

# Look at it
fruit
##     type mass
## 1  apple  130
## 2 banana  195
## 3  peach  150
# Check the class
class(fruit)
## [1] "data.frame"

Access elements from a data frame

Use [row, column], or name the column and then give the element number (for example, fruit$mass[1]).

fruit[1,2]
## [1] 130
fruit[3,1]
## [1] "peach"
fruit[2,1] <- "pineapple"
fruit
##        type mass
## 1     apple  130
## 2 pineapple  195
## 3     peach  150

Part 3: Making & indexing data in Python

import numpy as np
import pandas as pd

Vectors and matrices in Python

teddy = [1,2,8]
teddy_vec = np.array(teddy)

teddy_vec
## array([1, 2, 8])
type(teddy_vec)
## <class 'numpy.ndarray'>

A list is mutable - you can change it directly!

teddy[1] = 1000

# See that the element at index 1 (the second element) is updated directly!
teddy
## [1, 1000, 8]

A tuple is immutable - you’ll get yelled at if you try to change it!

khora = (1, 5, 12)
type(khora)
## <class 'tuple'>

# khora[1] = 16 # Nope: this raises a TypeError.
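
To see what "getting yelled at" looks like without stopping the notebook, here is a small sketch that catches the error. The names `message` and `khora_updated` are illustrative, and building a new tuple is just one possible workaround.

```python
khora = (1, 5, 12)

try:
    khora[1] = 16  # Tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# To "change" a tuple, build a new one instead of editing in place.
khora_updated = khora[:1] + (16,) + khora[2:]
print(khora_updated)  # (1, 16, 12)
```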

A more involved list (note: you can also use list() to create lists in Python).

waffle = [["cat", "dog", "penguin"], 2, "a burrito", [1,2,5]]

waffle
## [['cat', 'dog', 'penguin'], 2, 'a burrito', [1, 2, 5]]

# Access an element from the list waffle:
waffle[0] # Returns the stored item itself (here, the inner list)
## ['cat', 'dog', 'penguin']
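
Because `waffle[0]` is itself a list, indexes can be chained to reach inside it. A minimal sketch, with `second_animal` and `last_number` as illustrative names:

```python
waffle = [["cat", "dog", "penguin"], 2, "a burrito", [1, 2, 5]]

# Chain indexes to reach inside a nested list.
second_animal = waffle[0][1]
print(second_animal)  # dog

# Negative indexes count from the end of a list.
last_number = waffle[-1][-1]
print(last_number)  # 5
```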

We can reassign pieces of a list:

waffle[1] = "AN EXTRAVAGANZA"

waffle
## [['cat', 'dog', 'penguin'], 'AN EXTRAVAGANZA', 'a burrito', [1, 2, 5]]

Make a pandas DataFrame in Python

First, a dictionary example:

fox = {'sound': ["screech", "squeal", "bark"], 'age': [2, 6, 10]}

fox['sound']
## ['screech', 'squeal', 'bark']
fox['age']
## [2, 6, 10]
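
A few more dictionary lookups, as a sketch using the same `fox` values. The names `column_names`, `second_sound`, and `missing` are illustrative, and the key "weight" is deliberately one that does not exist.

```python
fox = {"sound": ["screech", "squeal", "bark"], "age": [2, 6, 10]}

# List the keys, in the order they were defined.
column_names = list(fox.keys())
print(column_names)  # ['sound', 'age']

# Look up a key first, then a position within its list.
second_sound = fox["sound"][1]
print(second_sound)  # squeal

# .get() returns None for a missing key instead of raising a KeyError.
missing = fox.get("weight")
print(missing)  # None
```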

Then pass a dictionary of equal-length lists to pd.DataFrame():

cows = {'name': ["moo", "spots", "happy"], 'location': ["pasture", "prairie", "barn"], 'height': [5.7, 5.4, 4.9]}

cows_df = pd.DataFrame(cows)

# Take a look
cows_df
##     name location  height
## 0    moo  pasture     5.7
## 1  spots  prairie     5.4
## 2  happy     barn     4.9

# Get a column
cows_df['name']
## 0      moo
## 1    spots
## 2    happy
## Name: name, dtype: object

# Get an element using df.at[]
cows_df.at[1, 'name']
## 'spots'
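
For intuition about what the row and column lookup is doing, the same values can be pulled from the plain `cows` dictionary without pandas. This is only a conceptual sketch, not how pandas works internally; `name_at_row_1` and `row_1` are illustrative names.

```python
cows = {
    "name": ["moo", "spots", "happy"],
    "location": ["pasture", "prairie", "barn"],
    "height": [5.7, 5.4, 4.9],
}

# Column-first lookup: pick the 'name' column, then the value at position 1.
name_at_row_1 = cows["name"][1]
print(name_at_row_1)  # spots

# Rebuild row 1 as a record by pairing each column name with its value.
row_1 = {column: values[1] for column, values in cows.items()}
print(row_1)  # {'name': 'spots', 'location': 'prairie', 'height': 5.4}
```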

End (for now…)