Part 1: Setup

Create a repo on GitHub called eds221-day2-comp
Clone the repo to create a version-controlled R Project
Create a new R Markdown, saved in the root as r-data-types
Create a new Jupyter Notebook, saved in the root as py-data-types

Part 2: Making & indexing data in R

Vectors!

Making vectors

A character vector

dogs <- c("teddy", "khora", "waffle", "banjo")

typeof(dogs)

## [1] "character"

class(dogs)

## [1] "character"

A numeric vector

weights <- c(50, 55, 25, 35)

typeof(weights) # Hmmm, what is different about this and the line below?

## [1] "double"

class(weights)

## [1] "numeric"

An integer vector

dog_age <- c(5L, 6L, 1L, 7L)

typeof(dog_age)

## [1] "integer"

class(dog_age)

## [1] "integer"

# Check with a logical: 
is.numeric(dog_age)

## [1] TRUE

What if we combine classes?

A vector holds a single data type. When you combine classes, R coerces every element to the most general type present, following the hierarchy logical < integer < double < character. If any element is a character, the entire vector becomes character.

dog_info <- c("teddy", 50, 5L)
dog_info

## [1] "teddy" "50"    "5"

typeof(dog_info)

## [1] "character"

class(dog_info)

## [1] "character"

is.character(dog_info)

## [1] TRUE
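For comparison, NumPy arrays in Python (imported in Part 3) follow a similar single-type rule: mixed inputs are converted to one common type. This is a minimal sketch, assuming NumPy is installed. NumPy's promotion rules are its own and do not match R's hierarchy in every case.

```python
import numpy as np

# Mixing a string with numbers: NumPy converts every element to a string
dog_info = np.array(["teddy", 50, 5])
as_list = dog_info.tolist()             # ['teddy', '50', '5']
is_string = dog_info.dtype.kind == "U"  # True: 'U' means Unicode string

# Numbers only: the integer is promoted to floating point
mixed_numbers = np.array([5, 50.0])
mixed_dtype = mixed_numbers.dtype       # float64
```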

Named elements

dog_food <- c(teddy = "purina", khora = "alpo", waffle = "fancy feast", banjo = "blue diamond")
dog_food

##          teddy          khora         waffle          banjo 
##       "purina"         "alpo"  "fancy feast" "blue diamond"

class(dog_food)

## [1] "character"

typeof(dog_food)

## [1] "character"

Accessing bits of vectors

Use [] with the position or name to access elements of a vector.

dog_food[2]

##  khora 
## "alpo"

dog_food["khora"]

##  khora 
## "alpo"

We can also select a range of elements by putting start:end inside the brackets. Note that R counts from 1: the first element of a vector has index 1. This is an important distinction from Python, where the first element has index 0 (zero-based indexing).

# Create a vector of car colors observed
cars <- c("red", "orange", "white", "blue", "green", "silver", "black")

# Access just the 5th element
cars[5]

## [1] "green"

# Access elements 2 through 4
cars[2:4]

## [1] "orange" "white"  "blue"
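The same lookups in Python need shifted numbers: indexing starts at 0, and a slice stops just before its end index, whereas R's 2:4 includes 4. A minimal sketch using a plain Python list:

```python
cars = ["red", "orange", "white", "blue", "green", "silver", "black"]

# Python counts from 0, so the 5th element sits at index 4
fifth = cars[4]            # 'green'

# Slices use start:stop, and the stop index is NOT included,
# so 1:4 gives the 2nd, 3rd, and 4th elements (R's cars[2:4])
two_to_four = cars[1:4]    # ['orange', 'white', 'blue']
```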

A warm-up for for loops:

i <- 4
cars[i]

## [1] "blue"

i <- seq(1, 3) # same as 1:3
cars[i]

## [1] "red"    "orange" "white"
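Since this is a warm-up for loops, here is a minimal sketch of the same idea written as an explicit loop, in Python for comparison. range(3) yields 0, 1, 2, which matches R's 1:3 once you shift to zero-based indexing.

```python
cars = ["red", "orange", "white", "blue", "green", "silver", "black"]

first_three = []
for i in range(3):              # i takes the values 0, 1, 2
    first_three.append(cars[i]) # use the loop variable as the index
```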

And we can update elements of a vector directly (mutable):

cars[3] <- "BURRITOS!"
cars

## [1] "red"       "orange"    "BURRITOS!" "blue"      "green"     "silver"   
## [7] "black"

Matrices!

Creating matrices

(…we did some of this in EDS 212 too!)

fish_size <- matrix(c(0.8, 1.2, 0.4, 0.9), ncol = 2, nrow = 2, byrow = FALSE)

fish_size

##      [,1] [,2]
## [1,]  0.8  0.4
## [2,]  1.2  0.9

typeof(fish_size) # Returns the storage type of the values

## [1] "double"

class(fish_size) # Returns matrix / array

## [1] "matrix" "array"
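NumPy can build the same 2 x 2 layout. This is a minimal sketch, assuming NumPy: reshape fills row by row by default, so order="F" (column-major order) is needed to mimic R's byrow = FALSE.

```python
import numpy as np

values = np.array([0.8, 1.2, 0.4, 0.9])

# order="F" fills down each column first, like R's byrow = FALSE
fish_size = values.reshape((2, 2), order="F")

# The default order="C" fills across each row, like R's byrow = TRUE
by_row = values.reshape((2, 2))

# Compare with typeof(fish_size) returning "double" in R
dtype_name = fish_size.dtype.name   # 'float64'
```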

What happens if we try to combine multiple data types into a matrix?

dog_walk <- matrix(c("teddy", 5, "khora", 10), ncol = 2, nrow = 2, byrow = FALSE)

dog_walk

##      [,1]    [,2]   
## [1,] "teddy" "khora"
## [2,] "5"     "10"

class(dog_walk)

## [1] "matrix" "array"

typeof(dog_walk)

## [1] "character"

# Hmmmmmm once again back to the broadest category of data type in the hierarchy

Accessing pieces of matrices

Index using [row, column].

whale_travel <- matrix(data = c(31.8, 1348, 46.9, 1587), nrow = 2, ncol = 2, byrow = TRUE)

# Take a look
whale_travel

##      [,1] [,2]
## [1,] 31.8 1348
## [2,] 46.9 1587

# Access the value 1348
whale_travel[1,2] # Row 1, column 2

## [1] 1348

# Access the value 46.9
whale_travel[2,1]

## [1] 46.9

If you leave either index blank (keeping the comma), R returns everything along that dimension. For example, to get everything in row 2:

whale_travel[2,]

## [1]   46.9 1587.0

Or, to access everything in column 1:

whale_travel[, 1]

## [1] 31.8 46.9
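NumPy uses the same [row, column] pattern, with zero-based positions and ":" standing in for R's blank index. A minimal sketch, assuming NumPy, with the whale_travel values typed in row by row:

```python
import numpy as np

whale_travel = np.array([[31.8, 1348], [46.9, 1587]])

value_1348 = whale_travel[0, 1]   # row 1, column 2 in R terms
value_46_9 = whale_travel[1, 0]   # row 2, column 1 in R terms
row_2 = whale_travel[1, :]        # like whale_travel[2,] in R
col_1 = whale_travel[:, 0]        # like whale_travel[, 1] in R
```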

What happens if you give a matrix only one index? R treats the matrix as one vector filled column by column (column-major order) and returns the element at that position. For example, position 3 is the first value in column 2:

whale_travel[3]

## [1] 1348
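To make the column-major rule concrete, this sketch (assuming NumPy, in Python for comparison) unrolls the same matrix column by column and picks the third value. A single index into a 2-D NumPy array selects a whole row instead, so the explicit ravel(order="F") step is needed.

```python
import numpy as np

whale_travel = np.array([[31.8, 1348], [46.9, 1587]])

# Unroll column by column (column-major), as R does for a single index
by_column = whale_travel.ravel(order="F")

# R's whale_travel[3] is the 3rd element, i.e. index 2 in Python
third = by_column[2]
```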

Lists

urchins <- list("blue", c(1, 2, 3), c("a cat", "a dog"), 5L)

urchins

## [[1]]
## [1] "blue"
## 
## [[2]]
## [1] 1 2 3
## 
## [[3]]
## [1] "a cat" "a dog"
## 
## [[4]]
## [1] 5

Accessing pieces of a list

Important: a single [] returns a list. [[]] returns the item STORED in the list.

urchins[[2]]

## [1] 1 2 3

# Compare that to: 
urchins[2]

## [[1]]
## [1] 1 2 3

Naming list items? Sure thing!

tacos <- list(topping = c("onion", "cilantro", "guacamole"), filling = c("beans", "meat", "veggie"), price = c(6.75, 8.25, 9.50))

# The whole thing
tacos

## $topping
## [1] "onion"     "cilantro"  "guacamole"
## 
## $filling
## [1] "beans"  "meat"   "veggie"
## 
## $price
## [1] 6.75 8.25 9.50

# Just get one piece of it: 
tacos[[2]]

## [1] "beans"  "meat"   "veggie"

#...or, the same thing:
tacos$filling

## [1] "beans"  "meat"   "veggie"
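The closest Python analog of a named list is a dictionary (used in Part 3), where each name becomes a key. A minimal sketch with the tacos values: lookup is by key, and a positional lookup like tacos[[2]] needs an extra step, relying on dictionaries keeping insertion order (Python 3.7+).

```python
tacos = {
    "topping": ["onion", "cilantro", "guacamole"],
    "filling": ["beans", "meat", "veggie"],
    "price": [6.75, 8.25, 9.50],
}

filling = tacos["filling"]          # like tacos$filling in R
second = list(tacos.values())[1]    # positional, like tacos[[2]] in R
```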

Data frames

A data frame is a list containing vectors of the same length, where each column is a variable stored in a vector. Let’s make one:

fruit <- data.frame(type = c("apple", "banana", "peach"), 
                    mass = c(130, 195, 150))

# Look at it
fruit

##     type mass
## 1  apple  130
## 2 banana  195
## 3  peach  150

# Check the class
class(fruit)

## [1] "data.frame"
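Because every column must have the same length, pandas (used in Part 3) refuses to build a DataFrame from columns of different lengths. A minimal sketch, assuming pandas is installed; the shortened type column is a deliberate, hypothetical mistake.

```python
import pandas as pd

fruit = pd.DataFrame({"type": ["apple", "banana", "peach"],
                      "mass": [130, 195, 150]})
n_rows, n_cols = fruit.shape      # 3 rows, 2 columns

error_message = None
try:
    # Hypothetical mistake: 2 types but 3 masses
    pd.DataFrame({"type": ["apple", "banana"], "mass": [130, 195, 150]})
except ValueError as err:
    error_message = str(err)      # pandas reports the length mismatch
```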

Access elements from a data frame

Use [row, column] positions, or name the column (e.g., fruit$mass) and then give the element number (e.g., fruit$mass[1]).

fruit[1,2]

## [1] 130

fruit[3,1]

## [1] "peach"

fruit[2,1] <- "pineapple"
fruit

##        type mass
## 1     apple  130
## 2 pineapple  195
## 3     peach  150
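The same lookups in pandas use .iloc with zero-based positions. A minimal sketch, assuming pandas; .iloc also accepts assignment, mirroring fruit[2,1] <- "pineapple".

```python
import pandas as pd

fruit = pd.DataFrame({"type": ["apple", "banana", "peach"],
                      "mass": [130, 195, 150]})

mass_first = fruit.iloc[0, 1]          # 130, like fruit[1,2] in R
type_third = fruit.iloc[2, 0]          # 'peach', like fruit[3,1] in R
fruit.iloc[1, 0] = "pineapple"         # like fruit[2,1] <- "pineapple"
mass_by_name = fruit["mass"].iloc[0]   # name the column, then the position
```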

Part 3: Making & indexing data in Python

import numpy as np
import pandas as pd

Vectors and matrices in Python

teddy = [1,2,8]
teddy_vec = np.array(teddy)

teddy_vec

## array([1, 2, 8])

type(teddy_vec)

## <class 'numpy.ndarray'>

A list is mutable - you can change it directly!

teddy[1] = 1000

# The element at index 1 (the second element, since Python counts from 0) is updated directly!
teddy

## [1, 1000, 8]
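One subtlety: teddy_vec was created from teddy before the change, and np.array copies the values by default, so the array keeps the old value. NumPy arrays are mutable too. A minimal sketch:

```python
import numpy as np

teddy = [1, 2, 8]
teddy_vec = np.array(teddy)    # copies the list's values into a new array

teddy[1] = 1000                # changes the list only
vec_before = teddy_vec.tolist()  # still [1, 2, 8]

teddy_vec[1] = 1000            # arrays can be updated in place as well
```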

A tuple is immutable - you’ll get yelled at if you try to change it!

khora = (1, 5, 12)
type(khora)

## <class 'tuple'>

# khora[1] = 16 # Nope.
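Here is what "getting yelled at" looks like: Python raises a TypeError. A minimal sketch that catches the error and then builds a new tuple instead of changing the old one:

```python
khora = (1, 5, 12)

try:
    khora[1] = 16
except TypeError as err:
    message = str(err)   # "'tuple' object does not support item assignment"

# To "change" a tuple, build a new one from slices of the old one
khora_new = khora[:1] + (16,) + khora[2:]
```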

A more involved list (note: you can also use list() to create lists in Python).

waffle = [["cat", "dog", "penguin"], 2, "a burrito", [1,2,5]]

waffle

## [['cat', 'dog', 'penguin'], 2, 'a burrito', [1, 2, 5]]

# Access an element from the list waffle:
waffle[0] # Returns the stored element itself (here, the inner list), unlike R's single []

## ['cat', 'dog', 'penguin']

We can reassign pieces of a list:

waffle[1] = "AN EXTRAVAGANZA"

waffle

## [['cat', 'dog', 'penguin'], 'AN EXTRAVAGANZA', 'a burrito', [1, 2, 5]]
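As noted above, list() also creates lists; it is handy for converting another sequence, such as a tuple, into a mutable list. Nested items are reached by chaining indices. A minimal sketch using the khora and waffle values:

```python
khora = (1, 5, 12)
khora_list = list(khora)     # [1, 5, 12], now a mutable list
khora_list[1] = 16           # allowed on a list

waffle = [["cat", "dog", "penguin"], "AN EXTRAVAGANZA", "a burrito", [1, 2, 5]]
animal = waffle[0][1]        # first item, then its second element
```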

Make a pandas DataFrame in Python

First, a dictionary example:

fox = {'sound': ["screech", "squeal", "bark"], 'age': [2, 6, 10]}

fox['sound']

## ['screech', 'squeal', 'bark']

fox['age']

## [2, 6, 10]
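Dictionary values can themselves be indexed, and the in operator checks the keys. A minimal sketch with the fox dictionary:

```python
fox = {'sound': ["screech", "squeal", "bark"], 'age': [2, 6, 10]}

keys = list(fox.keys())          # ['sound', 'age']
second_sound = fox['sound'][1]   # look up by key, then by position
has_age = 'age' in fox           # `in` checks the keys
```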

cows = {'name': ["moo", "spots", "happy"], 'location': ["pasture", "prairie", "barn"], 'height': [5.7, 5.4, 4.9]}

cows_df = pd.DataFrame(cows)

# Take a look
cows_df

##     name location  height
## 0    moo  pasture     5.7
## 1  spots  prairie     5.4
## 2  happy     barn     4.9

# Get a column
cows_df['name']

## 0      moo
## 1    spots
## 2    happy
## Name: name, dtype: object

# Get an element using df.at[]
cows_df.at[1, 'name']

## 'spots'
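df.at is one of several pandas accessors. A minimal sketch of two others, assuming pandas: .loc looks up by row label and column name (like .at, but it can also return several values), while .iloc uses zero-based positions for both. The cows data are reused from above.

```python
import pandas as pd

cows = {'name': ["moo", "spots", "happy"],
        'location': ["pasture", "prairie", "barn"],
        'height': [5.7, 5.4, 4.9]}
cows_df = pd.DataFrame(cows)

by_at = cows_df.at[1, 'name']      # single value, by label
by_loc = cows_df.loc[1, 'name']    # by label
by_iloc = cows_df.iloc[1, 0]       # by position
# Label slices with .loc include the end label, so 0:1 gives two rows
row_subset = cows_df.loc[0:1, ['name', 'height']]
```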

Day 2 Interactive Session Materials

Data types & indexing