eds212-comp-4b, with a ReadMe
r-exploring
py-exploring
In your RMarkdown document, attach the following packages in the setup chunk (you’ll need to install the first two):
GGally
skimr
palmerpenguins
Rapid-fire low-level exploration of data:
# Always look at it
# View(penguins)
# Check the column names
names(penguins) # See df.columns in pandas
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "year"
# Check the dimensions
dim(penguins) # See df.shape in pandas
## [1] 344 8
# Get a summary
summary(penguins) # See df.describe() in pandas
##    species         island       bill_length_mm   bill_depth_mm
## Adelie   :152   Biscoe   :168   Min.   :32.10    Min.   :13.10
## Chinstrap: 68   Dream    :124   1st Qu.:39.23    1st Qu.:15.60
## Gentoo   :124   Torgersen: 52   Median :44.45    Median :17.30
##                                 Mean   :43.92    Mean   :17.15
##                                 3rd Qu.:48.50    3rd Qu.:18.70
##                                 Max.   :59.60    Max.   :21.50
##                                 NA's   :2        NA's   :2
## flipper_length_mm   body_mass_g       sex           year
## Min.   :172.0       Min.   :2700   female:165   Min.   :2007
## 1st Qu.:190.0       1st Qu.:3550   male  :168   1st Qu.:2007
## Median :197.0       Median :4050   NA's  : 11   Median :2008
## Mean   :200.9       Mean   :4202                Mean   :2008
## 3rd Qu.:213.0       3rd Qu.:4750                3rd Qu.:2009
## Max.   :231.0       Max.   :6300                Max.   :2009
## NA's   :2           NA's   :2
# Print the first 6 rows
head(penguins) # See df.head() in pandas
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA <NA>
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## 6 Adelie Torge… 39.3 20.6 190 3650 male
## # … with 1 more variable: year <int>
# Print the last 6 rows
tail(penguins) # See df.tail() in pandas
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Chinst… Dream 45.7 17 195 3650 fema…
## 2 Chinst… Dream 55.8 19.8 207 4000 male
## 3 Chinst… Dream 43.5 18.1 202 3400 fema…
## 4 Chinst… Dream 49.6 18.2 193 3775 male
## 5 Chinst… Dream 50.8 19 210 4100 male
## 6 Chinst… Dream 50.2 18.7 198 3775 fema…
## # … with 1 more variable: year <int>
# Make a pairplot
GGally::ggpairs(penguins)
# Make a histogram of penguin flipper lengths
ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram()
# Import Python packages
import seaborn as sns
import pandas as pd
import numpy as np
# Load the penguins dataset with seaborn's load_dataset() (it downloads the data, so you need an internet connection)
penguins = sns.load_dataset('penguins')
penguins.columns # See names(penguins) in R
## Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',
## 'flipper_length_mm', 'body_mass_g', 'sex'],
## dtype='object')
penguins.shape # See dim(penguins) in R
## (344, 7)
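The seaborn copy of the data has one column fewer than the R version: `dim(penguins)` reported 8 columns, while `.shape` reports 7. Here is a quick check with Python sets, using the column names exactly as printed by `names(penguins)` and `penguins.columns`, that shows which column is missing:

```python
# Column names as printed by names(penguins) in R and penguins.columns in pandas
r_names = ["species", "island", "bill_length_mm", "bill_depth_mm",
           "flipper_length_mm", "body_mass_g", "sex", "year"]
py_names = ["species", "island", "bill_length_mm", "bill_depth_mm",
            "flipper_length_mm", "body_mass_g", "sex"]

# Set difference: names in the R version but not in the seaborn version
only_in_r = set(r_names) - set(py_names)
print(only_in_r)                    # {'year'}
print(len(r_names), len(py_names))  # 8 7
```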
penguins.head() # See head(penguins) in R
## species island bill_length_mm ... flipper_length_mm body_mass_g sex
## 0 Adelie Torgersen 39.1 ... 181.0 3750.0 Male
## 1 Adelie Torgersen 39.5 ... 186.0 3800.0 Female
## 2 Adelie Torgersen 40.3 ... 195.0 3250.0 Female
## 3 Adelie Torgersen NaN ... NaN NaN NaN
## 4 Adelie Torgersen 36.7 ... 193.0 3450.0 Female
##
## [5 rows x 7 columns]
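Row 3 above shows how pandas marks missing values: `NaN`. Unlike `summary()` in R, `describe()` does not list missing values directly. Its `count` row gives the number of non-missing values instead, and `isna().sum()` counts the missing ones per column. Below is a minimal sketch on a hand-typed copy of those five rows. It is a tiny stand-in for the full data, and the bill depths are taken from the R `head()` output because the pandas printout hides that column:

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the five rows printed by penguins.head() (not the full dataset).
# np.nan marks a missing number, None a missing label; pandas treats both as missing.
first_rows = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["Male", "Female", "Female", None, "Female"],
})

# isna() flags each missing cell; sum() adds up the flags in every column,
# similar to the NA's rows in R's summary() output
missing_per_column = first_rows.isna().sum()
print(missing_per_column)

# describe() covers numeric columns; its 'count' row is non-missing values only
print(first_rows.describe().loc["count"])
```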
penguins.tail() # See tail(penguins) in R
## species island bill_length_mm ... flipper_length_mm body_mass_g sex
## 339 Gentoo Biscoe NaN ... NaN NaN NaN
## 340 Gentoo Biscoe 46.8 ... 215.0 4850.0 Female
## 341 Gentoo Biscoe 50.4 ... 222.0 5750.0 Male
## 342 Gentoo Biscoe 45.2 ... 212.0 5200.0 Female
## 343 Gentoo Biscoe 49.9 ... 213.0 5400.0 Male
##
## [5 rows x 7 columns]
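`head()` and `tail()` in pandas return 5 rows by default, while R returns 6. You can pass `n` to match R. Below is a minimal sketch on a hypothetical 10-row DataFrame, used only to count rows:

```python
import pandas as pd

# Hypothetical 10-row DataFrame, used only to count the rows returned
df = pd.DataFrame({"row_id": range(10)})

first_default = df.head()  # pandas default: n=5
first_six = df.head(6)     # n=6 matches R's head() default
last_six = df.tail(n=6)    # tail() accepts the same n argument

print(len(first_default), len(first_six), len(last_six))  # 5 6 6
print(last_six["row_id"].tolist())                         # [4, 5, 6, 7, 8, 9]
```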
penguins.describe() # See summary(penguins) in R; by default only numeric columns are summarized
##        bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g
## count      342.000000     342.000000         342.000000   342.000000
## mean        43.921930      17.151170         200.915205  4201.754386
## std          5.459584       1.974793          14.061714   801.954536
## min         32.100000      13.100000         172.000000  2700.000000
## 25%         39.225000      15.600000         190.000000  3550.000000
## 50%         44.450000      17.300000         197.000000  4050.000000
## 75%         48.500000      18.700000         213.000000  4750.000000
## max         59.600000      21.500000         231.000000  6300.000000
# Make a pairs plot with seaborn pairplot
sns.pairplot(penguins) # See GGally::ggpairs() in R
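By default, `GGally::ggpairs()` prints a correlation coefficient for each pair of numeric variables in its upper panels. `sns.pairplot()` draws only the plots. To get those numbers in pandas, `DataFrame.corr()` computes the Pearson correlation matrix. The sketch below uses the five complete rows shown by `head(penguins)` in the R section. With so few rows, the values illustrate the method and say little about penguins:

```python
import pandas as pd

# The five complete rows from head(penguins) in R (the all-NA row is left out)
rows = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 39.3],
    "bill_depth_mm": [18.7, 17.4, 18.0, 19.3, 20.6],
    "flipper_length_mm": [181, 186, 195, 193, 190],
    "body_mass_g": [3750, 3800, 3250, 3450, 3650],
})

# Pearson correlation for every pair of numeric columns (a symmetric 4 x 4 matrix)
corr = rows.corr()
print(corr.round(2))
```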
# Make a histogram of flipper lengths with sns.histplot:
sns.histplot(data=penguins, x="flipper_length_mm") # See geom_histogram() in R
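A histogram splits the range of a variable into equal-width bins and counts the observations in each one. `geom_histogram()` uses 30 bins unless you set `bins` or `binwidth`. `sns.histplot()` chooses the number of bins automatically unless you pass `bins`. Below is a minimal NumPy sketch of the counting step. It uses the flipper lengths from the five complete rows of `head(penguins)` and an arbitrary choice of 4 bins:

```python
import numpy as np

# Flipper lengths (mm) from the five complete rows of head(penguins)
flipper = np.array([181, 186, 195, 193, 190])

# Split [min, max] = [181, 195] into 4 equal-width bins and count values in each;
# the last bin includes its right edge, so 195 is counted
counts, edges = np.histogram(flipper, bins=4)
print(counts)  # [1 1 1 2]
print(edges)   # [181.  184.5 188.  191.5 195. ]
```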
One more thing: vectors in Python, using NumPy arrays
vec_a = np.array([1,2,3])
vec_b = np.array([10,11,12])
vec_a + vec_b
## array([11, 13, 15])
vec_b - vec_a
## array([9, 9, 9])
vec_a * vec_b
## array([10, 22, 36])
6 * vec_a
## array([ 6, 12, 18])
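Arithmetic on NumPy arrays is element-wise, and a single number is applied to every element, much like R vectors. Plain Python lists behave differently: `+` joins two lists, and multiplying by an integer repeats the list. Here is a small comparison using the same values:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([10, 11, 12])

# Plain Python lists: + concatenates, * with an integer repeats the whole list
list_sum = [1, 2, 3] + [10, 11, 12]  # [1, 2, 3, 10, 11, 12]
list_times = 2 * [1, 2, 3]           # [1, 2, 3, 1, 2, 3]

# NumPy arrays: element-wise, like R vectors
elementwise = vec_a * vec_b          # array([10, 22, 36])

# Dot product: sum of the element-wise products, like sum(vec_a * vec_b) in R
dot = vec_a @ vec_b                  # 68
print(list_sum, list_times, elementwise, dot)
```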