For this lab you will create a .zip file called lab10.zip which contains the following:
- lab10.Rmd - An RMarkdown file.
- lab10.html - The results of knitting the RMarkdown file.
- lab10.Rproj - An RStudio project file.

Submit your lab (the .zip file) to the corresponding assignment on Canvas. You have unlimited attempts before the deadline. Your final submission before the deadline will be graded.
Grading of this lab will largely be based on the ability of the grader to access and run your code. That is, the grader should be able to unzip your lab10.zip file, open lab10.Rproj, then finally open and knit lab10.Rmd without any modification or errors. If they are able to do so, and the resulting lab10.html contains the graphics described below, you will receive at least nine of the ten possible points for the lab.
The following video describes how to create all of the files described above. It will also walk through each of the exercises and describe at least one valid solution.
Before creating lab10.Rmd you should first create an RStudio Project named lab10. (The video above will demonstrate this.) This will also create a folder named lab10. Create lab10.Rmd and place it inside this folder.
Add the following code to your .Rmd file which will load the tidyverse. Throughout this lab you may need functions from dplyr and ggplot2.
library(tidyverse)
Additionally, add the following code to your .Rmd file which will load the data needed for this lab:
mlb_pitches_2021 = as_tibble(readRDS(url("https://stat385.org/data/mlb_pitches_2021.rds")))
This data originates from Baseball Savant. In particular, it comes from the Statcast data that MLB collects. Several transformations have been applied to the originally accessed data. Ultimately, this data contains information on the pitch type, velocity, and spin rate of every MLB pitch thrown in 2021.
The following video explains the various “pitch types” used in baseball:
The following table explains the abbreviations used by Statcast:
Pitch Type | Pitch Name |
---|---|
CH | Changeup |
CS | Curveball |
CU | Curveball |
EP | Eephus |
FA | Fastball |
FC | Cutter |
FF | 4-Seam Fastball |
FS | Split-Finger |
KC | Knuckle Curve |
KN | Knuckleball |
SC | Screwball |
SI | Sinker |
SL | Slider |
Create a bar plot that shows the frequency of each pitch type in 2021. Order the bars according to frequency.
mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  ggplot(aes(x = fct_infreq(pitch_type), fill = pitch_type)) +
  geom_bar(show.legend = FALSE) +
  labs(title = "Frequency of MLB Pitch Types",
       subtitle = "2021 Season",
       caption = "Data Source: Baseball Savant") +
  xlab("Pitch Type") +
  ylab("Count") +
  theme_bw()
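To see how the bars end up ordered by frequency, the following minimal sketch applies fct_infreq() to a small made-up vector. The toy_pitches tibble is hypothetical and only stands in for the pitch_type column of the real data.

```r
library(tidyverse)

# Hypothetical toy data standing in for the pitch_type column
toy_pitches <- tibble(pitch_type = c("FF", "SL", "FF", "CH", "FF", "SL"))

# fct_infreq() orders factor levels from most to least frequent,
# which is the order ggplot2 then uses along the x axis
ordered_types <- fct_infreq(toy_pitches$pitch_type)
levels(ordered_types)
# "FF" "SL" "CH"

# count() with sort = TRUE gives the same ordering as a table,
# which can serve as a quick sanity check on the plot
pitch_counts <- toy_pitches %>% count(pitch_type, sort = TRUE)
pitch_counts
```

Wrapping the call in fct_rev() would reverse the order, if bars from least to most frequent are preferred.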
Can you guess the type of pitch just by watching it?
To get a sense of how this is more easily done by looking at velocity and spin rates, create a plot of spin rate versus velocity for Carlos Rodon. Use color and shapes to indicate the pitch types.
mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  filter(name == "Carlos Rodon") %>%
  na.omit() %>%
  ggplot(aes(
    x = release_speed,
    y = release_spin_rate,
    color = pitch_type,
    shape = pitch_type
  )) +
  geom_point() +
  labs(title = "Spin Rate versus Velocity",
       subtitle = "Carlos Rodon, 2021",
       caption = "Data Source: Baseball Savant",
       color = "Pitch Type",
       shape = "Pitch Type") +
  xlab("Velocity") +
  ylab("Spin Rate") +
  scale_color_brewer(palette = "Set1") +
  theme_bw()
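When mapping pitch type to shape, it can help to check how many distinct pitch types a pitcher throws, since ggplot2's default shape palette handles at most six discrete values. The sketch below does this on a hypothetical toy_pitcher tibble whose column names mirror those used above. The values are made up for illustration.

```r
library(tidyverse)

# Hypothetical toy data; values are invented, column names mirror the lab data
toy_pitcher <- tibble(
  pitch_type = c("FF", "FF", "SL", "CH", "CU", "SL"),
  release_speed = c(95.1, 96.3, 86.0, 84.2, 79.5, 85.4),
  release_spin_rate = c(2300, 2350, 2500, 1700, 2600, 2450)
)

# Number of distinct pitch types that would be mapped to shape
n_types <- n_distinct(toy_pitcher$pitch_type)
n_types
# 4

# The default shape palette supports at most 6 discrete values
shapes_ok <- n_types <= 6
shapes_ok
# TRUE
```

If a pitcher had more than six pitch types, one option would be to supply shapes explicitly with scale_shape_manual().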
MLB was in the news this year as a result of banning so-called “sticky stuff” that pitchers were using to get a better grip on the ball, and to increase the spin rate of their pitches.
The following video gives some background:
Create a graphic that illustrates what happened to the spin rates of four-seam fastballs, the most common pitch, which also happens to generally be the pitch most affected by foreign substances.
The relevant metric to use here is spin rate divided by velocity. (This is because, ignoring foreign substances, the ball will spin more at a higher velocity.) Plot the average of this metric for each day of the 2021 season. Add a smoother. Use color and shape to indicate which days were before and after the ban.
mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  filter(pitch_type == "FF") %>%
  filter(game_date < "2021-10-05") %>%
  na.omit() %>%
  group_by(game_date) %>%
  summarise(spin_per_velo = mean(release_spin_rate / release_speed, na.rm = TRUE)) %>%
  mutate(post_ban = game_date >= "2021-06-21") %>%
  ggplot(aes(x = game_date, y = spin_per_velo)) +
  geom_point(aes(color = post_ban, shape = post_ban)) +
  geom_smooth(color = "black") +
  labs(title = "2021 Four-Seam Fastballs",
       subtitle = "Spin per Velocity Through Time",
       caption = "Data Source: Baseball Savant",
       color = "Post Ban?",
       shape = "Post Ban?") +
  xlab("Game Date") +
  ylab("Average Spin Rate Divided By Velocity") +
  scale_color_manual(values = c("dodgerblue", "darkorange")) +
  theme_bw()
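The daily metric is the mean of the per-pitch ratios, computed within each game date, followed by a before-or-after flag. The sketch below works this out on a hypothetical four-row tibble (toy_ff, with invented values) so the arithmetic can be checked by hand. The cutoff date is the one used in the code above.

```r
library(tidyverse)

# Hypothetical toy data: two pitches on each of two days (values invented)
toy_ff <- tibble(
  game_date = as.Date(c("2021-06-20", "2021-06-20", "2021-06-21", "2021-06-21")),
  release_speed = c(100, 90, 100, 80),
  release_spin_rate = c(2500, 2250, 2400, 2000)
)

daily <- toy_ff %>%
  group_by(game_date) %>%
  # per-pitch ratio first, then the mean within each day
  summarise(spin_per_velo = mean(release_spin_rate / release_speed)) %>%
  # flag days on or after the cutoff used in the code above
  mutate(post_ban = game_date >= as.Date("2021-06-21"))

daily
# 2021-06-20: mean(25, 25) = 25,   post_ban FALSE
# 2021-06-21: mean(24, 25) = 24.5, post_ban TRUE
```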
Choose any pitcher you like, and re-create the previous graphic, but for all of their pitches and pitch types. That is, do not summarize the spin rate divided by velocity for each day, but instead plot all the pitches and add a smoother over that. To display each of the pitch types, utilize faceting.
mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  filter(name == "Gerrit Cole") %>%
  filter(game_date < "2021-10-05") %>%
  na.omit() %>%
  mutate(spin_per_velo = release_spin_rate / release_speed) %>%
  mutate(post_ban = game_date >= "2021-06-21") %>%
  ggplot(aes(x = game_date, y = spin_per_velo)) +
  geom_point(aes(color = post_ban, shape = post_ban)) +
  geom_smooth(color = "black") +
  facet_wrap(~pitch_type, scales = "free_y") +
  labs(title = "2021 Gerrit Cole",
       subtitle = "Spin per Velocity Through Time",
       caption = "Data Source: Baseball Savant",
       color = "Post Ban?",
       shape = "Post Ban?") +
  xlab("Game Date") +
  ylab("Spin Rate Divided By Velocity") +
  scale_color_manual(values = c("#132448", "#c4ced3")) +
  theme_bw()
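The key difference from the previous exercise is that mutate() keeps one row per pitch, whereas summarise() collapsed each day to a single row. The sketch below illustrates this on a hypothetical toy_all tibble with invented values, and builds a faceted plot object with one panel per pitch type.

```r
library(tidyverse)

# Hypothetical toy data: a few pitches of two types (values invented)
toy_all <- tibble(
  game_date = as.Date(c("2021-06-20", "2021-06-20", "2021-06-21", "2021-06-21")),
  pitch_type = c("FF", "SL", "FF", "SL"),
  release_speed = c(100, 90, 100, 80),
  release_spin_rate = c(2500, 2700, 2400, 2400)
)

# mutate() adds columns without collapsing rows, so every pitch is kept
per_pitch <- toy_all %>%
  mutate(spin_per_velo = release_spin_rate / release_speed) %>%
  mutate(post_ban = game_date >= as.Date("2021-06-21"))

nrow(per_pitch)
# 4

# One panel per pitch type; scales = "free_y" gives each panel its own y axis
p <- ggplot(per_pitch, aes(x = game_date, y = spin_per_velo)) +
  geom_point(aes(color = post_ban, shape = post_ban)) +
  facet_wrap(~pitch_type, scales = "free_y")
```

Free y scales make the within-panel trend easier to see, at the cost of making panels harder to compare with each other.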