Directions

For this lab you will create a .zip file called lab10.zip which contains the following:

Submit your lab (the .zip file) to the corresponding assignment on Canvas. You have unlimited attempts before the deadline. Your final submission before the deadline will be graded.


Grading

Grading of this lab will largely be based on the ability of the grader to access and run your code. That is, the grader should be able to unzip your lab10.zip file, open lab10.Rproj, then finally open and knit lab10.Rmd without any modification or errors. If they are able to do so, and the resulting lab10.html contains the graphics described below, you will receive at least nine of the ten possible points for the lab.


Walk-Through

The following video describes how to create all of the files described above. It will also walk through each of the exercises and describe at least one valid solution.


Exercise 1 (Setup)

Before creating lab10.Rmd you should first create an RStudio Project named lab10. (The video above will demonstrate this.) This will also create a folder named lab10. Create lab10.Rmd and place it inside this folder.

Add the following code to your .Rmd file which will load the tidyverse. Throughout this lab you may need functions from dplyr and ggplot2.

library(tidyverse)

Additionally, add the following code to your .Rmd file which will load the data needed for this lab:

mlb_pitches_2021 = as_tibble(readRDS(url("https://stat385.org/data/mlb_pitches_2021.rds")))

This data originates from Baseball Savant. In particular, it comes from the Statcast data that MLB collects. Several transformations have been applied to the originally accessed data. Ultimately, the data contains information on the pitch type, velocity, and spin rate of every MLB pitch thrown in 2021.


Exercise 2 (Pitch Type Frequency)

The following video explains the various “pitch types” used in baseball:

The following table explains the abbreviations used by Statcast:

Pitch Type   Pitch Name
CH           Changeup
CS           Slow Curve
CU           Curveball
EP           Eephus
FA           Fastball
FC           Cutter
FF           4-Seam Fastball
FS           Split-Finger
KC           Knuckle Curve
KN           Knuckleball
SC           Screwball
SI           Sinker
SL           Slider

Create a bar plot that shows the frequency of each pitch type in 2021. Order the bars according to frequency.

Solution

mlb_pitches_2021 %>% 
  filter(pitch_type != "") %>% 
  ggplot(aes(x = fct_infreq(pitch_type), fill = pitch_type)) + 
  geom_bar(show.legend = FALSE) +
  labs(title = "Frequency of MLB Pitch Types",
       subtitle = "2021 Season",
       caption = "Data Source: Baseball Savant") +
  xlab("Pitch Type") +
  ylab("Count") +
  theme_bw()
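
To see why fct_infreq() yields bars ordered by frequency, the following minimal sketch applies it to a small made-up vector of pitch types. The toy values and the names toy_pitches and ordered_pitches are assumptions for illustration only and are not taken from the lab data.

```r
library(forcats)

# Made-up pitch types for illustration only (not from mlb_pitches_2021)
toy_pitches <- factor(c("FF", "SL", "FF", "CH", "FF", "SL"))

# fct_infreq() reorders the factor levels from most to least frequent;
# ggplot2 places bars on a discrete x axis in level order
ordered_pitches <- fct_infreq(toy_pitches)
levels(ordered_pitches)

# fct_rev() flips the level order if ascending bars are preferred
levels(fct_rev(ordered_pitches))
```

Because only the level order changes, the counts drawn by geom_bar() are the same either way.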


Exercise 3 (Pitch Type Velocity and Spin)

Can you guess the type of pitch just by watching it?

To get a sense of how this is more easily done by looking at velocity and spin rates, create a plot of spin rate versus velocity for Carlos Rodon. Use color and shapes to indicate the pitch types.

Solution

mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  filter(name == "Carlos Rodon") %>%
  na.omit() %>% 
  ggplot(aes(
    x = release_speed,
    y = release_spin_rate,
    color = pitch_type,
    shape = pitch_type
  )) +
  geom_point() +
  labs(title = "Spin Rate versus Velocity",
       subtitle = "Carlos Rodon, 2021",
       caption = "Data Source: Baseball Savant", 
       color = "Pitch Type", 
       shape = "Pitch Type") +
  xlab("Velocity") +
  ylab("Spin Rate") +
  scale_color_brewer(palette = "Set1") + 
  theme_bw()
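
One detail of the solution worth knowing: na.omit() drops a row when any column is missing, not just the columns being plotted. The sketch below, which uses made-up rows and a hypothetical other_column, contrasts that with filtering only on the two plotted columns. It is an optional alternative, not a required change.

```r
library(dplyr)

# Made-up rows for illustration only; other_column is a hypothetical
# stand-in for any column that the plot does not use
toy <- tibble(
  pitch_type        = c("FF", "SL", "CH"),
  release_speed     = c(96.1, 87.3, NA),
  release_spin_rate = c(2400, 2550, 1800),
  other_column      = c(NA, 1, 2)
)

# na.omit() removes every row with a missing value in ANY column
all_columns <- na.omit(toy)

# Filtering on only the plotted columns keeps the first row as well
plotted_columns <- toy %>%
  filter(!is.na(release_speed), !is.na(release_spin_rate))

nrow(all_columns)
nrow(plotted_columns)
```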


Exercise 4 (“Sticky Stuff” Ban - League)

MLB was in the news this year as a result of banning so-called “sticky stuff” that pitchers were using to get a better grip on the ball, and to increase the spin rate of their pitches.

The following video gives some background:

Create a graphic that illustrates what happened to the spin rates of four-seam fastballs, the most common pitch, which also happens to generally be the pitch most affected by foreign substances.

The relevant metric to use here is spin rate divided by velocity. (This is because, ignoring foreign substances, a ball thrown at a higher velocity will also spin more.) Plot the average of this metric for each day of the 2021 season. Add a smoother. Use color and shapes to indicate which days were before and after the ban.

Solution

mlb_pitches_2021 %>% 
  filter(pitch_type == "FF") %>%  # also excludes rows with an empty pitch type
  filter(game_date < "2021-10-05") %>% 
  na.omit() %>% 
  group_by(game_date) %>% 
  summarise(spin_per_velo = mean(release_spin_rate / release_speed, na.rm = TRUE)) %>%
  mutate(post_ban = game_date >= "2021-06-21") %>% 
  ggplot(aes(x = game_date, y = spin_per_velo)) + 
  geom_point(aes(color = post_ban, shape = post_ban)) + 
  geom_smooth(color = "black") +
  labs(title = "2021 Four-Seam Fastballs",
       subtitle = "Spin per Velocity Through Time",
       caption = "Data Source: Baseball Savant", 
       color = "Post Ban?", 
       shape = "Post Ban?") +
  xlab("Game Date") +
  ylab("Average Spin Rate Divided By Velocity") +
  scale_color_manual(values = c("dodgerblue", "darkorange")) +  
  theme_bw()
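
The daily summary step can be checked by hand on a tiny example. The sketch below uses four made-up pitches (an assumption for illustration, not real data) and the same ban date as the solution above, computing the per-pitch ratio first and then averaging those ratios within each day.

```r
library(dplyr)

# Made-up four-seam fastballs for illustration only
toy <- tibble(
  game_date         = as.Date(c("2021-06-20", "2021-06-20",
                                "2021-06-21", "2021-06-21")),
  release_speed     = c(96, 92, 96, 92),
  release_spin_rate = c(2400, 2300, 2112, 2024)
)

daily <- toy %>%
  group_by(game_date) %>%
  # ratio per pitch, then the mean of those ratios for each day
  summarise(spin_per_velo = mean(release_spin_rate / release_speed)) %>%
  # flag days on or after the ban date used in the solution
  mutate(post_ban = game_date >= as.Date("2021-06-21"))

daily
```

With these toy values, every pitch on the first day has a ratio of 25 and every pitch on the second day has a ratio of 22, so the two daily averages are 25 and 22.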


Exercise 5 (“Sticky Stuff” Ban - Pitcher)

Choose any pitcher you like, and re-create the previous graphic, but for all of their pitches and pitch types. That is, do not summarize the spin over velocity for each day, but instead plot all the pitches and add a smoother over that. To display each of the pitch types, utilize faceting.

Solution

mlb_pitches_2021 %>% 
  filter(pitch_type != "") %>%
  filter(name == "Gerrit Cole") %>%
  filter(game_date < "2021-10-05") %>% 
  na.omit() %>% 
  mutate(spin_per_velo = release_spin_rate / release_speed) %>% 
  mutate(post_ban = game_date >= "2021-06-21") %>% 
  ggplot(aes(x = game_date, y = spin_per_velo)) + 
  geom_point(aes(color = post_ban, shape = post_ban)) + 
  geom_smooth(color = "black") + 
  facet_wrap(~pitch_type, scales = "free_y") +
  labs(title = "2021 Gerrit Cole",
       subtitle = "Spin per Velocity Through Time",
       caption = "Data Source: Baseball Savant", 
       color = "Post Ban?", 
       shape = "Post Ban?") +
  xlab("Game Date") +
  ylab("Spin Rate Divided By Velocity") +
  scale_color_manual(values = c("#132448", "#c4ced3")) +
  theme_bw()
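
When choosing a pitcher, it may help to look at who threw the most pitches and how each pitcher's pitches split across pitch types, since a facet with very few points gives the smoother little to work with. The following is a minimal sketch on made-up rows; the pitcher names are placeholders, and in the lab the same count() calls would be applied to mlb_pitches_2021.

```r
library(dplyr)

# Made-up rows for illustration only; the names are placeholders
toy <- tibble(
  name       = c("Pitcher A", "Pitcher A", "Pitcher A",
                 "Pitcher B", "Pitcher B", "Pitcher C"),
  pitch_type = c("FF", "FF", "SL", "FF", "CU", "CH")
)

# Total pitches per pitcher, most pitches first
pitcher_totals <- toy %>%
  count(name, sort = TRUE)

# Pitches per pitcher and pitch type, to spot sparsely used pitch types
pitch_mix <- toy %>%
  count(name, pitch_type, sort = TRUE)

pitcher_totals
pitch_mix
```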