I am using the Lahman library to get baseball results data, and in the code below I calculate OPS for 2002. Now I need to calculate, for every year, the correlation between runs per game (R/G) and OPS, and then plot these correlations as a function of year. Could you please help?
library(Lahman)
library(dplyr)  # provides %>%, filter(), mutate(), select()
Teams %>% filter(yearID == 2002) %>%
  mutate(pa = AB + BB, BBPA = BB/pa, Singles = (H - X2B - X3B - HR)/G, Doubles = X2B/G,
         Triples = X3B/G, HRAB = HR/AB) %>%
  mutate(OPS = BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB) %>% select(OPS, R)
CodePudding user response:
See if this is what you need. It uses group_by() and summarise() with cor().
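library(Lahman)
library(dplyr)
library(ggplot2)
# compute the OPS measure for every team-season, keeping the year
df <- Teams %>%
  mutate(pa = AB + BB, BBPA = BB/pa, Singles = (H - X2B - X3B - HR)/G, Doubles = X2B/G,
         Triples = X3B/G, HRAB = HR/AB) %>%
  mutate(OPS = BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB) %>%
  select(yearID, OPS, R)
# one correlation per year, then plot it against year
df %>% group_by(yearID) %>%
  summarise(cor = cor(OPS, R)) %>%
  ggplot() +
  geom_line(aes(x = yearID, y = cor), color = "blue", size = 1.5) +
  labs(x = "Year", y = "Correlation between OPS and R") +
  theme_classic()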
CodePudding user response:
Here is what you could do with just base R.
data(Teams, package='Lahman')
id <- unique(Teams$yearID)  ## used for x-axis
Teams |>
  within({
    pa <- AB + BB
    BBPA <- BB/pa
    Singles <- (H - X2B - X3B - HR)/G
    Doubles <- X2B/G
    Triples <- X3B/G
    HRAB <- HR/AB
    OPS <- BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB
  }) |>
  by(Teams$yearID, \(x) with(x, cor(OPS, R))) |>  ## one correlation per year
  plot(type='l', xlab='year', ylab='r', xaxt='n', main='R/G and OPS over time', col=4)
axis(1, at=seq_along(id)[id %% 10 == 0], labels=id[id %% 10 == 0])  ## label every 10th year
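If the correlation should use runs per game (R/G), as the question asks, rather than season run totals, a minimal base R sketch could look like the following. The helper name cor_by_year is hypothetical, and it assumes the data frame already has the columns yearID, OPS, R and G (as Lahman's Teams does once the OPS column has been added).

```r
# Hypothetical helper (not part of Lahman): per-year correlation between
# OPS and runs per game. Assumes `df` has columns yearID, OPS, R and G.
cor_by_year <- function(df) {
  df$RG <- df$R / df$G                  # runs per game for each team-season
  groups <- split(df, df$yearID)        # one data frame per year
  r <- vapply(groups, function(x) cor(x$OPS, x$RG), numeric(1))
  data.frame(yearID = as.integer(names(r)), cor = unname(r))
}
```

The result has one row per year, so it can be passed to plot(type='l') or to ggplot() with geom_line() in the same way as the correlations above.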