Correlate two variables per year and plot

Time:05-16

I am using the Lahman package to get baseball data, and in the code below I calculate OPS. I now need to calculate, for every year, the correlation between runs per game (R/G) and OPS, and plot these correlations as a function of year. Could you please help?

library(Lahman)
library(dplyr)  # provides %>%, filter(), mutate(), select()

Teams %>% filter(yearID == 2002) %>%
  mutate(pa = AB + BB, BBPA = BB/pa, Singles = (H-X2B-X3B-HR)/G, Doubles = X2B/G,
         Triples = X3B/G, HRAB = HR/AB) %>%
  mutate(OPS = BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB) %>% select(OPS, R)

CodePudding user response:

See if this is what you want. It uses group_by() and summarise() with cor() to compute one correlation per year.

library(Lahman)
library(dplyr)
library(ggplot2)

# Compute OPS for every team-season, keeping the year for grouping
df <- Teams %>%
  mutate(pa = AB + BB, BBPA = BB/pa, Singles = (H-X2B-X3B-HR)/G, Doubles = X2B/G,
         Triples = X3B/G, HRAB = HR/AB) %>%
  mutate(OPS = BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB) %>%
  select(yearID, OPS, R)

# One correlation per year, plotted as a line over time
df %>% group_by(yearID) %>%
  summarise(cor = cor(OPS, R)) %>%
  ggplot() +
  geom_line(aes(x = yearID, y = cor), color = "blue", size = 1.5) +
  labs(x = "Year", y = "Correlation between OPS and R") +
  theme_classic()

Graph output
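The question asks about runs per game (R/G), while the code above correlates OPS with total runs (R). As a minimal sketch of the per-game variant, the base R snippet below divides R by G before taking one correlation per year. The data frame `toy` and its numbers are made up for illustration; only the column names are borrowed from the Teams table.

```r
# Made-up team-season rows; the column names mirror Lahman's Teams table,
# but the numbers are invented for illustration only.
toy <- data.frame(
  yearID = c(2001, 2001, 2001, 2002, 2002, 2002),
  G      = c(162, 162, 161, 162, 161, 162),
  R      = c(700, 800, 900, 650, 750, 850),
  OPS    = c(0.70, 0.75, 0.80, 0.80, 0.75, 0.70)
)

toy$RG <- toy$R / toy$G  # runs per game

# One correlation per year: split the rows by year, then apply cor() to each piece
cor_by_year <- sapply(split(toy, toy$yearID), function(d) cor(d$RG, d$OPS))

print(cor_by_year)

# Plot the correlations against the year
plot(as.integer(names(cor_by_year)), cor_by_year, type = "b",
     xlab = "Year", ylab = "Correlation between R/G and OPS")
```

With the real data, the same idea would presumably amount to adding something like `RG = R/G` in the `mutate()` step and passing `RG` to `cor()` instead of `R`.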

CodePudding user response:

Here is what you could do with just base R.

data(Teams, package='Lahman')

id <- unique(Teams$yearID)  ## used for x-axis

Teams |>
  within({
    pa <- AB + BB
    BBPA <- BB/pa
    Singles <- (H - X2B - X3B - HR)/G
    Doubles <- X2B/G
    Triples <- X3B/G
    HRAB <- HR/AB
    OPS <- BBPA + Singles + 2*Doubles + 3*Triples + 4*HRAB
  }) |>
  by(Teams$yearID, \(x) with(x, cor(OPS, R))) |>
  plot(type='l', xlab='year', ylab='r', xaxt='n', main='R/G and OPS over time', col=4)
axis(1, at=seq_along(id)[id %% 10 == 0], labels=id[id %% 10 == 0])
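The axis() call may look odd at first: the correlations are plotted against their position (1, 2, 3, ...) rather than against the year, so the tick marks have to be placed at positions and labelled with years. A small sketch with a hypothetical range of years shows how the decade ticks are picked:

```r
# Hypothetical range of years, standing in for unique(Teams$yearID)
id <- 1995:2012

decade <- id %% 10 == 0            # TRUE where the year is a multiple of 10
tick_at <- seq_along(id)[decade]   # positions on the x-axis
tick_labels <- id[decade]          # years shown at those positions

print(tick_at)      # 6 16
print(tick_labels)  # 2000 2010
```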
