I currently have two data frames that I'm working with. The first is a dataset of MLB baseball games, containing the date of the game and player IDs.
Game_Logs
Date | Batter 1 ID | Batter 2 ID |
---|---|---|
May 1 | Joe | Kevin |
May 1 | John | Samuel |
May 2 | Joe | Kevin |
May 2 | John | Samuel |
The second dataset contains the players stats for the season. This dataset is scraped from the web and can therefore be updated to contain all the stats for each player up until the date of the game listed. For example, below you can see data frames for two different days.
May_1
Batter ID | Hits | Home Runs | Batting Average |
---|---|---|---|
Joe | 15 | 4 | .244 |
John | 18 | 6 | .261 |
Kevin | 29 | 16 | .347 |
Samuel | 7 | 1 | .161 |
May_2
Batter ID | Hits | Home Runs | Batting Average |
---|---|---|---|
Joe | 16 | 4 | .247 |
John | 19 | 6 | .265 |
Kevin | 30 | 17 | .343 |
Samuel | 9 | 2 | .180 |
What I would like to do is loop through the list of games and, where the date and batter ID match, add the stats from the second data frame onto the end of the correct row. The part that is troubling me is that for each row in the first data frame, there are multiple Batter IDs that need to be matched.
My initial thought would be to use nested loops to do this but I am having trouble coming to a solution that lets me loop through each row in order so that I can limit the number of times I am scraping the data.
Here is the start of my initial thought process.
for (i in rows) {
if (i %in% BatterID){
...
}
}
Any thoughts? Thanks.
CodePudding user response:
You need to combine the stats data frames into a single data frame with a Date
column, then it's a simple join.
Something like this:
# put all the stats data frames in a list
stats = list(May_1 = May_1, May_2 = May_2)
# (in your real case you probably want to use `stats = mget(ls(pattern = ...))`
# where `pattern` is a regex pattern to identify these data frames
# combine them and get rid of the underscores
library(dplyr)
library(stringr)
stats_df = bind_rows(stats, .id = "Date") %>%
mutate(Date = str_replace(Date, pattern = "_", replacement = " "))
# and join
Game_Logs %>%
left_join(
stats_df,
by = c("Date", "Batter 1 ID" = "Batter ID")
) %>%
left_join(
stats_df,
by = c("Date", "Batter 2 ID" = "Batter ID",
suffix = c(".1", ".2")
)