Home > Blockchain >  In R, how can I add values onto the end of rows where one value matches that of another data frame?
In R, how can I add values onto the end of rows where one value matches that of another data frame?

Time:12-03

I currently have two data frames that I'm working with. The first is a dataset of MLB baseball games, containing the date of the game and player IDs.

Game_Logs

Date Batter 1 ID Batter 2 ID
May 1 Joe Kevin
May 1 John Samuel
May 2 Joe Kevin
May 2 John Samuel

The second dataset contains the players stats for the season. This dataset is scraped from the web and can therefore be updated to contain all the stats for each player up until the date of the game listed. For example, below you can see data frames for two different days.

May_1

Batter ID Hits Home Runs Batting Average
Joe 15 4 .244
John 18 6 .261
Kevin 29 16 .347
Samuel 7 1 .161

May_2

Batter ID Hits Home Runs Batting Average
Joe 16 4 .247
John 19 6 .265
Kevin 30 17 .343
Samuel 9 2 .180

What I would like to do is loop through the list of games and, where the date and batter ID match, add the stats from the second data frame onto the end of the correct row. The part that is troubling me is that for each row in the first data frame, there are multiple Batter IDs that need to be matched.

My initial thought would be to use nested loops to do this but I am having trouble coming to a solution that lets me loop through each row in order so that I can limit the number of times I am scraping the data.

Here is the start of my initial thought process.

for (i in rows) {
      if (i %in% BatterID){
        ...
  }
}

Any thoughts? Thanks.

CodePudding user response:

You need to combine the stats data frames into a single data frame with a Date column, then it's a simple join.

Something like this:

# put all the stats data frames in a list
stats = list(May_1 = May_1, May_2 = May_2)
# (in your real case you probably want to use `stats = mget(ls(pattern = ...))`
# where `pattern` is a regex pattern to identify these data frames

# combine them and get rid of the underscores
library(dplyr)
library(stringr)
stats_df = bind_rows(stats, .id = "Date") %>%
  mutate(Date = str_replace(Date, pattern = "_", replacement = " "))

# and join
Game_Logs %>%
  left_join(
    stats_df,
    by = c("Date", "Batter 1 ID" = "Batter ID")
  ) %>%
  left_join(
    stats_df,
    by = c("Date", "Batter 2 ID" = "Batter ID",
    suffix = c(".1", ".2")
  )
  • Related