How to match column data to row data?-CodePudding

I have a sports dataset that reads as follows:

season  team   tm   shk   dgs   brs   cts   cws  avg_pt_marg
2015    sharks shk  0-0   1-3   2-0   4-1   3-2    1.2
2015    dogs   dgs  3-1   0-0   2-1   1-1   2-0    3.4
2015    bears  brs  0-2   1-2   0-0   1-3   2-1    -0.2
2015    cats   cts  1-4   1-1   3-1   0-0   2-2    2.0
2015    cows   cws  2-3   0-2   1-2   2-2   0-0    -2.1
2014    sharks shk  0-0   1-3   2-0   4-1   3-2    0.7
2014    dogs   dgs  3-1   0-0   2-1   1-1   2-0    1.8
2014    bears  brs  0-2   1-2   0-0   1-3   2-1    -1.9
2014    cats   cts  1-4   1-1   3-1   0-0   2-2    2.3
2014    cows   cws  2-3   0-2   1-2   2-2   0-0    -3.0

I would like to add a column for each row (a season for a team) that contains the average point margin for a team's opponents.

This is calculated by summing the number of games played against a team (during that season) multiplied by that team's point margin (during that season) divided by the total number of games played (in that season).

For example, for 2015 sharks, the average point margin for a team's opponents would be ((4 x 3.4) (2 x -0.2) (5 x 2.0) (5 x -2.1)) / 16.

How do I calculate this column and then add it to the dataframe?

Like this:


season  team   tm   shk   dgs   brs   cts   cws  avg_pt_marg opponent_marg
2015    sharks shk  0-0   1-3   2-0   4-1   3-2    1.2
2015    dogs   dgs  3-1   0-0   2-1   1-1   2-0    3.4
2015    bears  brs  0-2   1-2   0-0   1-3   2-1    -0.2
2015    cats   cts  1-4   1-1   3-1   0-0   2-2    2.0
2015    cows   cws  2-3   0-2   1-2   2-2   0-0    -2.1
2014    sharks shk  0-0   1-3   2-0   4-1   3-2    0.7
2014    dogs   dgs  3-1   0-0   2-1   1-1   2-0    1.8
2014    bears  brs  0-2   1-2   0-0   1-3   2-1    -1.9
2014    cats   cts  1-4   1-1   3-1   0-0   2-2    2.3
2014    cows   cws  2-3   0-2   1-2   2-2   0-0    -3.0

CodePudding user response：

Well, it's not pretty, but

do.call(
  rbind,
  by(df,list(df$season),function(x){
    tmp=sapply(
      1:nrow(x),
      function(i){
        unlist(
          lapply(
            strsplit(
              as.character(x[i,grepl("tm[0-9] ",colnames(x))]),
              "-"
            ),
            function(y){
              sum(as.numeric(y))
            }
          )
        )
      }
    )
    cbind(
      x,
      "opponent_marg"=colSums(tmp*x[,"avg_pt_marg"])/colSums(tmp)
    )
  })
)

resulting in

        season  team  tm tm1 tm2 tm3 tm4 tm5 avg_pt_marg opponent_marg
2014.6    2014 team1 tm1 0-0 1-3 2-0 4-1 3-2         0.7    -0.0062500
2014.7    2014 team2 tm2 3-1 0-0 2-1 1-1 2-0         1.8    -0.3909091
2014.8    2014 team3 tm3 0-2 1-2 0-0 1-3 2-1        -1.9     0.5833333
2014.9    2014 team4 tm3 1-4 1-1 3-1 0-0 2-2         2.3    -0.8333333
2014.10   2014 team5 tm5 2-3 0-2 1-2 2-2 0-0        -3.0     0.7571429
2015.1    2015 team1 tm1 0-0 1-3 2-0 4-1 3-2         1.2     0.7937500
2015.2    2015 team2 tm2 3-1 0-0 2-1 1-1 2-0         3.4     0.3636364
2015.3    2015 team3 tm3 0-2 1-2 0-0 1-3 2-1        -0.2     1.1916667
2015.4    2015 team4 tm3 1-4 1-1 3-1 0-0 2-2         2.0     0.2400000
2015.5    2015 team5 tm5 2-3 0-2 1-2 2-2 0-0        -2.1     1.4428571

CodePudding user response：

Another not pretty solution, but it's a fairly complex little task - lots of components to it. I use a data.table here - if you're not familiar with them, they are just a pumped up verion of a data.frame giving some extra functionality

library(data.table)
setDT(dt1)

First up, reshape the data to a longer format

# Reshape the data
dt2 <- dt1[, melt(.SD, id.vars=c("tm", "team", "season", "avg_pt_marg"))]

I also filtered out the cases where team and opponent match. This step also creates a variable for the number of matches/games played

# Filter out cases where team and opponent match
dt2 <- dt2[tm != variable,][,
  # Get number of games played
  `:=`("games_played" = as.numeric(tstrsplit(value, "-")[[1]]) 
                        as.numeric(tstrsplit(value, "-")[[2]]))]

Then the final step to give you the values you want:

# Get the team/season averages
dt3 <- dt2[, sum(avg_pt_marg*games_played)/sum(games_played), keyby=.(season, "tm" = variable)]

You can merge this in with a data.table join

dt1 <- dt1[dt3, on=c("tm", "season")]

Giving:

    season   team  tm shk dgs brs cts cws avg_pt_marg         V1
 1:   2014 sharks shk 0-0 1-3 2-0 4-1 3-2         0.7 -0.0062500
 2:   2014   dogs dgs 3-1 0-0 2-1 1-1 2-0         1.8 -0.3909091
 3:   2014  bears brs 0-2 1-2 0-0 1-3 2-1        -1.9  0.5833333
 4:   2014   cats cts 1-4 1-1 3-1 0-0 2-2         2.3 -0.8333333
 5:   2014   cows cws 2-3 0-2 1-2 2-2 0-0        -3.0  0.7571429
 6:   2015 sharks shk 0-0 1-3 2-0 4-1 3-2         1.2  0.7937500
 7:   2015   dogs dgs 3-1 0-0 2-1 1-1 2-0         3.4  0.3636364
 8:   2015  bears brs 0-2 1-2 0-0 1-3 2-1        -0.2  1.1916667
 9:   2015   cats cts 1-4 1-1 3-1 0-0 2-2         2.0  0.2400000
10:   2015   cows cws 2-3 0-2 1-2 2-2 0-0        -2.1  1.4428571