I have a sports dataset that reads as follows:
season team tm shk dgs brs cts cws avg_pt_marg
2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2
2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4
2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2
2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0
2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1
2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7
2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8
2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9
2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3
2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0
I would like to add a column for each row (a season for a team) that contains the average point margin for a team's opponents.
This is calculated by summing the number of games played against a team (during that season) multiplied by that team's point margin (during that season) divided by the total number of games played (in that season).
For example, for 2015 sharks,
the average point margin for a team's opponents would be ((4 x 3.4) (2 x -0.2) (5 x 2.0) (5 x -2.1)) / 16.
How do I calculate this column and then add it to the dataframe?
Like this:
season team tm shk dgs brs cts cws avg_pt_marg opponent_marg
2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2
2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4
2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2
2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0
2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1
2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7
2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8
2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9
2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3
2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0
CodePudding user response:
Well, it's not pretty, but
do.call(
rbind,
by(df,list(df$season),function(x){
tmp=sapply(
1:nrow(x),
function(i){
unlist(
lapply(
strsplit(
as.character(x[i,grepl("tm[0-9] ",colnames(x))]),
"-"
),
function(y){
sum(as.numeric(y))
}
)
)
}
)
cbind(
x,
"opponent_marg"=colSums(tmp*x[,"avg_pt_marg"])/colSums(tmp)
)
})
)
resulting in
season team tm tm1 tm2 tm3 tm4 tm5 avg_pt_marg opponent_marg
2014.6 2014 team1 tm1 0-0 1-3 2-0 4-1 3-2 0.7 -0.0062500
2014.7 2014 team2 tm2 3-1 0-0 2-1 1-1 2-0 1.8 -0.3909091
2014.8 2014 team3 tm3 0-2 1-2 0-0 1-3 2-1 -1.9 0.5833333
2014.9 2014 team4 tm3 1-4 1-1 3-1 0-0 2-2 2.3 -0.8333333
2014.10 2014 team5 tm5 2-3 0-2 1-2 2-2 0-0 -3.0 0.7571429
2015.1 2015 team1 tm1 0-0 1-3 2-0 4-1 3-2 1.2 0.7937500
2015.2 2015 team2 tm2 3-1 0-0 2-1 1-1 2-0 3.4 0.3636364
2015.3 2015 team3 tm3 0-2 1-2 0-0 1-3 2-1 -0.2 1.1916667
2015.4 2015 team4 tm3 1-4 1-1 3-1 0-0 2-2 2.0 0.2400000
2015.5 2015 team5 tm5 2-3 0-2 1-2 2-2 0-0 -2.1 1.4428571
CodePudding user response:
Another not pretty solution, but it's a fairly complex little task - lots of components to it. I use a data.table here - if you're not familiar with them, they are just a pumped up verion of a data.frame giving some extra functionality
library(data.table)
setDT(dt1)
First up, reshape the data to a longer format
# Reshape the data
dt2 <- dt1[, melt(.SD, id.vars=c("tm", "team", "season", "avg_pt_marg"))]
I also filtered out the cases where team and opponent match. This step also creates a variable for the number of matches/games played
# Filter out cases where team and opponent match
dt2 <- dt2[tm != variable,][,
# Get number of games played
`:=`("games_played" = as.numeric(tstrsplit(value, "-")[[1]])
as.numeric(tstrsplit(value, "-")[[2]]))]
Then the final step to give you the values you want:
# Get the team/season averages
dt3 <- dt2[, sum(avg_pt_marg*games_played)/sum(games_played), keyby=.(season, "tm" = variable)]
You can merge this in with a data.table join
dt1 <- dt1[dt3, on=c("tm", "season")]
Giving:
season team tm shk dgs brs cts cws avg_pt_marg V1
1: 2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7 -0.0062500
2: 2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8 -0.3909091
3: 2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9 0.5833333
4: 2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3 -0.8333333
5: 2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0 0.7571429
6: 2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2 0.7937500
7: 2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4 0.3636364
8: 2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2 1.1916667
9: 2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0 0.2400000
10: 2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1 1.4428571