I would like to cumulate the points of several football clubs for each one for a match day.
I have created a sample dataset to explain the problem:
t <- data.frame(Heim = c("A", "B", "B", "D", "C", "A", "C", "D", "A", "B", "B", "D", "C", "A", "C", "D"),
Auswärts = c("C", "D", "A", "C", "B", "D", "A", "B", "C", "D", "A", "C", "B", "D", "A", "B"),
Ergebnis= c("S", "U", "N", "N", "S", "S", "N", "U", "N", "S", "N", "U", "S", "S", "U", "U"),
Round = c(1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4),
Saison = c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2))
My idea was for each club (in the origin data set more than 4) a separate column with the score and a cummulated column to it.
So something like this:
t$A_Points <- ifelse(t$Heim =="A" & t$Ergebnis =="S", 3, 0)
t$A_Points <- ifelse(t$Heim =="A" & t$Ergebnis =="U", 1, t$A_Points )
t$A_Points <- ifelse(t$Auswärts =="A" & t$Ergebnis =="U", 1, t$A_Points )
t$A_Points <- ifelse(t$Auswärts =="A" & t$Ergebnis =="N", 3, t$A_Points )
t$A_Points <- ifelse(t$Auswärts !="A" & t$Heim !="A", NA, t$A_Points)
t$A<- ifelse(t$A_Points == "NA", 0, 1)
t<- t %>%
arrange(Saison,Round,A) %>%
group_by(Saison, A) %>%
mutate(cumsum = cumsum(A_Points))
Unfortunately, it is very time and space-consuming even for 4 clubs... Also, I would like to have the sum of the points of the matches without the result of the current matchday.
The optimal result for me would be following:
Heim Auswärts Ergebnis Round Saison Points_Heim Points_Auswärts
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 A C S 1 1 0 0
2 B D U 1 1 0 0
3 B A N 2 1 1 3
4 D C N 2 1 1 0
5 A D S 3 1 6 1
6 C B S 3 1 3 1
7 C A N 4 1 6 9
8 D B U 4 1 1 1
9 A C N 1 2 0 0
10 B D S 1 2 0 0
11 B A N 2 2 3 0
12 D C U 2 2 0 3
13 A D S 3 2 3 1
14 C B S 3 2 4 3
15 C A U 4 2 7 3
16 D B U 4 2 1 3
I would be very happy about an idea for an easier solution.
CodePudding user response:
Probably not the shortest solution. But I would do
t <- t %>%
group_by(Saison) %>%
mutate(Heim_Points_Veränderung = case_when(Ergebnis == "S" ~ 3,
Ergebnis == "U" ~ 1,
Ergebnis == "N" ~ 0),
Auswärts_Points_Veränderung = case_when(Ergebnis == "S" ~ 0,
Ergebnis == "U" ~ 1,
Ergebnis == "N" ~ 3),
Points_Heim = 0,
Points_Auswärts = 0)
for (i in unique(union(t$Heim, t$Auswärts))){
t <- t %>%
mutate(!!sym(paste0(i,"_points")) := if_else(Heim == i, Heim_Points_Veränderung, 0),
!!sym(paste0(i,"_points")) := if_else(Auswärts == i, Auswärts_Points_Veränderung, !!sym(paste0(i,"_points"))),
!!sym(paste0(i,"_cumsum")) := cumsum(lag(!!sym(paste0(i,"_points")), default=0)),
Points_Heim = if_else(Heim == i, !!sym(paste0(i,"_cumsum")), Points_Heim),
Points_Auswärts = if_else(Auswärts == i, !!sym(paste0(i,"_cumsum")), Points_Auswärts))
}
t <- t %>%
select(Heim, Auswärts, Ergebnis, Round, Saison, Points_Heim, Points_Auswärts)
Output
> t
# A tibble: 16 x 7
# Groups: Saison [2]
Heim Auswärts Ergebnis Round Saison Points_Heim Points_Auswärts
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 A C S 1 1 0 0
2 B D U 1 1 0 0
3 B A N 2 1 1 3
4 D C N 2 1 1 0
5 C B S 3 1 3 1
6 A D S 3 1 6 1
7 C A N 4 1 6 9
8 D B U 4 1 1 1
9 A C N 1 2 0 0
10 B D S 1 2 0 0
11 B A N 2 2 3 0
12 D C U 2 2 0 3
13 C B S 3 2 4 3
14 A D S 3 2 3 1
15 C A U 4 2 7 6
16 D B U 4 2 1 3
This solution should adapt to any number of clubs. Briefly, I store the possible change in points in Heim/Auswärts_Points_Veränderung using case_when
(easier than a lot of ifelse
) so I can create a column of points change for each club (running a for
loop on all the clubs). This allows me to do a cumsum
like you using lag
to make sure the cumsum is updated 1 row later (to display the cumulative points before the match and not after), which I enter in the Points_Heim/Auswärts column only when the club is displayed in the Heim/Auswärts columns. The key to my solution is to use !!sym
to feed dynamic variable names to mutate
(note the assignment with :=
) inside the for
loop.