I am analyzing football matches. I want to know which teams have come from behind to draw a game or even win the game the most.
I have three columns:
- Game ID
- Teams
- Minutes a particular team scored in a match
Here I put an example
GID <- c(1,1,1,1,2,2,2,2,3,3,3,3,3)
Teams <- c("Peru","Peru","Brazil","Brazil", "Chile", "Argentina","Argentina","Argentina","Colombia", "Ecuador", "Ecuador", "Colombia",
"Colombia")
Min_scores <- c(6,10,50,90,36,56,60,70,10,15,20,46,89)
df_example <- data.frame(GID,Teams,Min_scores)
df_example
In this case, there are three matches. Peru against Brazil, Chile against Argentina and Colombia against Ecuador.
In the first match Peru was winning 2-0 and Brazil came storming back and the game was tied at the end. In the second match, Argentina was loosing 1-0 and came back to win 3-1.. In the third example, Colombia started winning 1-0 then Ecuador turn the game around 2-1 and then Colombia turn the game again and finished winning 3-2.
In this case I want to count the times there are these changes of score where the loosing team starts winning or drawing the game. It could be a so called come from behind win or draw.
I've been thinking a lot about this but haven't come with an efficient solution. I think a loop could be a solution but there are a lot of matches and I am a bit worried it could be very slow.
Thank you very much for you help.
CodePudding user response:
You may try
library(dplyr)
df_example %>%
group_by(GID) %>%
arrange(GID, Min_scores) %>%
mutate(key1 = ifelse(Teams == unique(Teams)[1], 1, -1)) %>%
mutate(key2 = cumsum(key1)) %>%
summarise(behind_win_or_draw = sum(key2 == 0))
GID behind_win_or_draw
<dbl> <int>
1 1 1
2 2 1
3 3 2
Edit.
df_example %>%
group_by(GID) %>%
arrange(GID, Min_scores) %>%
mutate(key1 = ifelse(Teams == unique(Teams)[1], 1, -1)) %>%
mutate(key2 = cumsum(key1)) %>%
filter(key2 == 0) %>%
group_by(GID, Teams) %>%
summarise(n = sum(key2 == 0))
GID Teams n
<dbl> <chr> <int>
1 1 Brazil 1
2 2 Argentina 1
3 3 Colombia 1
4 3 Ecuador 1