Home > Back-end >  Count event occurrence of row value A and row value B in same column in R or Python
Count event occurrence of row value A and row value B in same column in R or Python

Time:11-03

I am analyzing football matches. I want to know which teams have come from behind to draw a game or even win the game the most.

I have three columns:

  1. Game ID
  2. Teams
  3. Minutes a particular team scored in a match

Here I put an example

GID <- c(1,1,1,1,2,2,2,2,3,3,3,3,3)
Teams <- c("Peru","Peru","Brazil","Brazil", "Chile", "Argentina","Argentina","Argentina","Colombia", "Ecuador", "Ecuador", "Colombia",
"Colombia")
Min_scores <- c(6,10,50,90,36,56,60,70,10,15,20,46,89)

df_example <- data.frame(GID,Teams,Min_scores)
df_example

In this case, there are three matches. Peru against Brazil, Chile against Argentina and Colombia against Ecuador.

In the first match Peru was winning 2-0 and Brazil came storming back and the game was tied at the end. In the second match, Argentina was loosing 1-0 and came back to win 3-1.. In the third example, Colombia started winning 1-0 then Ecuador turn the game around 2-1 and then Colombia turn the game again and finished winning 3-2.

In this case I want to count the times there are these changes of score where the loosing team starts winning or drawing the game. It could be a so called come from behind win or draw.

I've been thinking a lot about this but haven't come with an efficient solution. I think a loop could be a solution but there are a lot of matches and I am a bit worried it could be very slow.

Thank you very much for you help.

CodePudding user response:

You may try

library(dplyr)

df_example %>%
  group_by(GID) %>%
  arrange(GID, Min_scores) %>%
  mutate(key1 = ifelse(Teams == unique(Teams)[1], 1, -1)) %>%
  mutate(key2 = cumsum(key1)) %>%
  summarise(behind_win_or_draw = sum(key2 == 0))

    GID behind_win_or_draw
  <dbl>              <int>
1     1                  1
2     2                  1
3     3                  2

Edit.

df_example %>%
  group_by(GID) %>%
  arrange(GID, Min_scores) %>%
  mutate(key1 = ifelse(Teams == unique(Teams)[1], 1, -1)) %>%
  mutate(key2 = cumsum(key1)) %>%
  filter(key2 == 0) %>%
  group_by(GID, Teams) %>%
  summarise(n = sum(key2 == 0))

    GID Teams         n
  <dbl> <chr>     <int>
1     1 Brazil        1
2     2 Argentina     1
3     3 Colombia      1
4     3 Ecuador       1
  • Related