Home > OS >  how to create a column to get the number of successive occurrences of a category
how to create a column to get the number of successive occurrences of a category

Time:10-23

Hello I have a database of football matches for prediction.

Team<-rep("A",10)
Match<-1:10
Outcome<-c("W","W","W","L","L","W","L","W","L","L")

mymatch<-data.frame(Team,Match,Outcome)

I would like to create a column with the number of successive wins but also successive losses. When the team loses the win sequence starts again at zero. Similarly when it wins the sequence of defeat resumes at zero. I also need a column for the end of a sequence, whether it is a win or a loss.

   Team Match Outcome win_seq win_end loss_seq loss_end
1     A     1       W       1       0        0        0
2     A     2       W       2       0        0        0
3     A     3       W       3       1        0        0
4     A     4       L       0       0        1        0
5     A     5       L       0       0        2        1
6     A     6       W       1       1        0        0
7     A     7       L       0       0        1        1
8     A     8       W       1       1        0        0
9     A     9       L       0       0        1        0
10    A    10       L       0       0        2        1

CodePudding user response:

A dplyr approach

library(dplyr)

mymatch %>% 
  group_by(Team, grp = cumsum(Outcome != lag(Outcome, default="T"))) %>% 
  mutate(win_seq = cumsum(Outcome == "W"), loss_seq = cumsum(Outcome == "L")) %>% 
  ungroup() %>% 
  mutate(
    win_end = (Outcome != lead(Outcome, default= "F") & Outcome == "W") * 1,
    loss_end = (Outcome != lead(Outcome, default = "F") & Outcome == "L") * 1, 
  grp = NULL)
# A tibble: 10 × 7
   Team  Match Outcome win_seq loss_seq win_end loss_end
   <chr> <int> <chr>     <int>    <int>   <dbl>    <dbl>
 1 A         1 W             1        0       0        0
 2 A         2 W             2        0       0        0
 3 A         3 W             3        0       1        0
 4 A         4 L             0        1       0        0
 5 A         5 L             0        2       0        1
 6 A         6 W             1        0       1        0
 7 A         7 L             0        1       0        1
 8 A         8 W             1        0       1        0
 9 A         9 L             0        1       0        0
10 A        10 L             0        2       0        1

CodePudding user response:

This may have been answered before, but here's a solution:

Define a function that creates a sequence that enumerates runs (e.g. W, W, W, L, L would return 1, 1, 1, 2, 2)

get_seq <- function(x) {
    r <- rle(x)
    rep(seq(length(r$lengths)), times = r$lengths)
}

Define runs, group by runs, then create a sequence from 0 to (n-1) within each run:

library(dplyr)
(mymatch
    |> mutate(run = get_seq(Outcome))
    |> group_by(run)
    |> mutate(val = 0:(n()-1))
)

In fact this answer does it better (although I don't think it correctly answers the question posed there):

get_seq2 <- function(x) {
    unlist(sapply(rle(x)$lengths,function(x) 0:(x-1)))
}
(mymatch
    |> mutate(val = get_seq2(Outcome))
)
  •  Tags:  
  • r
  • Related