Finding scoring runs in R

Time:01-25

I am trying to find scoring runs using R. Say I have a tibble as below. I want to count the number of times one player scores five (or any other number) in a row without the other player scoring. I thought about using cumsum, but that doesn't really get me anywhere. Ideally I'd like to stick to the tidyverse and the pipe operator if possible. Thanks!

E.g.

# A tibble: 12 × 2
   player   runs
   <chr> <int>
 1 Bob       2
 2 Aaron     1
 3 Aaron     0
 4 Bob       4
 5 Aaron     1
 6 Aaron     0
 7 Bob       1
 8 Aaron     0
 9 Aaron     2
10 Bob       3
11 Bob       3
12 Aaron     2

CodePudding user response:

Not dplyr, but it works:

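library(dplyr)

# One step of the state machine: `prev` is the running streak per player,
# `this` is a row number of quux.
redfun <- function(prev, this) {
  dat <- quux[this,]
  if (dat$runs > 0) {
    # a scoring row resets every other player and extends this player's streak
    prev[setdiff(names(prev), dat$player)] <- 0
    prev[dat$player] <- prev[dat$player] + dat$runs
  }
  prev
}
bind_rows(Reduce(redfun, seq_len(nrow(quux)), init = c(Bob=0, Aaron=0), accumulate = TRUE))[-1,] %>%
  bind_cols(quux, .)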
#    player runs Bob Aaron
# 1     Bob    2   2     0
# 2   Aaron    1   0     1
# 3   Aaron    0   0     1
# 4     Bob    4   4     0
# 5   Aaron    1   0     1
# 6   Aaron    0   0     1
# 7     Bob    1   1     0
# 8   Aaron    0   1     0
# 9   Aaron    2   0     2
# 10    Bob    3   3     0
# 11    Bob    3   6     0
# 12  Aaron    2   0     2

From this you can see that Bob has one instance (row 11, with 6) where his cumulative unopposed runs exceed 5.

Breakdown:

  • redfun is a finite state machine of sorts: it is called once for each row in the frame (by Reduce) with prev= being the current state of all players (starting with c(Bob=0, Aaron=0)), and if the current row runs is greater than 0, it increments this player and resets the other player(s) to 0
  • Typically Reduce's function is meant to handle "data" (values), but in this case I'm iterating over the row numbers (1-12 in this case, by seq_len(nrow(quux))), and using that inside the inner function to index on the frame. In this way, every time redfun is called, prev (the first arg) is the state (starting at 0/0), and this indicates the row of the frame to look at.
  • Normally, Reduce just returns the last iteration of its cycles. We want all of the steps in between, so we set accumulate=TRUE.
  • Because of how it is set up, Reduce(.., accumulate=TRUE) will return nrow(quux) + 1 elements, where the first element is the initial state (of 0/0); we remove it with [-1,] once the states have been bound into a frame.
  • The return from Reduce is a list of named vectors (each row's state). We can convert that easily into a frame with dplyr::bind_rows, and then combine it directly (column-wise) with the original frame using dplyr::bind_cols.
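
To make the Reduce behaviour described in the breakdown concrete, here is a minimal base-R sketch on toy input. The names step_fun, states and last_only are made up for this illustration and are not part of the answer's code; the state is a named vector of two counters, much like c(Bob=0, Aaron=0) above.

```r
# Toy step function: bump the counter named by `this`, leave the rest alone.
step_fun <- function(prev, this) {
  prev[this] <- prev[this] + 1
  prev
}

inputs <- c("a", "b", "a")

# accumulate = TRUE keeps the initial state plus one state per input.
states <- Reduce(step_fun, inputs, init = c(a = 0, b = 0), accumulate = TRUE)
length(states)   # 4: the initial state followed by 3 steps
states[[1]]      # the initial state, a = 0 and b = 0
states[[4]]      # the final state, a = 2 and b = 1

# Without accumulate, only the final state comes back.
last_only <- Reduce(step_fun, inputs, c(a = 0, b = 0))
```

Because each state has more than one element, Reduce leaves the result as a list, which is why the answer passes it through bind_rows before dropping the first row.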

Data

quux <- structure(list(player = c("Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron", "Bob", "Bob", "Aaron"), runs = c(2L, 1L, 0L, 4L, 1L, 0L, 1L, 0L, 2L, 3L, 3L, 2L)), class = "data.frame", row.names = c(NA, -12L))

CodePudding user response:

dplyr approach (credit to Top answer in this post)

library(dplyr)
# df is the tibble from the question (the same data as quux in the first answer)
df %>%
  group_by(player, grp = with(rle(player), rep(seq_along(lengths), lengths))) %>%
  summarise(total_runs = sum(runs)) %>%
  arrange(grp)

# A tibble: 8 x 3
# Groups:   player [2]
  player   grp total_runs
  <chr>  <int>      <int>
1 Bob        1          2
2 Aaron      2          1
3 Bob        3          4
4 Aaron      4          1
5 Bob        5          1
6 Aaron      6          2
7 Bob        7          6
8 Aaron      8          2
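
To get from the per-stretch totals above to the count the question asks for, one possible follow-up is to filter on a cutoff and count per player. This is only a sketch: scores, threshold, streaks, big_streaks and n_streaks are names invented here, and the data is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Same values as the question's tibble.
scores <- tibble(
  player = c("Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron",
             "Bob", "Aaron", "Aaron", "Bob", "Bob", "Aaron"),
  runs   = c(2L, 1L, 0L, 4L, 1L, 0L, 1L, 0L, 2L, 3L, 3L, 2L)
)

threshold <- 5  # assumed cutoff; change to any number

# One row per uninterrupted stretch of rows by the same player.
streaks <- scores %>%
  mutate(grp = with(rle(player), rep(seq_along(lengths), lengths))) %>%
  group_by(player, grp) %>%
  summarise(total_runs = sum(runs), .groups = "drop") %>%
  arrange(grp)

# Number of stretches per player that reach the cutoff.
big_streaks <- streaks %>%
  filter(total_runs >= threshold) %>%
  count(player, name = "n_streaks")

big_streaks
```

With this data only Bob's last stretch (3 + 3 = 6) reaches the cutoff, so big_streaks has a single row for Bob; players with no qualifying stretch do not appear at all. Note that this grouping treats a scoreless row by the other player as ending a stretch, whereas the Reduce answer does not; the two agree on this particular data.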