I am trying to find scoring runs using R. Say I have a tibble like the one below. I want to count the number of times one player scores five in a row (or any number) without the other player scoring. I thought about using cumsum,
but that didn't really get me anywhere. Ideally I want to stick to the tidyverse
and use the pipe operator if possible. Thanks!
E.g.
# A tibble: 12 × 2
player runs
<chr> <int>
1 Bob 2
2 Aaron 1
3 Aaron 0
4 Bob 4
5 Aaron 1
6 Aaron 0
7 Bob 1
8 Aaron 0
9 Aaron 2
10 Bob 3
11 Bob 3
12 Aaron 2
CodePudding user response:
Not dplyr, but it works:
library(dplyr)  # only needed for bind_rows(), bind_cols() and %>%
redfun <- function(prev, this) {
  dat <- quux[this,]
  if (dat$runs > 0) {
    # a score resets every other player's streak and extends this player's
    prev[setdiff(names(prev), dat$player)] <- 0
    prev[dat$player] <- prev[dat$player] + dat$runs
  }
  prev
}
bind_rows(Reduce(redfun, seq_len(nrow(quux)), init = c(Bob=0, Aaron=0), accumulate = TRUE))[-1,] %>%
  bind_cols(quux, .)
# player runs Bob Aaron
# 1 Bob 2 2 0
# 2 Aaron 1 0 1
# 3 Aaron 0 0 1
# 4 Bob 4 4 0
# 5 Aaron 1 0 1
# 6 Aaron 0 0 1
# 7 Bob 1 1 0
# 8 Aaron 0 1 0
# 9 Aaron 2 0 2
# 10 Bob 3 3 0
# 11 Bob 3 6 0
# 12 Aaron 2 0 2
From this you can see that Bob has one instance with cumulative unopposed runs exceeding 5.
Breakdown:
- redfun is a finite state machine of sorts: it is called once for each row in the frame (by Reduce) with prev= being the current state of all players (starting with c(Bob=0, Aaron=0)), and if the current row's runs is greater than 0, it increments this player and resets the other player(s) to 0.
- Typically Reduce's function is meant to handle "data" (values), but in this case I'm iterating over the row numbers (1-12 in this case, by seq_len(nrow(quux))), and using that inside the inner function to index on the frame. In this way, every time redfun is called, prev (the first arg) is the state (starting at 0/0), and this indicates the row of the frame to look at.
- Normally, Reduce just returns the last iteration of its cycles. We want all of the steps in between, so we set accumulate=TRUE.
- Because of how it is set up, Reduce(.., accumulate=TRUE) will return nrow(quux)+1 rows, where the first row is the initial state (of 0/0); we remove this with [-1,] after the call to Reduce.
- The return from Reduce is a list with named values (each row's state); we can convert that easily into a frame with dplyr::bind_rows, and then we can directly combine it (column-wise) with the original frame using dplyr::bind_cols.
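To make the accumulate/init behaviour concrete, here is a minimal base-R sketch on a plain running sum rather than the player state. The name steps is only illustrative and is not part of the answer's code.

```r
# Running sum over 1:4, starting from an initial state of 0.
# With accumulate = TRUE every intermediate state is kept, and with
# init supplied the initial state is included as the first element.
steps <- Reduce(function(prev, this) prev + this, 1:4,
                init = 0, accumulate = TRUE)

length(steps)  # one more element than the input, because of init
steps[-1]      # drop the initial state, as [-1,] does in the answer
```

The same pattern applies when the state is a named vector per player instead of a single number; Reduce then returns a list of states rather than a simplified vector.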
Data
quux <- structure(list(player = c("Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron", "Bob", "Bob", "Aaron"), runs = c(2L, 1L, 0L, 4L, 1L, 0L, 1L, 0L, 2L, 3L, 3L, 2L)), class = "data.frame", row.names = c(NA, -12L))
CodePudding user response:
dplyr approach (credit to Top answer in this post)
library(dplyr)
# df is the tibble from the question
df %>%
  group_by(player, grp = with(rle(player), rep(seq_along(lengths), lengths))) %>%
  summarise(total_runs = sum(runs)) %>%
  arrange(grp)
# A tibble: 8 x 3
# Groups: player [2]
player grp total_runs
<chr> <int> <int>
1 Bob 1 2
2 Aaron 2 1
3 Bob 3 4
4 Aaron 4 1
5 Bob 5 1
6 Aaron 6 2
7 Bob 7 6
8 Aaron 8 2
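To go from the per-streak totals to an actual count of qualifying runs, one possible tidyverse sketch is shown here. It assumes that a row with 0 runs should not interrupt the other player's streak (which is how the first answer treats it), so those rows are dropped before the streaks are labelled. The names threshold, streaks, long_streaks and n_streaks are illustrative only, and the data is the quux frame from the first answer.

```r
library(dplyr)

quux <- data.frame(
  player = c("Bob", "Aaron", "Aaron", "Bob", "Aaron", "Aaron",
             "Bob", "Aaron", "Aaron", "Bob", "Bob", "Aaron"),
  runs   = c(2L, 1L, 0L, 4L, 1L, 0L, 1L, 0L, 2L, 3L, 3L, 2L)
)

threshold <- 5  # illustrative cutoff: "five in a row (or any number)"

streaks <- quux %>%
  filter(runs > 0) %>%  # assumption: a 0-run row does not break a streak
  # start a new streak id every time the scoring player changes
  mutate(grp = cumsum(player != lag(player, default = first(player))) + 1L) %>%
  group_by(player, grp) %>%
  summarise(total_runs = sum(runs), .groups = "drop") %>%
  arrange(grp)

# number of unopposed streaks per player that reach the threshold
long_streaks <- streaks %>%
  filter(total_runs >= threshold) %>%
  count(player, name = "n_streaks")

long_streaks
```

On this data only Bob's 3 + 3 streak reaches the cutoff, which agrees with both answers above. A player with no qualifying streak does not appear in long_streaks at all.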