I am working with a df similar to this:
df <- data.frame(p1 = c("Tom", "Tom", "Tom", "Tom", "Tom", "Tom", "Brad", "Brad", "Brad", "Brad", "Brad", "Brad", "Brad","Brad", "Brad", "Brad" ),
elapsed_time = c(0, 10, 17, 28, 47, 65, 83, 100, 135, 180, 210, 225, 237, 253, 276, 281),
event_type = c("start of period", "play", "play", "timeout", "play", "play", "play", "play", "play", "timeout", "play", "play", "play", "play",
"play", "play"),
scored = c(NA, NA, "Tom", NA, NA, "Tom", NA, NA, NA, NA, NA, "Brad", NA, NA, "Brad", "Brad" ),
timesincelastbreak = c(0, 10, 17, 0, 19, 37, 0, 17, 52, 0, 30, 45, 57, 73, 96, 101))
that outputs this:
p1 elapsed_time event_type scored timesincelastbreak
<chr> <dbl> <chr> <chr> <dbl>
1 Tom 0 start of period NA 0
2 Tom 10 play NA 10
3 Tom 17 play Tom 17
4 Tom 28 timeout NA 0
5 Tom 47 play NA 19
6 Tom 65 play Tom 37
7 Brad 83 play NA 0
8 Brad 100 play NA 17
9 Brad 135 play NA 52
10 Brad 180 timeout NA 0
11 Brad 210 play NA 30
12 Brad 225 play Brad 45
13 Brad 237 play NA 57
14 Brad 253 play NA 73
15 Brad 276 play Brad 96
16 Brad 281 play Brad 101
I'd like to add a column (timesscored
) that adds together the number of times p1 == scored
in the 30 seconds immediately preceding each row, based on timesincelastbreak
.
Also, timesscored
needs to reset every time that timesincelastbreak
is equal to zero.
The desired output is:
p1 elapsed_time event_type scored timesincelastbreak timesscored
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 Tom 0 start of period NA 0 0
2 Tom 10 play NA 10 0
3 Tom 17 play Tom 17 1
4 Tom 28 timeout NA 0 0
5 Tom 47 play NA 19 0
6 Tom 65 play Tom 37 1
7 Brad 83 play NA 0 0
8 Brad 100 play NA 17 0
9 Brad 135 play NA 52 0
10 Brad 180 timeout NA 0 0
11 Brad 210 play NA 30 0
12 Brad 225 play Brad 45 1
13 Brad 237 play NA 57 1
14 Brad 253 play NA 73 1
15 Brad 276 play Brad 96 1
16 Brad 281 play Brad 101 2
I've tried using cumsum(df$p1 == df$scored)
but I don't know how to specify the interval of 30 for timesincelastbreak
.
CodePudding user response:
Here’s a solution based on adding the past 30 seconds for each row using purrr::map_int()
.
library(dplyr)
library(purrr)
df %>%
group_by(block = cumsum(timesincelastbreak == 0)) %>%
mutate(timescored = map_int(
timesincelastbreak,
\(cur_time) sum(
between(timesincelastbreak, cur_time - 30, cur_time) & scored == p1,
na.rm = TRUE
)
)) %>%
ungroup() %>%
select(!block)
# A tibble: 16 × 6
p1 elapsed_time event_type scored timesincelastbreak timescored
<chr> <dbl> <chr> <chr> <dbl> <int>
1 Tom 0 start of period <NA> 0 0
2 Tom 10 play <NA> 10 0
3 Tom 17 play Tom 17 1
4 Tom 28 timeout <NA> 0 0
5 Tom 47 play <NA> 19 0
6 Tom 65 play Tom 37 1
7 Brad 83 play <NA> 0 0
8 Brad 100 play <NA> 17 0
9 Brad 135 play <NA> 52 0
10 Brad 180 timeout <NA> 0 0
11 Brad 210 play <NA> 30 0
12 Brad 225 play Brad 45 1
13 Brad 237 play <NA> 57 1
14 Brad 253 play <NA> 73 1
15 Brad 276 play Brad 96 1
16 Brad 281 play Brad 101 2
CodePudding user response:
To be honest, I am not sure if I have understood the algorithm correctly.
Using dplyr
and tidyr
we first calculate the timesscored
values:
library(tidyr)
library(dplyr)
df_score <- df %>%
group_by(grp = cumsum(timesincelastbreak == 0)) %>% # reset everything for timesincelastbreak == 0
drop_na(scored) %>%
mutate(
timesscored = ifelse(
row_number() == 1, # first score is independent of timesincelastbreak
1,
coalesce((p1 == scored), FALSE) * # check for p1 == scored
(timesincelastbreak - lag(timesincelastbreak, default = 0) < 30) # check for < 30 seconds
)
) %>%
ungroup() %>%
select(-grp)
This gives us
# A tibble: 5 × 6
p1 elapsed_time event_type scored timesincelastbreak timesscored
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 Tom 17 play Tom 17 1
2 Tom 65 play Tom 37 1
3 Brad 225 play Brad 45 1
4 Brad 276 play Brad 96 0
5 Brad 281 play Brad 101 1
Now we add those add back to the original data.frame and apply cumsum
:
df %>%
left_join(df_score,
by = c("p1", "elapsed_time", "event_type", "scored", "timesincelastbreak")) %>%
group_by(grp = cumsum(timesincelastbreak == 0)) %>%
mutate(timesscored = cumsum(coalesce(timesscored, 0))) %>%
ungroup() %>%
select(-grp)
This returns
# A tibble: 16 × 6
p1 elapsed_time event_type scored timesincelastbreak timesscored
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 Tom 0 start of period NA 0 0
2 Tom 10 play NA 10 0
3 Tom 17 play Tom 17 1
4 Tom 28 timeout NA 0 0
5 Tom 47 play NA 19 0
6 Tom 65 play Tom 37 1
7 Brad 83 play NA 0 0
8 Brad 100 play NA 17 0
9 Brad 135 play NA 52 0
10 Brad 180 timeout NA 0 0
11 Brad 210 play NA 30 0
12 Brad 225 play Brad 45 1
13 Brad 237 play NA 57 1
14 Brad 253 play NA 73 1
15 Brad 276 play Brad 96 1
16 Brad 281 play Brad 101 2