I am working with a dataframe which is similar to this:
df1 <- data.frame(p1 = c("John", "John", "John", "John", "John", "John", "Jim", "Jim", "Jim", "Jim", "Jim", "Jim", "Jim","Jim" ),
elapsed_time = c(0, 4, 6, 9, 12, 14, 17, 22, 27, 35, 42, 47, 51, 57),
event_type = c("start of period", "play", "play", "play", "play", "play", "play", "play", "play", "timeout", "play", "play", "play", "play"))
and looks like this:
p1 elapsed_time event_type
1 John 0 start of period
2 John 4 play
3 John 6 play
4 John 9 play
5 John 12 play
6 John 14 play
7 Jim 17 play
8 Jim 22 play
9 Jim 27 play
10 Jim 35 timeout
11 Jim 42 play
12 Jim 47 play
13 Jim 51 play
14 Jim 57 play
What I'd like to do is add a 4th column that calculates elapsed time since 1 of 3 things happened: 1) event_type == "start of period" 2) eventtype == "timeout" 3) p1 was changed (like in row 7 from John to Jim). Any of these three things should reset the 4th column to zero.
My desired output is
p1 elapsed_time event_type elapsed_time_since_last_break
1 John 0 start of period 0
2 John 4 play 4
3 John 6 play 6
4 John 9 play 9
5 John 12 play 12
6 John 14 play 14
7 Jim 17 play 0
8 Jim 22 play 5
9 Jim 27 play 10
10 Jim 35 timeout 0
11 Jim 42 play 7
12 Jim 47 play 12
13 Jim 51 play 16
14 Jim 57 play 22
I'm somewhat new to r and haven't had much success. I'm sure there's probably a simple solution I'm overlooking.
CodePudding user response:
df1 %>%
group_by(p1, elps = cumsum(event_type != 'play'))%>%
mutate(elps = elapsed_time - elapsed_time[1])
# A tibble: 14 × 4
# Groups: p1, elps [13]
p1 elapsed_time event_type elps
<chr> <dbl> <chr> <dbl>
1 John 0 start of period 0
2 John 4 play 4
3 John 6 play 6
4 John 9 play 9
5 John 12 play 12
6 John 14 play 14
7 Jim 17 play 0
8 Jim 22 play 5
9 Jim 27 play 10
10 Jim 35 timeout 0
11 Jim 42 play 7
12 Jim 47 play 12
13 Jim 51 play 16
14 Jim 57 play 22
CodePudding user response:
A data.table
option
setDT(df1)[
,
grp := (rowid(p1) == 1 | (event_type != "play"))
][
,
elps := elapsed_time - elapsed_time[grp][cumsum(grp)]
][
,
grp := NULL
]
gives
> df1
p1 elapsed_time event_type elps
1: John 0 start of period 0
2: John 4 play 4
3: John 6 play 6
4: John 9 play 9
5: John 12 play 12
6: John 14 play 14
7: Jim 17 play 0
8: Jim 22 play 5
9: Jim 27 play 10
10: Jim 35 timeout 0
11: Jim 42 play 7
12: Jim 47 play 12
13: Jim 51 play 16
14: Jim 57 play 22