Home > Software engineering >  Subtracting rows based on conditions in other columns
Subtracting rows based on conditions in other columns

Time:11-24

I am working with a dataframe which is similar to this:

df1 <- data.frame(p1 = c("John", "John", "John", "John", "John", "John", "Jim", "Jim", "Jim", "Jim", "Jim", "Jim", "Jim","Jim" ),
           elapsed_time = c(0, 4, 6, 9, 12, 14, 17, 22, 27, 35, 42, 47, 51, 57),
           event_type = c("start of period", "play", "play", "play", "play", "play", "play", "play", "play", "timeout", "play", "play", "play", "play"))

and looks like this:

 p1 elapsed_time      event_type
1  John            0 start of period
2  John            4            play
3  John            6            play
4  John            9            play
5  John           12            play
6  John           14            play
7   Jim           17            play
8   Jim           22            play
9   Jim           27            play
10  Jim           35         timeout
11  Jim           42            play
12  Jim           47            play
13  Jim           51            play
14  Jim           57            play

What I'd like to do is add a 4th column that calculates elapsed time since 1 of 3 things happened: 1) event_type == "start of period" 2) eventtype == "timeout" 3) p1 was changed (like in row 7 from John to Jim). Any of these three things should reset the 4th column to zero.

My desired output is

 p1 elapsed_time      event_type    elapsed_time_since_last_break
1  John            0 start of period                             0
2  John            4            play                             4
3  John            6            play                             6
4  John            9            play                             9
5  John           12            play                            12
6  John           14            play                            14
7   Jim           17            play                             0
8   Jim           22            play                             5
9   Jim           27            play                            10
10  Jim           35         timeout                             0
11  Jim           42            play                             7
12  Jim           47            play                            12
13  Jim           51            play                            16
14  Jim           57            play                            22

I'm somewhat new to r and haven't had much success. I'm sure there's probably a simple solution I'm overlooking.

CodePudding user response:

df1 %>%
  group_by(p1, elps = cumsum(event_type != 'play'))%>%
  mutate(elps = elapsed_time - elapsed_time[1])

# A tibble: 14 × 4
# Groups:   p1, elps [13]
   p1    elapsed_time event_type       elps
   <chr>        <dbl> <chr>           <dbl>
 1 John             0 start of period     0
 2 John             4 play                4
 3 John             6 play                6
 4 John             9 play                9
 5 John            12 play               12
 6 John            14 play               14
 7 Jim             17 play                0
 8 Jim             22 play                5
 9 Jim             27 play               10
10 Jim             35 timeout             0
11 Jim             42 play                7
12 Jim             47 play               12
13 Jim             51 play               16
14 Jim             57 play               22

CodePudding user response:

A data.table option

setDT(df1)[
  ,
  grp := (rowid(p1) == 1 | (event_type != "play"))
][
  ,
  elps := elapsed_time - elapsed_time[grp][cumsum(grp)]
][
  ,
  grp := NULL
]

gives

> df1
      p1 elapsed_time      event_type elps
 1: John            0 start of period    0
 2: John            4            play    4
 3: John            6            play    6
 4: John            9            play    9
 5: John           12            play   12
 6: John           14            play   14
 7:  Jim           17            play    0
 8:  Jim           22            play    5
 9:  Jim           27            play   10
10:  Jim           35         timeout    0
11:  Jim           42            play    7
12:  Jim           47            play   12
13:  Jim           51            play   16
14:  Jim           57            play   22
  • Related