How to find time difference between previous and following rows from specific rows

I would like to calculate the time difference for the rows before and after a specific row, where that row satisfies a condition. I don't want the difference in sequence (row 3 - row 2, row 4 - row 3, etc.), but the difference from the central row. Another way to say it is distance from 0.

If the start column reads "yes", I want that row's time to be the origin, but only for rows within 5 seconds of it. I have about 600,000 rows of a roughly continuous time sequence, so limiting the calculation to 5 s on either side of each start should hopefully keep the calculations from overlapping. I'm not even sure how this would look in code. Example data, with many columns omitted for brevity:

df <- data.frame(
  stringsAsFactors = FALSE,
          initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
             start = c("no","no","yes","no","no",
                       "no","no","no","no","yes","no","no","no","no"),
              time = c(2.8225,2.82375,2.825,2.82625,
                       2.827,2.82725,16.8075,16.810,16.82,16.8212,16.825,
                       16.8262,16.8275,16.8300)
)

initiate	start	time
0	no	2.8225
0	no	2.82375
1	yes	2.82500
1	no	2.82625
1	no	2.82700
1	no	2.82725
0	no	16.8075
0	no	16.8100
0	no	16.8200
1	yes	16.8212
1	no	16.8250
0	no	16.8262
1	no	16.8275
1	no	16.8300

What I want the output to be is:

initiate	start	time	diff
0	no	2.8225	-0.00250
0	no	2.82375	-0.00125
1	yes	2.82500	0
1	no	2.82625	0.00125
1	no	2.82700	0.00200
1	no	2.82725	0.00225
0	no	16.8075	-0.0137
0	no	16.8100	-0.0112
0	no	16.8200	-0.0012
1	yes	16.8212	0
1	no	16.8250	0.00380
0	no	16.8262	0.00500
1	no	16.8275	0.00630
1	no	16.8300	0.00880

I've tried using lag, diff with shift, and the following code. I haven't been able to get the calculations to restart from those yes rows. This is the closest I've been able to get, but it only calculates from the 1st yes.

df %>%
  group_by(id, grp = cumsum(lag(start, default = '') == 'yes')) %>% 
  mutate(diff = time - time[match('yes', trial_start)]) %>% 
  {. ->> df}

CodePudding user response:

Using fuzzyjoin might be useful here:

library(dplyr)
library(fuzzyjoin)

df_grp <- df %>% 
  filter(start == "yes") %>% 
  select(time) %>% 
  group_by(grp = row_number()) %>% 
  mutate(begin = time - 5,
         end = time + 5)

First we create a data.frame of the "yes" times, each with a window running from 5 seconds before (begin) to 5 seconds after (end):

# A tibble: 2 x 4
   time   grp begin   end
  <dbl> <int> <dbl> <dbl>
1  2.82     1 -2.17  7.82
2 16.8      2 11.8  21.8
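
The join in the next step matches a row to every window it falls inside, so two "yes" rows less than 10 seconds apart would give some rows two matches. A minimal sketch of a check for that, in base R, assuming the "yes" times have been pulled out into a vector (origin_times and window are illustrative names, not part of the code above):

```r
# Times of the "yes" rows from the example data
origin_times <- c(2.825, 16.8212)
window <- 5  # seconds on each side of an origin (illustrative name)

# Gap between each origin and the next one, in time order
gaps <- diff(sort(origin_times))

# Two windows overlap when their origins are less than 2 * window apart
overlapping <- gaps < 2 * window
any(overlapping)
```

For the example data the only gap is about 14 seconds, so no windows overlap. If this returns TRUE on the full data, the affected rows would be expected to appear more than once in the joined result.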

Next we use fuzzy_left_join to attach each window to the rows of the original data.frame whose time falls inside it, and calculate the differences:

df %>% 
  fuzzy_left_join(df_grp, 
                  by = c("time" = "begin", "time" = "end"),
                  match_fun = list(`>`, `<`)) %>% 
  group_by(grp) %>% 
  mutate(diff = time.x - time.y) %>% 
  ungroup()

This returns

# A tibble: 14 x 8
   initiate start time.x time.y   grp begin   end     diff
      <int> <chr>  <dbl>  <dbl> <int> <dbl> <dbl>    <dbl>
 1        0 no      2.82   2.82     1 -2.17  7.82 -0.00250
 2        0 no      2.82   2.82     1 -2.17  7.82 -0.00125
 3        1 yes     2.82   2.82     1 -2.17  7.82  0      
 4        1 no      2.83   2.82     1 -2.17  7.82  0.00125
 5        1 no      2.83   2.82     1 -2.17  7.82  0.00200
 6        1 no      2.83   2.82     1 -2.17  7.82  0.00225
 7        0 no     16.8   16.8      2 11.8  21.8  -0.0137 
 8        0 no     16.8   16.8      2 11.8  21.8  -0.0112 
 9        0 no     16.8   16.8      2 11.8  21.8  -0.00120
10        1 yes    16.8   16.8      2 11.8  21.8   0      
11        1 no     16.8   16.8      2 11.8  21.8   0.00380
12        0 no     16.8   16.8      2 11.8  21.8   0.00500
13        1 no     16.8   16.8      2 11.8  21.8   0.00630
14        1 no     16.8   16.8      2 11.8  21.8   0.00880
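
With around 600,000 rows, a join-free variant may also be worth trying. The following is a sketch in base R, not the fuzzyjoin approach used above: it assigns each row to its nearest "yes" time with findInterval and sets diff to NA for anything more than 5 seconds away. The names origins, breaks, nearest and window are illustrative, and the sketch assumes there is at least one "yes" row.

```r
# Example data from the question
df <- data.frame(
  initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
  start = c("no", "no", "yes", "no", "no", "no", "no",
            "no", "no", "yes", "no", "no", "no", "no"),
  time = c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725, 16.8075,
           16.810, 16.82, 16.8212, 16.825, 16.8262, 16.8275, 16.8300),
  stringsAsFactors = FALSE
)

window <- 5  # seconds on each side of an origin (illustrative name)

# Origin times: the rows where start is "yes", in ascending order
origins <- sort(df$time[df$start == "yes"])

# Midpoints between consecutive origins split the time axis into one
# interval per origin; findInterval() reports which interval each row
# falls in, which is the index of its nearest origin
breaks <- head(origins, -1) + diff(origins) / 2
nearest <- findInterval(df$time, breaks) + 1L

# Signed distance from the nearest origin
df$diff <- df$time - origins[nearest]

# Rows more than `window` seconds from their nearest origin get NA
df$diff[abs(df$diff) > window] <- NA_real_

df
```

Unlike the fuzzy join, this matches each row to at most one origin, so rows are never duplicated, and the original column names (initiate, start, time) are kept alongside the new diff column.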