How to find time difference between previous and following rows from specific rows

Time:04-14

I would like to calculate the difference in time for rows before and after a specific row if it satisfies a condition. I don't want to find the difference in sequence (row 3 - row 2, row 4 - row 3, etc.), but want the difference from the central row. Maybe another way to say it is distance from 0.

If the start column reads "yes", I want that row's time to be the origin, but only for rows within 5 seconds of it. I have about 600,000 rows of a roughly continuous time sequence, so limiting the calculation to 5 s on either side of each start row should keep the windows from overlapping. I'm not sure how this would look in code. Example data, with many columns omitted for brevity:

df <- data.frame(
  stringsAsFactors = FALSE,
          initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
             start = c("no","no","yes","no","no",
                       "no","no","no","no","yes","no","no","no","no"),
              time = c(2.8225,2.82375,2.825,2.82625,
                       2.827,2.82725,16.8075,16.810,16.82,16.8212,16.825,
                       16.8262,16.8275,16.8300)
)
initiate  start  time
0         no     2.8225
0         no     2.82375
1         yes    2.82500
1         no     2.82625
1         no     2.82700
1         no     2.82725
0         no     16.8075
0         no     16.8100
0         no     16.8200
1         yes    16.8212
1         no     16.8250
0         no     16.8262
1         no     16.8275
1         no     16.8300

What I want the output to be is:

initiate  start  time     diff
0         no     2.8225   -0.00250
0         no     2.82375  -0.00125
1         yes    2.82500  0
1         no     2.82625  0.00125
1         no     2.82700  0.00200
1         no     2.82725  0.00225
0         no     16.8075  -0.0137
0         no     16.8100  -0.0112
0         no     16.8200  -0.0012
1         yes    16.8212  0
1         no     16.8250  0.00380
0         no     16.8262  0.00500
1         no     16.8275  0.00630
1         no     16.8300  0.00880

I've tried using lag, diff with shift, and the following code. I haven't been able to get the calculations to restart from those yes rows. This is the closest I've been able to get, but it only calculates from the 1st yes.

df %>%
  group_by(id, grp = cumsum(lag(start, default = '') == 'yes')) %>% 
  mutate(diff = time - time[match('yes', start)]) %>% 
  {. ->> df}

Answer:

Using fuzzyjoin might be useful here:

library(dplyr)
library(fuzzyjoin)

df_grp <- df %>% 
  filter(start == "yes") %>% 
  select(time) %>% 
  group_by(grp = row_number()) %>% 
  mutate(begin = time - 5,
         end = time + 5)

First we create a data.frame of the "yes" rows, adding for each one a window that runs from 5 seconds before (begin) to 5 seconds after (end) its time:

# A tibble: 2 x 4
   time   grp begin   end
  <dbl> <int> <dbl> <dbl>
1  2.82     1 -2.17  7.82
2 16.8      2 11.8  21.8 

Next we use fuzzy_left_join to attach it to the original data.frame, matching every row whose time falls strictly between begin and end, and calculate the differences:

df %>% 
  fuzzy_left_join(df_grp, 
                  by = c("time" = "begin", "time" = "end"),
                  match_fun = list(`>`, `<`)) %>% 
  group_by(grp) %>% 
  mutate(diff = time.x - time.y) %>% 
  ungroup()

This returns

# A tibble: 14 x 8
   initiate start time.x time.y   grp begin   end     diff
      <int> <chr>  <dbl>  <dbl> <int> <dbl> <dbl>    <dbl>
 1        0 no      2.82   2.82     1 -2.17  7.82 -0.00250
 2        0 no      2.82   2.82     1 -2.17  7.82 -0.00125
 3        1 yes     2.82   2.82     1 -2.17  7.82  0      
 4        1 no      2.83   2.82     1 -2.17  7.82  0.00125
 5        1 no      2.83   2.82     1 -2.17  7.82  0.00200
 6        1 no      2.83   2.82     1 -2.17  7.82  0.00225
 7        0 no     16.8   16.8      2 11.8  21.8  -0.0137 
 8        0 no     16.8   16.8      2 11.8  21.8  -0.0112 
 9        0 no     16.8   16.8      2 11.8  21.8  -0.00120
10        1 yes    16.8   16.8      2 11.8  21.8   0      
11        1 no     16.8   16.8      2 11.8  21.8   0.00380
12        0 no     16.8   16.8      2 11.8  21.8   0.00500
13        1 no     16.8   16.8      2 11.8  21.8   0.00630
14        1 no     16.8   16.8      2 11.8  21.8   0.00880
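
If you would rather avoid the extra package, the same idea can be sketched in base R. This is only a sketch under some assumptions: that there is at least one "yes" row, that each row should be measured against the nearest "yes" row, and that rows further than 5 seconds from every "yes" row should get NA. The names origins, window, nearest and d are made up for this illustration.

```r
# Example data from the question (only the columns needed here).
df <- data.frame(
  start = c("no", "no", "yes", "no", "no", "no", "no",
            "no", "no", "yes", "no", "no", "no", "no"),
  time  = c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725,
            16.8075, 16.810, 16.82, 16.8212, 16.825,
            16.8262, 16.8275, 16.8300),
  stringsAsFactors = FALSE
)

origins <- df$time[df$start == "yes"]  # times of the "yes" rows
window  <- 5                           # assumed half-width of the window, in seconds

# For every row, the index of the closest origin.
nearest <- vapply(df$time,
                  function(t) which.min(abs(t - origins)),
                  integer(1))

# Signed distance from that origin; NA if the row lies outside the window.
d <- df$time - origins[nearest]
df$diff <- ifelse(abs(d) < window, d, NA_real_)

df
```

Because each row is assigned to exactly one origin, a row that happens to fall inside two windows is not duplicated, which the join above would do. On 600,000 rows the row-by-row vapply may be slow, so treat this as a starting point rather than a tuned solution.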