I would like to calculate the time difference for the rows before and after a specific row that satisfies a condition. I don't want sequential differences (row 3 - row 2, row 4 - row 3, etc.); I want each row's difference from the central row. Another way to say it is distance from 0.
If the start column reads "yes", I want that row's time to be the origin, but only for rows within 5 seconds of it. I have about 600,000 rows of a roughly continuous time sequence, so calculating for 5 s on both sides of each start should hopefully keep the calculations from overlapping. I'm not even sure how this would look in code. Example data, many columns omitted for ease:
df <- data.frame(
stringsAsFactors = FALSE,
initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
start = c("no","no","yes","no","no",
"no","no","no","no","yes","no","no","no","no"),
time = c(2.8225,2.82375,2.825,2.82625,
2.827,2.82725,16.8075,16.810,16.82,16.8212,16.825,
16.8262,16.8275,16.8300)
)
initiate | start | time |
---|---|---|
0 | no | 2.8225 |
0 | no | 2.82375 |
1 | yes | 2.82500 |
1 | no | 2.82625 |
1 | no | 2.82700 |
1 | no | 2.82725 |
0 | no | 16.8075 |
0 | no | 16.8100 |
0 | no | 16.8200 |
1 | yes | 16.8212 |
1 | no | 16.8250 |
0 | no | 16.8262 |
1 | no | 16.8275 |
1 | no | 16.8300 |
What I want the output to be is:
initiate | start | time | diff |
---|---|---|---|
0 | no | 2.8225 | -0.00250 |
0 | no | 2.82375 | -0.00125 |
1 | yes | 2.82500 | 0 |
1 | no | 2.82625 | 0.00125 |
1 | no | 2.82700 | 0.00200 |
1 | no | 2.82725 | 0.00225 |
0 | no | 16.8075 | -0.0137 |
0 | no | 16.8100 | -0.0112 |
0 | no | 16.8200 | -0.0012 |
1 | yes | 16.8212 | 0 |
1 | no | 16.8250 | 0.00380 |
0 | no | 16.8262 | 0.00500 |
1 | no | 16.8275 | 0.00630 |
1 | no | 16.8300 | 0.00880 |
I've tried using lag, diff with shift, and the following code. I haven't been able to get the calculations to restart from those yes rows. This is the closest I've been able to get, but it only calculates from the 1st yes.
df %>%
group_by(id, grp = cumsum(lag(start, default = '') == 'yes')) %>%
mutate(diff = time - time[match('yes', trial_start)]) %>%
{. ->> df}
CodePudding user response:
Using fuzzyjoin might be useful here:
library(dplyr)
library(fuzzyjoin)
df_grp <- df %>%
filter(start == "yes") %>%
select(time) %>%
group_by(grp = row_number()) %>%
mutate(begin = time - 5,
end = time + 5)
First we create a data.frame of your start times, with begin and end set to 5 seconds before and after each one:
# A tibble: 2 x 4
time grp begin end
<dbl> <int> <dbl> <dbl>
1 2.82 1 -2.17 7.82
2 16.8 2 11.8 21.8
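Before joining, it may be worth checking that the 5 s windows really don't overlap, since a row that falls inside two windows would be matched twice by the join. A minimal sketch, assuming the two start times from the example data; `start_times`, `half_width`, `gaps` and `overlaps` are names made up for illustration:

```r
# Start times copied from the example data (rows where start == "yes")
start_times <- c(2.825, 16.8212)

# Half-width of the window around each start, in seconds
half_width <- 5

# Gap between consecutive start times
gaps <- diff(sort(start_times))

# Two windows overlap when their starts are less than 2 * half_width apart
overlaps <- gaps < 2 * half_width

any(overlaps)  # FALSE for the example data
```

If this returns TRUE on the full data, you would need to decide which start a shared row belongs to before computing the difference.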
Next we use fuzzy_left_join to attach it to the original data.frame and calculate the differences:
df %>%
fuzzy_left_join(df_grp,
by = c("time" = "begin", "time" = "end"),
match_fun = list(`>`, `<`)) %>%
group_by(grp) %>%
mutate(diff = time.x - time.y) %>%
ungroup()
This returns
# A tibble: 14 x 8
initiate start time.x time.y grp begin end diff
<int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0 no 2.82 2.82 1 -2.17 7.82 -0.00250
2 0 no 2.82 2.82 1 -2.17 7.82 -0.00125
3 1 yes 2.82 2.82 1 -2.17 7.82 0
4 1 no 2.83 2.82 1 -2.17 7.82 0.00125
5 1 no 2.83 2.82 1 -2.17 7.82 0.00200
6 1 no 2.83 2.82 1 -2.17 7.82 0.00225
7 0 no 16.8 16.8 2 11.8 21.8 -0.0137
8 0 no 16.8 16.8 2 11.8 21.8 -0.0112
9 0 no 16.8 16.8 2 11.8 21.8 -0.00120
10 1 yes 16.8 16.8 2 11.8 21.8 0
11 1 no 16.8 16.8 2 11.8 21.8 0.00380
12 0 no 16.8 16.8 2 11.8 21.8 0.00500
13 1 no 16.8 16.8 2 11.8 21.8 0.00630
14 1 no 16.8 16.8 2 11.8 21.8 0.00880
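If the fuzzy join turns out to be slow or memory-heavy on ~600,000 rows (I have not benchmarked it), a base R alternative is to look up the nearest start time for each row with findInterval. This is only a sketch under a few assumptions: there is at least one "yes" row, each row should be measured against its single nearest start, and `starts`, `idx`, `d_prev`, `d_next` and `offset` are names made up for illustration:

```r
df <- data.frame(
  start = c("no", "no", "yes", "no", "no", "no", "no",
            "no", "no", "yes", "no", "no", "no", "no"),
  time = c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725, 16.8075,
           16.810, 16.82, 16.8212, 16.825, 16.8262, 16.8275, 16.8300),
  stringsAsFactors = FALSE
)

# Sorted times of the rows that act as origins
starts <- sort(df$time[df$start == "yes"])
n <- length(starts)

# For each row: index of the last start at or before its time (0 if none)
idx <- findInterval(df$time, starts)

# Signed distance to the previous start (>= 0) and to the next start (< 0);
# NA where no such start exists
d_prev <- df$time - ifelse(idx >= 1, starts[pmax(idx, 1)], NA)
d_next <- df$time - ifelse(idx < n, starts[pmin(idx + 1, n)], NA)

# Keep whichever start is closer in absolute terms
use_next <- is.na(d_prev) | (!is.na(d_next) & abs(d_next) < abs(d_prev))
offset <- ifelse(use_next, d_next, d_prev)

# Rows 5 s or more away from every start get no value
offset[which(abs(offset) >= 5)] <- NA

df$diff <- offset
df
```

Unlike the join, this never duplicates rows: when two windows overlap, a row is assigned to the closer start only.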