I'm looking for a solution (dplyr preferred) that retains the original value of the first row while computing the difference (lag) between consecutive rows.
The data I'm working with is cumulative response time (time). The first row for each id needs to be retained.
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
time = c(26204, 46692, 60268, 86240, 91872, 291242, 312311,
333983, 355122, 364841)), class = "data.frame", row.names = c(NA,
-10L))
df
id time
1 1 26204
2 1 46692
3 1 60268
4 1 86240
5 1 91872
6 2 291242
7 2 312311
8 2 333983
9 2 355122
10 2 364841
When using the lag function from dplyr, I'm able to compute difference scores (rt) while keeping the first value as NA or 0:
df %>% group_by(id) %>% mutate(rt = time - lag(time))
# A tibble: 10 × 3
# Groups: id [2]
id time rt
<int> <dbl> <dbl>
1 1 26204 NA
2 1 46692 20488
3 1 60268 13576
4 1 86240 25972
5 1 91872 5632
6 2 291242 NA
7 2 312311 21069
8 2 333983 21672
9 2 355122 21139
10 2 364841 9719
df %>% group_by(id) %>% mutate(rt = time - lag(time, default = first(time)))
# A tibble: 10 × 3
# Groups: id [2]
id time rt
<int> <dbl> <dbl>
1 1 26204 0
2 1 46692 20488
3 1 60268 13576
4 1 86240 25972
5 1 91872 5632
6 2 291242 0
7 2 312311 21069
8 2 333983 21672
9 2 355122 21139
10 2 364841 9719
What should I do to replace the NAs with the corresponding value from my original column?
Is there a more elegant way to handle this within a single statement, rather than using:
df$rt[is.na(df$rt)] <- df$time[is.na(df$rt)]
CodePudding user response:
Use default = 0 in lag(). The first row of each group is then computed as time - 0, which is the original value:
library(dplyr)
df %>% group_by(id) %>% mutate(rt = time - lag(time, default = 0))
# A tibble: 10 × 3
# Groups: id [2]
id time rt
<int> <dbl> <dbl>
1 1 26204 26204
2 1 46692 20488
3 1 60268 13576
4 1 86240 25972
5 1 91872 5632
6 2 291242 291242
7 2 312311 21069
8 2 333983 21672
9 2 355122 21139
10 2 364841 9719
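For comparison, here is a minimal sketch on a shortened version of the data from the question, showing the default = 0 approach next to two alternatives that should give the same result on this data: coalesce() from dplyr, and base R diff() with a leading 0. The column names rt_default, rt_coalesce and rt_diff are only illustrative.

```r
library(dplyr)

# Shortened version of the question's data (first three rows per id)
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 2L),
  time = c(26204, 46692, 60268, 291242, 312311, 333983)
)

res <- df %>%
  group_by(id) %>%
  mutate(
    # lag() pads the first row of each group with 0, so time - 0 = time
    rt_default  = time - lag(time, default = 0),
    # compute the plain lagged difference, then fill the NA with time
    rt_coalesce = coalesce(time - lag(time), time),
    # base R: prepend 0 so diff() returns one value per row
    rt_diff     = diff(c(0, time))
  ) %>%
  ungroup()

res
```

One caveat, assuming time could contain missing values: the coalesce() variant would also overwrite NAs that appear in the middle of a group, whereas default = 0 only affects the first row.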