R Difference between consecutive rows while retaining first row

Time:03-10

I'm looking for a solution (dplyr preferred) that computes the difference (lag) between consecutive rows while retaining the original value in the first row.

The data I'm working with is cumulative response time (time). The first row for each id needs to be retained.

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                     time = c(26204, 46692, 60268, 86240, 91872, 291242, 312311,
                              333983, 355122, 364841)),
                class = "data.frame", row.names = c(NA, -10L))

df
   id   time
1   1  26204
2   1  46692
3   1  60268
4   1  86240
5   1  91872
6   2 291242
7   2 312311
8   2 333983
9   2 355122
10  2 364841

When using the lag function from dplyr, I'm able to compute difference scores (rt), but the first value in each group ends up as either NA or 0:

df %>% group_by(id) %>% mutate(rt = time - lag(time))

# A tibble: 10 × 3
# Groups:   id [2]
      id   time    rt
   <int>  <dbl> <dbl>
 1     1  26204    NA
 2     1  46692 20488
 3     1  60268 13576
 4     1  86240 25972
 5     1  91872  5632
 6     2 291242    NA
 7     2 312311 21069
 8     2 333983 21672
 9     2 355122 21139
10     2 364841  9719

df %>% group_by(id) %>% mutate(rt = time - lag(time, default = first(time)))

# A tibble: 10 × 3
# Groups:   id [2]
      id   time    rt
   <int>  <dbl> <dbl>
 1     1  26204     0
 2     1  46692 20488
 3     1  60268 13576
 4     1  86240 25972
 5     1  91872  5632
 6     2 291242     0
 7     2 312311 21069
 8     2 333983 21672
 9     2 355122 21139
10     2 364841  9719

What should I do to replace the NAs with the corresponding value from my original column?

Is there a more elegant way to handle this within a single statement, rather than using:

df$rt[is.na(df$rt)] <- df$time[is.na(df$rt)]

CodePudding user response:

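Use default = 0 in lag(). The first row of each group has no previous row, so lag() returns 0 there, and time - 0 leaves the original value in place. This fits because time is cumulative, so the first value is already the elapsed time from zero.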

library(dplyr)

df %>% group_by(id) %>% mutate(rt = time - lag(time, default = 0))

# A tibble: 10 × 3
# Groups:   id [2]
      id   time     rt
   <int>  <dbl>  <dbl>
 1     1  26204  26204
 2     1  46692  20488
 3     1  60268  13576
 4     1  86240  25972
 5     1  91872   5632
 6     2 291242 291242
 7     2 312311  21069
 8     2 333983  21672
 9     2 355122  21139
10     2 364841   9719
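
If the NA-then-fill approach from the question is preferred, or dplyr is not available, the same result can probably be reached in a few other ways. The sketch below is not from the original answer. It assumes dplyr is installed and rebuilds df with data.frame() so that it runs on its own; the names res_coalesce, res_diff and rt_base are made up for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained.
df <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  time = c(26204, 46692, 60268, 86240, 91872,
           291242, 312311, 333983, 355122, 364841)
)

# Variant 1: keep the default lag() (NA in the first row of each group)
# and let coalesce() fill that NA from `time` in the same statement.
res_coalesce <- df %>%
  group_by(id) %>%
  mutate(rt = coalesce(time - lag(time), time)) %>%
  ungroup()

# Variant 2: prepend the group's first value to its consecutive differences.
res_diff <- df %>%
  group_by(id) %>%
  mutate(rt = c(first(time), diff(time))) %>%
  ungroup()

# Variant 3: base R only; ave() applies the function within each id.
rt_base <- ave(df$time, df$id, FUN = function(x) c(x[1], diff(x)))

# All three should agree with each other.
stopifnot(isTRUE(all.equal(res_coalesce$rt, res_diff$rt)),
          isTRUE(all.equal(res_diff$rt, rt_base)))
```

Unlike default = 0, the coalesce() variant does not rely on the series starting from zero; it simply copies the original value wherever the difference is NA. Note that it would also fill NAs that arise from missing values in time itself.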