Subtract the first cell from one column and subtract it from the second cell in a different column-CodePudding

I have a dataframe of stage starting times and stage ending times. I want to subtract the starting time of one stage from the ending time of the previous stage. I.e., I want to subtract the nth cell from one column and from the n-1 cell in a different column. This is easy to do in Excel, but I am not sure how to do it in R.

Here is what the data looks like. I included the dput() of the data as well:

#test data frame
Stage   StartTime             EndTime
102   2021-07-19 17:23:00   2021-07-19 21:53:24
103   2021-07-19 21:54:00   2021-07-19 23:00:14
104   2021-07-19 23:01:00   2021-07-20 00:50:10
105   2021-07-20 00:51:00   2021-07-20 01:50:58
106   2021-07-20 01:51:00   2021-07-20 03:28:22
107   2021-07-20 03:29:00   2021-07-20 04:28:00
108   2021-07-20 05:38:00   2021-07-20 08:19:26


> dput(test[1:7,])
structure(list(Stage = c(102, 103, 104, 105, 106, 107, 108), 
    StartTime = structure(c(1626733380, 1626749640, 1626753660, 
    1626760260, 1626763860, 1626769740, 1626777480), tzone = "", class = c("POSIXct", 
    "POSIXt")), EndTime = structure(c(1626749604, 1626753614, 
    1626760210, 1626763858, 1626769702, 1626773280, 1626787166
    ), tzone = "", class = c("POSIXct", "POSIXt")), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"))

For example, I know that I can do the following manually to get the time difference between the end of stage 102 and the start of stage 103 by doing:

as.numeric(difftime("2021-07-19 21:54:00", "2021-07-19 21:53:24", units='sec'))
 > 36

I tried to make it more general, but this does NOT work. However, the idea would be to add 1 to the row index every time:

#does not work
# want to subtract row 2, col 2 - the later starting time
# from row 1, col 3 - the earlier end time

as.numeric(difftime(test[1,3], test[2,2], units='sec'))

It doesn't matter where the time difference is saved, whether a new column inside the dataframe or a completely new dataframe. Whatever works. I am really not sure how to do this. Maybe a loop? Any help would be appreciated. Thank you.

CodePudding user response：

You can use the lag function to work with the dates as a vector e.g

library(tidyverse)

data_example <-
  structure(list(
    Stage = c(102, 103, 104, 105, 106, 107, 108),
    StartTime = structure(
      c(
        1626733380,
        1626749640,
        1626753660,
        1626760260,
        1626763860,
        1626769740,
        1626777480
      ),
      tzone = "",
      class = c("POSIXct",
                "POSIXt")
    ),
    EndTime = structure(
      c(
        1626749604,
        1626753614,
        1626760210,
        1626763858,
        1626769702,
        1626773280,
        1626787166
      ),
      tzone = "",
      class = c("POSIXct", "POSIXt")
    )
  ))

tibble_df <- data_example |> as_tibble()


tibble_df |> 
  mutate(time_diff = difftime(StartTime,lag(EndTime)))
#> # A tibble: 7 x 4
#>   Stage StartTime           EndTime             time_diff
#>   <dbl> <dttm>              <dttm>              <drtn>   
#> 1   102 2021-07-19 19:23:00 2021-07-19 23:53:24   NA secs
#> 2   103 2021-07-19 23:54:00 2021-07-20 01:00:14   36 secs
#> 3   104 2021-07-20 01:01:00 2021-07-20 02:50:10   46 secs
#> 4   105 2021-07-20 02:51:00 2021-07-20 03:50:58   50 secs
#> 5   106 2021-07-20 03:51:00 2021-07-20 05:28:22    2 secs
#> 6   107 2021-07-20 05:29:00 2021-07-20 06:28:00   38 secs
#> 7   108 2021-07-20 07:38:00 2021-07-20 10:19:26 4200 secs

^{Created on 2021-10-18 by the reprex package (v2.0.1)}