Home > Back-end >  Calculating the difference of a column compared to a specific reference row
Calculating the difference of a column compared to a specific reference row

Time:01-08

I have a data frame with data for every minute and every weekday during the year and want to calculate the difference based on a specific reference line each day (which is 08:30:00 in this example and Data1 is the column I want to compare the difference for). Usually I would use diff and lag but there I can only check the difference to n-previous rows not one specific reference row.

As the entire data has about 1 Mio entries I think using lag and diff in a recursive function (where I could use the condition-check for the starting line and then walking forward) would be too time consuming. Another idea I had is doing a second data frame with only the reference line for each day (which only had line 3 in this sample) and then joining with the original data frame as a new column containing the starting value. Then I could easily calc the difference between two columns.

       Date     Time    Data1  Diff
1 2022-01-03 08:28:00 4778.14     0
2 2022-01-03 08:29:00 4784.23     0
3 2022-01-03 08:30:00 4785.15     0
4 2022-01-03 08:31:00 4785.01 -0.14
5 2022-01-03 08:32:00 4787.83  2.68
6 2022-01-03 08:33:00 4788.80  3.65

CodePudding user response:

You can subset Data1 to rows where Time is "08:30:00" as follows. This assumes Time is character.

dat$diff <- dat$Data1 - dat$Data1[[match("08:30:00", dat$Time)]]

dat
       Date     Time   Data1  Diff  diff
1 2022-01-03 08:28:00 4778.14  0.00 -7.01
2 2022-01-03 08:29:00 4784.23  0.00 -0.92
3 2022-01-03 08:30:00 4785.15  0.00  0.00
4 2022-01-03 08:31:00 4785.01 -0.14 -0.14
5 2022-01-03 08:32:00 4787.83  2.68  2.68
6 2022-01-03 08:33:00 4788.80  3.65  3.65

For data with multiple dates, you can do the same operation for each day using dplyr::group_by():

library(dplyr)

dat %>%
  group_by(Date) %>%
  mutate(diff = Data1 - Data1[[match("08:30:00", Time)]]) %>%
  ungroup()
  • Related