I have a data frame with data for every minute and every weekday during the year and want to calculate the difference based on a specific reference line each day (which is 08:30:00
in this example and Data1
is the column I want to compare the difference for). Usually I would use diff
and lag
but there I can only check the difference to n-previous rows not one specific reference row.
As the entire data has about 1 Mio entries I think using lag
and diff
in a recursive function (where I could use the condition-check for the starting line and then walking forward) would be too time consuming. Another idea I had is doing a second data frame with only the reference line for each day (which only had line 3 in this sample) and then joining with the original data frame as a new column containing the starting value. Then I could easily calc the difference between two columns.
Date Time Data1 Diff
1 2022-01-03 08:28:00 4778.14 0
2 2022-01-03 08:29:00 4784.23 0
3 2022-01-03 08:30:00 4785.15 0
4 2022-01-03 08:31:00 4785.01 -0.14
5 2022-01-03 08:32:00 4787.83 2.68
6 2022-01-03 08:33:00 4788.80 3.65
CodePudding user response:
You can subset Data1
to rows where Time
is "08:30:00"
as follows. This assumes Time
is character.
dat$diff <- dat$Data1 - dat$Data1[[match("08:30:00", dat$Time)]]
dat
Date Time Data1 Diff diff
1 2022-01-03 08:28:00 4778.14 0.00 -7.01
2 2022-01-03 08:29:00 4784.23 0.00 -0.92
3 2022-01-03 08:30:00 4785.15 0.00 0.00
4 2022-01-03 08:31:00 4785.01 -0.14 -0.14
5 2022-01-03 08:32:00 4787.83 2.68 2.68
6 2022-01-03 08:33:00 4788.80 3.65 3.65
For data with multiple dates, you can do the same operation for each day using dplyr::group_by()
:
library(dplyr)
dat %>%
group_by(Date) %>%
mutate(diff = Data1 - Data1[[match("08:30:00", Time)]]) %>%
ungroup()