I have a dataframe (df) with 3 columns - a stage number, time data, and pressure data. Here is a portion of it:
structure(list(
Stage = c(1, 1, 1, 1, 1, 2, 2, 2),
Pressure = c(3.24, 12.218, 9.634, 9.027, 9.027, 0, 14.28, 1.737),
DateTime = structure(c(1624720853, 1624720854, 1624720855, 1624720856, 1624720857, 1624905025, 1624905026, 1624905027),
tzone = "", class = c("POSIXct", "POSIXt")),
class = "data.frame",
row.names = c(NA, -8L))
I want to calculate the slope/derivative (change in pressure over change in time) for pressure point for each stage. I have figured out how to calculate the slope, but there are sometimes large gaps in stages and I don't need slope for changes in pressure between stages.
I have code that I believe would work, but because it is looking at the difference in rows, the output will always going to be one less row than than the total number rows within a stage.
df<- df%>%
group_by(Stage) %>%
mutate(dp.dt = diff(Pressure)/as.numeric(diff(DateTime)) )
This is the error, and like I mentioned, I believe it is happening because the code is looking at the difference in rows, which should result in one less row than the true number of rows in a stage:
Error: Problem with `mutate()` column `dp.dt`.
i `dp.dt = diff(Pressure)/as.numeric(diff(DateTime))`.
i `dp.dt` must be size 5 or 1, not 4.
The error occurred in group 1: JobStage = 1.
In the end, I am looking for something like the table below. Is there a way to induce an NA, add a row, or fill the missing row with something so that I get my desired table?
Please let me know if I need to clarify anything. Any help would be appreciated. Thank you.
Stage | Pressure | DateTime | dp.dt |
---|---|---|---|
1 | 3.24 | 2021-06-26 10:20:53 | 8.978 |
1 | 12.128 | 2021-06-26 10:20:54 | -2.584 |
1 | 9.634 | 2021-06-26 10:20:55 | -0.607 |
1 | 9.027 | 2021-06-26 10:20:56 | 0.000 |
1 | 9.027 | 2021-06-26 10:20:57 | NA |
2 | 0 | 2021-06-28 13:30:25 | 14.280 |
2 | 14.28 | 2021-06-28 13:30:26 | -12.543 |
2 | 1.737 | 2021-06-28 13:30:27 | NA |
CodePudding user response:
The diff
returns output with length one less than the original data length. Just append NA
to the diff
and it should work
library(dplyr)
df%>%
group_by(Stage) %>%
mutate(dp.dt = c(diff(Pressure), NA)/as.numeric(c(diff(DateTime), NA)) )
CodePudding user response:
You may use lead
/lag
to get next and previous value respectively.
library(dplyr)
library(lubridate)
df %>%
mutate(DateTime = ymd_hms(DateTime)) %>%
group_by(Stage) %>%
mutate(dp.dt = (lead(Pressure) - Pressure)/as.numeric((lead(DateTime) - DateTime))) %>%
ungroup
# Stage Pressure DateTime dp.dt
# <dbl> <dbl> <dttm> <dbl>
#1 1 3.24 2021-06-26 23:20:53 8.98
#2 1 12.2 2021-06-26 23:20:54 -2.58
#3 1 9.63 2021-06-26 23:20:55 -0.607
#4 1 9.03 2021-06-26 23:20:56 0
#5 1 9.03 2021-06-26 23:20:57 NA
#6 2 0 2021-06-29 02:30:25 14.3
#7 2 14.3 2021-06-29 02:30:26 -12.5
#8 2 1.74 2021-06-29 02:30:27 NA