Home > Software design >  Grouping to add a new column produces one less row than dataset. How do I make the rows match?
Grouping to add a new column produces one less row than dataset. How do I make the rows match?

Time:10-27

I have a dataframe (df) with 3 columns - a stage number, time data, and pressure data. Here is a portion of it:

structure(list(
Stage = c(1, 1, 1, 1, 1, 2, 2, 2), 
Pressure = c(3.24, 12.218, 9.634, 9.027, 9.027, 0, 14.28, 1.737), 
DateTime = structure(c(1624720853, 1624720854, 1624720855, 1624720856, 1624720857, 1624905025, 1624905026, 1624905027), 
tzone = "", class = c("POSIXct", "POSIXt")),
class = "data.frame", 
row.names = c(NA, -8L))

I want to calculate the slope/derivative (change in pressure over change in time) for pressure point for each stage. I have figured out how to calculate the slope, but there are sometimes large gaps in stages and I don't need slope for changes in pressure between stages.

I have code that I believe would work, but because it is looking at the difference in rows, the output will always going to be one less row than than the total number rows within a stage.

df<- df%>% 
  group_by(Stage) %>%
  mutate(dp.dt = diff(Pressure)/as.numeric(diff(DateTime)) )

This is the error, and like I mentioned, I believe it is happening because the code is looking at the difference in rows, which should result in one less row than the true number of rows in a stage:

Error: Problem with `mutate()` column `dp.dt`.
i `dp.dt = diff(Pressure)/as.numeric(diff(DateTime))`.
i `dp.dt` must be size 5 or 1, not 4.
The error occurred in group 1: JobStage = 1.

In the end, I am looking for something like the table below. Is there a way to induce an NA, add a row, or fill the missing row with something so that I get my desired table?

Please let me know if I need to clarify anything. Any help would be appreciated. Thank you.

Stage Pressure DateTime dp.dt
1 3.24 2021-06-26 10:20:53 8.978
1 12.128 2021-06-26 10:20:54 -2.584
1 9.634 2021-06-26 10:20:55 -0.607
1 9.027 2021-06-26 10:20:56 0.000
1 9.027 2021-06-26 10:20:57 NA
2 0 2021-06-28 13:30:25 14.280
2 14.28 2021-06-28 13:30:26 -12.543
2 1.737 2021-06-28 13:30:27 NA

CodePudding user response:

The diff returns output with length one less than the original data length. Just append NA to the diff and it should work

library(dplyr)
df%>% 
  group_by(Stage) %>%
  mutate(dp.dt = c(diff(Pressure), NA)/as.numeric(c(diff(DateTime), NA)) )

CodePudding user response:

You may use lead/lag to get next and previous value respectively.

library(dplyr)
library(lubridate)

df %>%
  mutate(DateTime = ymd_hms(DateTime)) %>% 
  group_by(Stage) %>%
  mutate(dp.dt = (lead(Pressure) - Pressure)/as.numeric((lead(DateTime) - DateTime))) %>%
  ungroup

#  Stage Pressure DateTime              dp.dt
#  <dbl>    <dbl> <dttm>                <dbl>
#1     1     3.24 2021-06-26 23:20:53   8.98 
#2     1    12.2  2021-06-26 23:20:54  -2.58 
#3     1     9.63 2021-06-26 23:20:55  -0.607
#4     1     9.03 2021-06-26 23:20:56   0    
#5     1     9.03 2021-06-26 23:20:57  NA    
#6     2     0    2021-06-29 02:30:25  14.3  
#7     2    14.3  2021-06-29 02:30:26 -12.5  
#8     2     1.74 2021-06-29 02:30:27  NA    
  • Related