Home > Net >  R: Sum two columns from different time series data frames of unequal length, indexing by timeStamp
R: Sum two columns from different time series data frames of unequal length, indexing by timeStamp

Time:11-10

I have the following dataframes:

df1 <-
    setNames(data.frame(
        as.POSIXct(
            c(
            "2022-07-29 00:00:00",
            "2022-07-29 00:00:05",
            "2022-07-29 00:05:00",
            "2022-07-29 00:05:05",
            "2022-07-29 00:10:00",
            "2022-07-29 00:15:00",
            "2022-07-29 00:20:00",
            "2022-07-29 00:20:05"
            )),
        c(1, 2, 3, 4, 5, 6, 7, 8)
    ),
    c("timeStamp", "value1"))
df2 <-
    setNames(data.frame(
        as.POSIXct(
            c(
            "2022-07-29 00:00:05",
            "2022-07-29 00:05:00",
            "2022-07-29 00:05:03",
            "2022-07-29 00:10:02",
            "2022-07-29 00:20:05"
            )),
        c(1, 2, 3, 4, 5)
    ),
    c("timeStamp", "value1"))

What I want to do, in a new column in df1, is sum the two "value1" columns together when they are from the same time stamp, and if there is only one value at a certain time stamp, then take that value. so the result should be:

df_new <- df1 %>% mutate(total = df1$value1   df2$value1)
...

so total would have 10 values. however I always get an error: longer object length is not a multiple of shorter object length

CodePudding user response:

Something like this?

df1 |> 
    full_join(df2, by = "timeStamp") |>
    group_by(timeStamp) |> 
    mutate(total = sum(value1.x, value1.y, na.rm = TRUE))


# Groups:   timeStamp [10]
   timeStamp           value1.x value1.y total
   <dttm>                 <dbl>    <dbl> <dbl>
 1 2022-07-29 00:00:00        1       NA     1
 2 2022-07-29 00:00:05        2        1     3
 3 2022-07-29 00:05:00        3        2     5
 4 2022-07-29 00:05:05        4       NA     4
 5 2022-07-29 00:10:00        5       NA     5
 6 2022-07-29 00:15:00        6       NA     6
 7 2022-07-29 00:20:00        7       NA     7
 8 2022-07-29 00:20:05        8        5    13
 9 2022-07-29 00:05:03       NA        3     3
10 2022-07-29 00:10:02       NA        4     4

Addendum:


df1 |> 
    full_join(df2, by = "timeStamp") |>
    group_by(timeStamp) |> 
    mutate(total = sum(value1.x, value1.y, na.rm = TRUE)) |> 
    ungroup() |> 
    mutate(ctotal = total   lag(total, 1),
           ctotal = ifelse(is.na(ctotal), total, ctotal))


   timeStamp           value1.x value1.y total ctotal
   <dttm>                 <dbl>    <dbl> <dbl>  <dbl>
 1 2022-07-29 00:00:00        1       NA     1      1
 2 2022-07-29 00:00:05        2        1     3      4
 3 2022-07-29 00:05:00        3        2     5      8
 4 2022-07-29 00:05:05        4       NA     4      9
 5 2022-07-29 00:10:00        5       NA     5      9
 6 2022-07-29 00:15:00        6       NA     6     11
 7 2022-07-29 00:20:00        7       NA     7     13
 8 2022-07-29 00:20:05        8        5    13     20
 9 2022-07-29 00:05:03       NA        3     3     16
10 2022-07-29 00:10:02       NA        4     4      7

  • Related