Home > Software engineering >  How can I make a moving sum from a cell in R?
How can I make a moving sum from a cell in R?

Time:10-22

I have a dataframe looking like this:

date P >60?
03-31-2020 6.8 0
03-30-2020 5.0 0
03-29-2020 0.0 0
03-28-2020 0.0 0
03-27-2020 2.0 0
03-26-2020 0.0 0
03-25-2020 71.0 1
03-24-2020 2.0 0
03-23-2020 0.0 0
03-22-2020 23.8 0
03-21-2020 0.0 0
03-20-2020 23.8 0
  • Code to reproduce the dataframe:
df1 <- data.frame(date = c("03-31-2020", "03-30-2020", "03-29-2020", "03-28-2020", "03-27-2020", "03-26-2020", 
                           "03-25-2020", "03-24-2020", "03-23-2020", "03-22-2020", "03-21-2020", "03-20-2020"),
                  P = c(6.8, 5.0, 0.0, 0.0, 2.0, 0.0, 71.0, 2.0, 0.0, 23.8, 0.0, 23.8),
                  Sup60 = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))

I want to sum the P values N days befores the P > 60. For example, the first barrier (number bigger than 60) is the P = 71 on the day 25-03-2020, from that i want to sum the 5 P values before that day, like:

2.0 0.0 23.8 0.0 23.8 = 49,6

It is a kind of moving sum because the concept is similar to a moving average. Instead of the average of the last 5 values, for example, I want the sum of the last 5 values from a value greater than 60. How can I do this?

CodePudding user response:

Hi firstly we can solve how to calculate a running sum then we do an if_else on this column, as a general rule you always split complex problems into minor solvable problems

library(tidyverse)
df_example <- tibble::tribble(
         ~date,   ~P, ~`>60?`,
  "03-31-2020",  6.8,      0L,
  "03-30-2020",    5,      0L,
  "03-29-2020",    0,      0L,
  "03-28-2020",    0,      0L,
  "03-27-2020",    2,      0L,
  "03-26-2020",    0,      0L,
  "03-25-2020",   71,      1L,
  "03-24-2020",    2,      0L,
  "03-23-2020",    0,      0L,
  "03-22-2020", 23.8,      0L,
  "03-21-2020",    0,      0L,
  "03-20-2020", 23.8,      0L
  )


# lets start by doing a simple running sum

jjj <- df_example |> 
  arrange(date)
  
jjj |> 
  mutate(running_sum = slider::slide_dbl(.x = P,.f = ~ sum(.x),.before = 5,.after = -1)) |> 
  mutate(chosen_sum = if_else(P > 60,running_sum,NA_real_))
#> # A tibble: 12 x 5
#>    date           P `>60?` running_sum chosen_sum
#>    <chr>      <dbl>  <int>       <dbl>      <dbl>
#>  1 03-20-2020  23.8      0         0         NA  
#>  2 03-21-2020   0        0        23.8       NA  
#>  3 03-22-2020  23.8      0        23.8       NA  
#>  4 03-23-2020   0        0        47.6       NA  
#>  5 03-24-2020   2        0        47.6       NA  
#>  6 03-25-2020  71        1        49.6       49.6
#>  7 03-26-2020   0        0        96.8       NA  
#>  8 03-27-2020   2        0        96.8       NA  
#>  9 03-28-2020   0        0        75         NA  
#> 10 03-29-2020   0        0        75         NA  
#> 11 03-30-2020   5        0        73         NA  
#> 12 03-31-2020   6.8      0         7         NA

Created on 2021-10-20 by the reprex package (v2.0.1)

  • Related