Home > Enterprise >  How to add a moving sum and a function to a dataframe
How to add a moving sum and a function to a dataframe

Time:06-09

I need to add a new column containing an specifica function to a data frame.

Basically i need to calculate an indicator which is the sum of the past 5 observations (in column "value1") multuplied by 100 and divided by column "value2" {this one not as a sum, just the simple observatio} of my sample data below.

somewhat like this (its not a formal notation): indicator = [sum (i-5) value1 / value2] * 100

the indicator must be calculate by country.

in case of countries or dates "mixed" in the data frame the formula need to be able to recognize and sum the correct values only, in the correct order.

If there is a NA value in the value 1, the formula should also be able to ignore this line as a computation. ex: 31/12, 1/01, 2/01, 3/01, 4/01 = NA, 05/01 --> the indicator of 06/01 will then take into account the past 5 valid observation, 31/12, 1/01, 2/01, 3/01, 05/01.

Important -> only use base R

Example of the data frame (my actual data frame is more complex)

set.seed(1)
Country <- c(rep("USA", 10),rep("UK", 10), rep("China", 10))
Value1 <- sample(x = c(120, 340, 423), size = 30, replace = TRUE)
Value2 <- sample(x = c(1,3,5,6,9), size = 30, replace = TRUE)
date <- seq(as.POSIXct('2020/01/01'),  
                                 as.POSIXct('2020/01/30'),
                                 by = "1 day")
df = data.frame(Country, Value1, Value2, date)

I thank you all very much in advance. this one has bein very hard to crack :D

CodePudding user response:

Since it has to be done group-wise but in base R, you could use the split-apply-bind method

df2 <- do.call(rbind, lapply(split(df, df$Country), function(d) {
  d <- d[order(d$date),]
  d$computed <- 100 * d$Value1 / d$Value2
  d$Result <- NA
  for(i in 5:nrow(d)) d$Result[i] <- sum(tail(na.omit(d$computed[seq(i)]), 5))
  d[!names(d) %in% "computed"]
}))

rn <- sapply(strsplit(rownames(df2), "\\."), function(x) as.numeric(x[2]))
`rownames<-`(df2[rn,], NULL)
#>    Country Value1 Value2       date   Result
#> 1      USA    423      9 2020-01-01       NA
#> 2      USA    120      3 2020-01-02       NA
#> 3      USA    120      3 2020-01-03       NA
#> 4      USA    423      5 2020-01-04       NA
#> 5      USA    120      1 2020-01-05 33160.00
#> 6      USA    120      1 2020-01-06 40460.00
#> 7      USA    120      3 2020-01-07 40460.00
#> 8      USA    340      1 2020-01-08 70460.00
#> 9      USA    423      6 2020-01-09 69050.00
#> 10     USA    340      9 2020-01-10 60827.78
#> 11      UK    340      5 2020-01-11       NA
#> 12      UK    423      6 2020-01-12       NA
#> 13      UK    423      3 2020-01-13       NA
#> 14      UK    340      1 2020-01-14       NA
#> 15      UK    120      3 2020-01-15 65950.00
#> 16      UK    120      9 2020-01-16 60483.33
#> 17      UK    423      1 2020-01-17 95733.33
#> 18      UK    423      9 2020-01-18 86333.33
#> 19      UK    340      1 2020-01-19 86333.33
#> 20      UK    340      3 2020-01-20 93666.67
#> 21   China    340      1 2020-01-21       NA
#> 22   China    340      9 2020-01-22       NA
#> 23   China    423      3 2020-01-23       NA
#> 24   China    120      1 2020-01-24       NA
#> 25   China    340      9 2020-01-25 67655.56
#> 26   China    340      5 2020-01-26 40455.56
#> 27   China    120      5 2020-01-27 39077.78
#> 28   China    340      9 2020-01-28 28755.56
#> 29   China    340      9 2020-01-29 20533.33
#> 30   China    423      5 2020-01-30 25215.56

Created on 2022-06-08 by the reprex package (v2.0.1)

CodePudding user response:

Here's an option - not sure if the calculation is as you intend:

split_df <- split(df, Country)
split_df <- lapply(split_df, function(x) {
  x <- x[order(x$date),]
  x$index <- nrow(x):1
  x$indicator <- ifelse(x$index <= 5, sum(x$Value2[x$index <= 5]) * 100 / x$Value2, NA)
  x$index <- NULL
  return(x)
})

final_df <- do.call(rbind, split_df)
     Country Value1 Value2       date indicator
China.21   China    120      3 2020-01-21        NA
China.22   China    423      5 2020-01-22        NA
China.23   China    340      6 2020-01-23        NA
China.24   China    120      3 2020-01-24        NA
China.25   China    340      9 2020-01-25        NA
China.26   China    423      6 2020-01-26  366.6667
China.27   China    120      3 2020-01-27  733.3333
China.28   China    340      3 2020-01-28  733.3333
China.29   China    120      5 2020-01-29  440.0000
China.30   China    340      5 2020-01-30  440.0000
UK.11         UK    423      1 2020-01-11        NA
UK.12         UK    340      6 2020-01-12        NA
UK.13         UK    423      1 2020-01-13        NA
UK.14         UK    423      5 2020-01-14        NA
UK.15         UK    340      6 2020-01-15        NA
UK.16         UK    340      1 2020-01-16 2400.0000
UK.17         UK    120      5 2020-01-17  480.0000
UK.18         UK    423      9 2020-01-18  266.6667
UK.19         UK    120      6 2020-01-19  400.0000
UK.20         UK    423      3 2020-01-20  800.0000
USA.1        USA    423      1 2020-01-01        NA
USA.2        USA    423      5 2020-01-02        NA
USA.3        USA    423      5 2020-01-03        NA
USA.4        USA    423      6 2020-01-04        NA
USA.5        USA    423      1 2020-01-05        NA
USA.6        USA    340      5 2020-01-06  600.0000
USA.7        USA    340      5 2020-01-07  600.0000
USA.8        USA    423      6 2020-01-08  500.0000
USA.9        USA    423      5 2020-01-09  600.0000
USA.10       USA    423      9 2020-01-10  333.3333

CodePudding user response:

In base R you could do:

transform(df,Results=ave(Value1,Country,FUN=function(x)replace(x,!is.na(x),
                             filter(na.omit(x),rep(1,5),sides=1)))/Value2)

   Country Value1 Value2       date   Results
1      USA    120      1 2020-01-01        NA
2      USA    423      6 2020-01-02        NA
3      USA    120      1 2020-01-03        NA
4      USA    340      6 2020-01-04        NA
5      USA    120      5 2020-01-05  224.6000
6      USA    423      3 2020-01-06  475.3333
7      USA    423      3 2020-01-07  475.3333
8      USA    340      6 2020-01-08  274.3333
9      USA    340      6 2020-01-09  274.3333
10     USA    423      6 2020-01-10  324.8333
11      UK    423      3 2020-01-11        NA
12      UK    120      6 2020-01-12        NA
13      UK    120      1 2020-01-13        NA
14      UK    120      1 2020-01-14        NA
15      UK    340      6 2020-01-15  187.1667
16      UK    340      1 2020-01-16 1040.0000
17      UK    340      3 2020-01-17  420.0000
18      UK    340      5 2020-01-18  296.0000
19      UK    423      3 2020-01-19  594.3333
20      UK    120      3 2020-01-20  521.0000
21   China    423      9 2020-01-21        NA
22   China    120      3 2020-01-22        NA
23   China    120      1 2020-01-23        NA
24   China    120      5 2020-01-24        NA
25   China    120      5 2020-01-25  180.6000
26   China    340      6 2020-01-26  136.6667
27   China    120      5 2020-01-27  164.0000
28   China    120      1 2020-01-28  820.0000
29   China    340      6 2020-01-29  173.3333
30   China    340      9 2020-01-30  140.0000
  • Related