Home > Software design >  How to aggregate and simultaneously replace values from one dataset for the second in R
How to aggregate and simultaneously replace values from one dataset for the second in R

Time:04-04

Here the part of my dataset

mydata=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = c(20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L), line_fact_amt = c(23749L, 
1000L, 3050L, 1550L, 8900L, 1550L, 0L, 300L, 0L, 499L, 5450L, 
300L, 0L, 499L, 599L, 599L, 6050L, 300L, 599L, 1400L, 300L, 0L, 
2000L, 700L, 0L, 5990L, 8877L, 1999L, 257L, 200L, 361L, 300L, 
1990L, 2453L, 3140L, 0L, 0L, 199L, 599L, 10990L, 7990L, 773L, 
400L, 6000L, 2269L, 2000L, 1999L)), class = "data.frame", row.names = c(NA, 
-47L))

-calendar_id_operday means day of week 20210102(YYYYMMDD) .I need aggregate by sum line_fact_amt for each day of calendar_id_operday . And the second ,there is dataset ,with that days but for 2022 year

mydata2=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = c(20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L), line_fact_amt = c(38586L, 
15837L, 17887L, 16387L, 23737L, 16387L, 14837L, 15137L, 14837L, 
15336L, 20287L, 15137L, 14837L, 15336L, 15436L, 15436L, 20887L, 
15137L, 15436L, 16237L, 15137L, 14837L, 16837L, 15537L, 14837L, 
20827L, 23714L, 16836L, 15094L, 15037L, 15198L, 15137L, 16827L, 
17290L, 17977L, 14837L, 14837L, 15036L, 15436L, 25827L, 22827L, 
15610L, 15237L, 20837L, 17106L, 16837L, 16836L)), class = "data.frame", row.names = c(NA, 
-47L))

also aggregate line_fact_amt by calendar_id_operday.

The Greatest Difficulty for me: if the aggregated value of the same day of the next year(2022) is greater than for the same day but of the last year(2021), well, for example, for 20220102 the sum = 815519, while for the same day but 2021 = 118180, we have a difference for this day between 2021 and 2022 more than 40%, and if there is a difference for the same day>40%, then replace for the next year with the values of the previous year of this date 10%, i.e. 815519 is replaced by 118180 10% of it (129998). How to do this procedure for sales_point_id separately?

Thanks for your valuable help.

CodePudding user response:

Based on the description, perhaps this helps

library(lubridate)
library(dplyr)
 agg1 <- mydata %>% 
 group_by(calendar_id_operday) %>% 
 summarise(line_fact_amt = sum(line_fact_amt, na.rm = TRUE))
 agg2 <- mydata2 %>%
  group_by(calendar_id_operday) %>%
  summarise(line_fact_amt = sum(line_fact_amt, na.rm = TRUE))
 agg2 %>%
  mutate(calendar_prev = as.integer(format(ymd(calendar_id_operday) - 
   years(1), '%Y%m%d')) ) %>% 
  full_join(agg1, ., by = c(calendar_id_operday = "calendar_prev")) %>% 
    mutate(changed = ifelse((line_fact_amt.y  - line_fact_amt.x) >= 0.4 * 
     line_fact_amt.y, line_fact_amt.x   0.1 * line_fact_amt.x, line_fact_amt.y ))
  • Related