Creating summary table by dividing values by the average of the observations in the surrounding area


A minimal sample of my data:

library(data.table)

set.seed(1)

tbl <- data.table(store = c('A','A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C'),
                  year = c(2017,2017,2017,2017,2018,2018,2018,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018),
                  week = c(12,13,15,16,2,3,4,18,19,20,22,24,25,1,2,3,2),
                  insert = sample(0:1, size = 17, replace = TRUE),
                  demand = sample(200:250, size = 17, replace = TRUE))

I'd like to create a summary table that is calculated in the following way:

To calculate the effect where the insert column is 1, I must divide the demand value of the insert row by the average of the demand values of the nearest weeks within the same year whose insert column is 0.

For example, if I need to calculate it for store A in 2017 at the 3rd week, it must be:

rate <- 209 / mean(c(206,245))

but if the year has more than one insert, as for store B in 2017, it should be the mean of the rates (for weeks 19 and 20):

rate <- mean(c(224 / mean(c(240,236)), 245 / mean(c(240,236))))

and if I can't find two values around it, I have to calculate the ratio by dividing by the only value I can find, for example store C in 2017:

rate <- 243 / 224

If I can find no value at all around the inserted row, the rate should simply be 1.
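
Taken together, the rules amount to something like the small helper below (a minimal sketch only; before and after stand for the demands of the nearest insert = 0 weeks, NA when a side is missing, and the 200 in the last call is just a placeholder demand):

week_rate <- function(demand, before, after) {
  m <- mean(c(before, after), na.rm = TRUE)  # average of whichever neighbours exist
  if (is.nan(m)) 1 else demand / m           # no neighbours at all -> rate of 1
}

week_rate(209, 206, 245)  # 0.927, the store A 2017 case above
week_rate(243, 224, NA)   # 1.085, the store C 2017 case
week_rate(200, NA, NA)    # 1, nothing around the inserted row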

Finally, the summary table should look like this:

desired_tbl <- data.table(store = c('A','A','B','C','C'),
                          year = c(2017,2018,2017,2017,2018),
                          rate = c(0.927,0.97,0.941,1.08,1))

desired_tbl

  store  year  rate
  <chr> <dbl> <dbl>
1 A      2017 0.927
2 A      2018 0.97 
3 B      2017 0.941
4 C      2017 1.08 
5 C      2018 1    

I can do all of this by writing for loops and tons of conditions, but I'm looking for a vectorised way to accomplish it. dplyr solutions are also welcome.

Thanks in advance.

CodePudding user response:

Here is one possible way:

  • First, identify the store-years where the insert = 1 rows are consecutive, and collapse them (i.e., take the average of those rows). In your example there is only one such case (store B in 2017, weeks 19/20). Then bind the collapsed rows back to the insert = 0 rows and restore the order by store, year, week:
tbl[, `:=`(demand = as.double(demand), insert_id = rleid(insert))]  # run id over consecutive insert values
tbl <- rbind(
  tbl[insert == 0],
  tbl[insert == 1][, lapply(.SD, mean, na.rm = TRUE), by = .(store, year, insert_id)]  # collapse runs of insert = 1
)[order(store, year, week)]
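
For reference, rleid() is what makes the "consecutive" part work: it assigns each run of identical insert values its own id, so back-to-back insert = 1 weeks end up in one group. A quick illustration on a toy vector:

rleid(c(0, 1, 1, 0, 1))
# [1] 1 2 2 3 4   -- the two adjacent 1s share id 2, so they collapse together
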
  • Next, get the values before and after the insert = 1 rows, compute the row-wise mean (m) of those values, take the ratio of demand to m, and keep only the insert = 1 rows:
tbl[, c("v1","v2"):=shift(demand,c(-1,1)), by=.(store, year)]
tbl[, m:=mean(c(v1,v2), na.rm=T), by=1:nrow(tbl)]
tbl[, rate:=demand/m][insert==1,.(rate=mean(fifelse(is.na(rate),1,rate))), by=.(store,year)]
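
As a side note, by = 1:nrow(tbl) is just one way to force a row-wise mean; if you prefer to avoid the per-row grouping, the same m can be computed with rowMeans, which is equivalent here (both yield NaN when v1 and v2 are both NA):

tbl[, m := rowMeans(cbind(v1, v2), na.rm = TRUE)]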

Output:

    store  year      rate
   <char> <num>     <num>
1:      A  2017 0.9268293
2:      A  2018 0.9727273
3:      B  2017 0.9852941
4:      C  2017 1.0848214
5:      C  2018 1.0000000

It's not identical to your desired output because I believe you have miscalculated 2017-B.

CodePudding user response:

Replace the demands whose insert value is 1 with NA, then fill them in forwards and backwards by group, taking the mean of the two directions and dividing it into demand. Then keep only the insert = 1 rows and aggregate each group using what is left. If any rate is not finite, use 1.

library(dplyr)
library(zoo)

tbl %>%
  mutate(d = ifelse(insert == 1, NA, demand)) %>%  # blank out the insert weeks
  group_by(store, year) %>%
  mutate(rate = demand /
    rowMeans(cbind(na.locf0(d), na.locf0(d, fromLast = TRUE)), na.rm = TRUE)) %>%
  filter(insert == 1) %>%
  summarize(rate = mean(rate, na.rm = TRUE), .groups = "drop") %>%
  mutate(rate = ifelse(is.finite(rate), rate, 1))  # no neighbours at all -> 1
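
In case na.locf0() is unfamiliar: it fills NAs with the last observation carried forward (backward with fromLast = TRUE) but, unlike na.locf(), keeps leading/trailing NAs instead of dropping them, which is what lets the "no neighbour on this side" cases stay NA for rowMeans to ignore. A toy example:

na.locf0(c(NA, 210, NA, NA, 230))                   # NA 210 210 210 230
na.locf0(c(NA, 210, NA, NA, 230), fromLast = TRUE)  # 210 210 230 230 230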

giving:

# A tibble: 5 x 3
  store  year  rate
  <chr> <dbl> <dbl>
1 A      2017 0.927
2 A      2018 0.973
3 B      2017 0.985
4 C      2017 1.08 
5 C      2018 1    