Home > Mobile >  How to count maximum value for given time period in R?
How to count maximum value for given time period in R?

Time:05-10

I got the data from MySQL and I'm trying to visualize it and uncover some answers. Using R for the statistic.

The final product is % discount for reach price change (=row).

Here is an example of my dataset.

     itemId pricehis           timestamp
1  69295477     1290 2022-04-12 04:42:53
2  69295624     1145 2022-04-12 04:42:53
3  69296136     3609 2022-04-12 04:42:54
4  69296607      855 2022-04-12 04:42:53
5  69295291     1000 2022-04-12 04:42:50
6  69295475     4188 2022-04-12 04:42:52
7  69295614     1145 2022-04-12 04:42:51
8  69295622     1290 2022-04-12 04:42:50
9  69295692     3609 2022-04-12 04:42:49
10 69295917     1725 2022-04-12 04:42:48
11 69296090     2449 2022-04-12 04:42:53
12 69296653     1145 2022-04-12 04:42:51
13 69296657     5638 2022-04-12 04:42:48
14 69296661     1725 2022-04-12 04:42:51
15 69296696      710 2022-04-12 04:42:51

I've been stuck at one part of the calculation - maximum value for each productId in 6 months.

In the dataset there are rows for specific productId with different pricehis values and different timestamps. I need to find the max value for a given row no older than 6 months.

The formula for calculating the desired discount is:

Discount grouped by itemId = 1 - pricehis / max(pricehis in the last 6 months)

At this moment I'm unable to solve the second part - pricehis in the last 6 months. - I need a new column with maximum 'pricehis' in the last 6 months for the itemId. Also could be known as interval maximum.

I can group it by the itemId, but I can't figure out how to add the condition on 6 months max.

Any tips on how to get this?

CodePudding user response:

I like slider::slide_index_dbl for this sort of thing. Here's some fake data chosen to demonstrate the 6mo window:

data.frame(itemId = rep(1:2, each = 6),
           price = floor(100*cos(0:11)^2),
           timestamp = as.Date("2000-01-01")   50*(0:11)) -> df

We can group by itemId and then apply the window function. (Note that slider requires the data to be sorted by date within each group.)

library(dplyr)
library(lubridate) # for `%m-%`, to get sliding months (harder than it sounds!)
df %>%
  group_by(itemId) %>%
  mutate(max_6mo = slider::slide_index_dbl(price, timestamp, max, 
                                           .before = ~.x %m-% months(6))) %>%
  mutate(discount = 1 - price / max_6mo) %>%
  ungroup()

Result

# A tibble: 12 × 5
   itemId price timestamp  max_6mo discount
    <int> <dbl> <date>       <dbl>    <dbl>
 1      1   100 2000-01-01     100   0     
 2      1    29 2000-02-20     100   0.71  
 3      1    17 2000-04-10     100   0.83  
 4      1    98 2000-05-30     100   0.0200
 5      1    42 2000-07-19      98   0.571    # new max since >6mo since 100
 6      1     8 2000-09-07      98   0.918 
 7      2    92 2000-10-27      92   0     
 8      2    56 2000-12-16      92   0.391 
 9      2     2 2001-02-04      92   0.978 
10      2    83 2001-03-26      92   0.0978
11      2    70 2001-05-15      83   0.157    # new max since >6mo since 92
12      2     0 2001-07-04      83   1 
  •  Tags:  
  • r max
  • Related