Home > Back-end >  how to transform the values for each group into a percentage in R
how to transform the values for each group into a percentage in R

Time:10-30

i have such dataset

mydata=structure(list(MDM_Key = c(10265L, 10265L, 10265L, 10265L, 1L, 
1L, 1L, 1L, 1L), subgroup_id = c("08.01.2002", "08.01.2002", 
"08.01.2002", "08.01.2002", "09.10.2011", "09.10.2011", "09.10.2011", 
"09.10.2011", "09.10.2011"), sale_count = c(200L, 1000L, 300L, 
100L, 300L, 200L, 100L, 400L, 500L), iek_base_price = c(18.85, 
18.85, 14.26, 16.16, 9, 19, 9, 16, 15), iek_disc_price = c(9.425, 
9.053655, 7.00879, 7.82952, 12, 16, 10, 19, 18)), class = "data.frame", row.names = c(NA, 
-9L))

For each MDM_Key category, I need to convert the absolute values of the sale_count variable into a percentage value according to the condition:Replace each first value of sale_count of each category with the value 100(100%). Then we look at how much the next value has changed from the previous one. For example, for MDM_Key=10265 it looks like this.

MDM_Key subgroup_id sale_count  iek_base_price  iek_disc_price
10265   08.01.2002  100          18,85             9,425
10265   08.01.2002  500          18,85             9,053655
10265   08.01.2002  -700         14,26             7,00879
10265   08.01.2002  -200         16,16             7,82952

The very first value 200 is replaced by 100%, then relative to the first value, the second has increased 5 times (500%). Then the value decreased by 700%, and then by another 200%

Therefore, how to make such percentage transformations for each MDM_Key group so that the desired output is

MDM_Key subgroup_id sale_count  iek_base_price  iek_disc_price
10265   08.01.2002  100 18,85   9,425
10265   08.01.2002  500 18,85   9,053655
10265   08.01.2002  -700    14,26   7,00879
10265   08.01.2002  -200    16,16   7,82952
1   09.10.2011  100          9        12
1   09.10.2011  -100         19       16
1   09.10.2011  -100         9        10
1   09.10.2011  500          16       19
1   09.10.2011  100         15        18

thanks for your any valuable help

CodePudding user response:

Here's a solution using the tidyverse (well dplyr one of the main packages in the tidyverse).

mydata %>% 
  group_by(MDM_Key) %>% 
  mutate(
    lag_sc = lag(sale_count, default = sale_count[1]),
    perc_change = (sale_count - lag_sc)/ lag_sc * 100,
  )

The perc_change column then gives you the percentage increase or decrease from the last value.

# A tibble: 9 × 7
# Groups:   MDM_Key [2]
  MDM_Key subgroup_id sale_count iek_base_price iek_disc_price lag_sc perc_change
    <int> <chr>            <int>          <dbl>          <dbl>  <int>       <dbl>
1   10265 08.01.2002         200           18.8           9.43     NA        NA  
2   10265 08.01.2002        1000           18.8           9.05    200       400  
3   10265 08.01.2002         300           14.3           7.01   1000       -70  
4   10265 08.01.2002         100           16.2           7.83    300       -66.7
5       1 09.10.2011         300            9            12        NA        NA  
6       1 09.10.2011         200           19            16       300       -33.3
7       1 09.10.2011         100            9            10       200       -50  
8       1 09.10.2011         400           16            19       100       300  
9       1 09.10.2011         500           15            18       400        25  

The main things here are:

  • using the lag() function which allows you to access the value at i-1 at when you are at row i.
  • using the group_by function so that you can do these calculations separately in each group.
  • Related