i have such dataset
mydata=structure(list(MDM_Key = c(10265L, 10265L, 10265L, 10265L, 1L,
1L, 1L, 1L, 1L), subgroup_id = c("08.01.2002", "08.01.2002",
"08.01.2002", "08.01.2002", "09.10.2011", "09.10.2011", "09.10.2011",
"09.10.2011", "09.10.2011"), sale_count = c(200L, 1000L, 300L,
100L, 300L, 200L, 100L, 400L, 500L), iek_base_price = c(18.85,
18.85, 14.26, 16.16, 9, 19, 9, 16, 15), iek_disc_price = c(9.425,
9.053655, 7.00879, 7.82952, 12, 16, 10, 19, 18)), class = "data.frame", row.names = c(NA,
-9L))
For each MDM_Key
category, I need to convert the absolute values of the sale_count
variable into a percentage value according to the condition:Replace each first value of sale_count
of each category with the value 100(100%).
Then we look at how much the next value has changed from the previous one.
For example, for MDM_Key
=10265 it looks like this.
MDM_Key subgroup_id sale_count iek_base_price iek_disc_price
10265 08.01.2002 100 18,85 9,425
10265 08.01.2002 500 18,85 9,053655
10265 08.01.2002 -700 14,26 7,00879
10265 08.01.2002 -200 16,16 7,82952
The very first value 200 is replaced by 100%, then relative to the first value, the second has increased 5 times (500%). Then the value decreased by 700%, and then by another 200%
Therefore, how to make such percentage transformations for each MDM_Key group so that the desired output is
MDM_Key subgroup_id sale_count iek_base_price iek_disc_price
10265 08.01.2002 100 18,85 9,425
10265 08.01.2002 500 18,85 9,053655
10265 08.01.2002 -700 14,26 7,00879
10265 08.01.2002 -200 16,16 7,82952
1 09.10.2011 100 9 12
1 09.10.2011 -100 19 16
1 09.10.2011 -100 9 10
1 09.10.2011 500 16 19
1 09.10.2011 100 15 18
thanks for your any valuable help
CodePudding user response:
Here's a solution using the tidyverse (well dplyr
one of the main packages in the tidyverse).
mydata %>%
group_by(MDM_Key) %>%
mutate(
lag_sc = lag(sale_count, default = sale_count[1]),
perc_change = (sale_count - lag_sc)/ lag_sc * 100,
)
The perc_change
column then gives you the percentage increase or decrease from the last value.
# A tibble: 9 × 7
# Groups: MDM_Key [2]
MDM_Key subgroup_id sale_count iek_base_price iek_disc_price lag_sc perc_change
<int> <chr> <int> <dbl> <dbl> <int> <dbl>
1 10265 08.01.2002 200 18.8 9.43 NA NA
2 10265 08.01.2002 1000 18.8 9.05 200 400
3 10265 08.01.2002 300 14.3 7.01 1000 -70
4 10265 08.01.2002 100 16.2 7.83 300 -66.7
5 1 09.10.2011 300 9 12 NA NA
6 1 09.10.2011 200 19 16 300 -33.3
7 1 09.10.2011 100 9 10 200 -50
8 1 09.10.2011 400 16 19 100 300
9 1 09.10.2011 500 15 18 400 25
The main things here are:
- using the
lag()
function which allows you to access the value ati-1
at when you are at rowi
. - using the
group_by
function so that you can do these calculations separately in each group.