Home > Enterprise >  How to subtract variables between months in groups separately in R
How to subtract variables between months in groups separately in R

Time:12-19

Here data example

mydata=structure(list(competitor_id = c(26L, 26L, 26L, 26L, 26L, 26L, 
27L, 27L, 27L, 27L, 27L, 27L), comp_article_id = c("989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B", 
"989EFD73-35F1-8E94-8B77-91478BD1041B"), MDM_Key = c(40715L, 
40715L, 40715L, 40715L, 40715L, 40715L, 40715L, 40715L, 40715L, 
40715L, 40715L, 40715L), subgroup_id = c("05.07.2002", "05.07.2002", 
"05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002", 
"05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002"
), month_id = c(202009L, 202107L, 202204L, 202206L, 202208L, 
202210L, 202009L, 202107L, 202204L, 202206L, 202208L, 202210L
), comp_disc_price = c(1999.2, 2100, 2100, 2940, 2940, 2940, 
1999.2, 2100, 2100, 2940, 2940, 2940), iek_disc_price = c(1436.63, 
1709.09, 1611.12, 1611.12, 1611.12, 1611.12, 1436.63, 1709.09, 
1611.12, 1611.12, 1611.12, 1611.12)), class = "data.frame", row.names = c(NA, 
-12L))

I need for the comp_disc_price and iek_disc_price variables to calculate the difference in values by month(month_id) . To be more clearly.

month_id    comp_disc_price iek_disc_price
202009             1999,2       1436,63
202107             2100         1709,09

subtract the smaller month from the larger month

2100-1999,2=100,8 (comp_disc_price) (202107-202009)
1706,09-1436,63=272,46 (iek_disc_price)

Then subtract data between months 202204 and 202107 , then 202204=6 and 202204... How perform this calculation, but by for each group competitor_id MDM_Key? The ideal output for me would be

  competitor_id MDM_Key subgroup_id         month
1            26   40715  05.07.2002 202009-202107
2            26   40715  05.07.2002 202204-202206
3            26   40715  05.07.2002 202206-202208
4            26   40715  05.07.2002 202208-202210
5            27   40715  05.07.2002 202009-202107
6            27   40715  05.07.2002 202204-202206
7            27   40715  05.07.2002 202206-202208
8            27   40715  05.07.2002 202208-202210
  diff_comp_disc_price diff_iek_disc_price
1                100.8              272.46
2                  0.0              -97.97
3                840.0                0.00
4                  0.0                0.00
5                100.8              272.46
6                  0.0              -97.97
7                840.0                0.00
8                  0.0               97.97

Thanks for your valuable help

CodePudding user response:

Use diff, which calculates the difference between consecutive values. Since it returns a length 1 shorter than the source, we need to pad it with an NA; whether that goes at the beginning or the end is contextual, I'll choose the end for now.

mydata %>%
  arrange(competitor_id, month_id) %>%
  group_by(competitor_id) %>%
  mutate(across(c(comp_disc_price, iek_disc_price), ~ c(diff(.), NA))) %>%
  ungroup()
# # A tibble: 12 × 7
#    competitor_id comp_article_id                      MDM_Key subgroup_id month_id comp_disc_price iek_disc_price
#            <int> <chr>                                  <int> <chr>          <int>           <dbl>          <dbl>
#  1            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202009            101.          272. 
#  2            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202107              0           -98.0
#  3            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202204            840             0  
#  4            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202206              0             0  
#  5            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202208              0             0  
#  6            26 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202210             NA            NA  
#  7            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202009            101.          272. 
#  8            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202107              0           -98.0
#  9            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202204            840             0  
# 10            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202206              0             0  
# 11            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202208              0             0  
# 12            27 989EFD73-35F1-8E94-8B77-91478BD1041B   40715 05.07.2002    202210             NA            NA  
  • Related