Here data example
mydata=structure(list(competitor_id = c(26L, 26L, 26L, 26L, 26L, 26L,
27L, 27L, 27L, 27L, 27L, 27L), comp_article_id = c("989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B", "989EFD73-35F1-8E94-8B77-91478BD1041B",
"989EFD73-35F1-8E94-8B77-91478BD1041B"), MDM_Key = c(40715L,
40715L, 40715L, 40715L, 40715L, 40715L, 40715L, 40715L, 40715L,
40715L, 40715L, 40715L), subgroup_id = c("05.07.2002", "05.07.2002",
"05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002",
"05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002", "05.07.2002"
), month_id = c(202009L, 202107L, 202204L, 202206L, 202208L,
202210L, 202009L, 202107L, 202204L, 202206L, 202208L, 202210L
), comp_disc_price = c(1999.2, 2100, 2100, 2940, 2940, 2940,
1999.2, 2100, 2100, 2940, 2940, 2940), iek_disc_price = c(1436.63,
1709.09, 1611.12, 1611.12, 1611.12, 1611.12, 1436.63, 1709.09,
1611.12, 1611.12, 1611.12, 1611.12)), class = "data.frame", row.names = c(NA,
-12L))
I need for the comp_disc_price
and iek_disc_price
variables to calculate the difference in values by month(month_id) .
To be more clearly.
month_id comp_disc_price iek_disc_price
202009 1999,2 1436,63
202107 2100 1709,09
subtract the smaller month from the larger month
2100-1999,2=100,8 (comp_disc_price) (202107-202009)
1706,09-1436,63=272,46 (iek_disc_price)
Then subtract data between months 202204 and 202107 , then 202204=6 and 202204...
How perform this calculation, but by for each group competitor_id MDM_Key
?
The ideal output for me would be
competitor_id MDM_Key subgroup_id month
1 26 40715 05.07.2002 202009-202107
2 26 40715 05.07.2002 202204-202206
3 26 40715 05.07.2002 202206-202208
4 26 40715 05.07.2002 202208-202210
5 27 40715 05.07.2002 202009-202107
6 27 40715 05.07.2002 202204-202206
7 27 40715 05.07.2002 202206-202208
8 27 40715 05.07.2002 202208-202210
diff_comp_disc_price diff_iek_disc_price
1 100.8 272.46
2 0.0 -97.97
3 840.0 0.00
4 0.0 0.00
5 100.8 272.46
6 0.0 -97.97
7 840.0 0.00
8 0.0 97.97
Thanks for your valuable help
CodePudding user response:
Use diff
, which calculates the difference between consecutive values. Since it returns a length 1 shorter than the source, we need to pad it with an NA
; whether that goes at the beginning or the end is contextual, I'll choose the end for now.
mydata %>%
arrange(competitor_id, month_id) %>%
group_by(competitor_id) %>%
mutate(across(c(comp_disc_price, iek_disc_price), ~ c(diff(.), NA))) %>%
ungroup()
# # A tibble: 12 × 7
# competitor_id comp_article_id MDM_Key subgroup_id month_id comp_disc_price iek_disc_price
# <int> <chr> <int> <chr> <int> <dbl> <dbl>
# 1 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202009 101. 272.
# 2 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202107 0 -98.0
# 3 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202204 840 0
# 4 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202206 0 0
# 5 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202208 0 0
# 6 26 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202210 NA NA
# 7 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202009 101. 272.
# 8 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202107 0 -98.0
# 9 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202204 840 0
# 10 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202206 0 0
# 11 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202208 0 0
# 12 27 989EFD73-35F1-8E94-8B77-91478BD1041B 40715 05.07.2002 202210 NA NA