I have a quite large data set with sales information. Some of these have negative salesvalue
(credit note). I need to connect these credit note to the original sales note, and at the same time keep the original sales date. There is no single variable indicating which sales note the credit note belongs to, so I have to identify where all other variables are equal.
Example data:
data <- data.frame(number = 1:4,
Date=c("02-01-2022", "06-01-2022", "07-01-2022", "06-01-2022"),
state = c("filet", "filet", "whole", "filet"),
specie = c("haddock", "cod", "haddock", "cod"),
salesvalue= c(300,500,200,-400),
salesdate=c("01-01-2022", "05-01-2022", "06-01-2022", "12-01-2022"))
data
# number Date state specie value salesdate
# 1 02-01-2022 filet haddock 300 01-01-2022
# 2 06-01-2022 filet cod 500 05-01-2022
# 3 07-01-2022 whole haddock 200 06-01-2022
# 4 06-01-2022 filet cod -400 12-01-2022
I want to change salesdate
of salesvalue < 0
to salesdate
where salesvalue > 0
and everything else equal, and aggregate them.
In this case I want to change salesdate
of number 4 from "12-01-2022" to "05-01-2022" (same as 2 where all other values are equal).
This is what I tried:
datanew <- data %>%
group_by(Date, state, specie, value) %>%
mutate(date_sale=if_else(value < 0,
first(salesdate[value>0]), salesdate)) %>%
group_by(Date, state, specie, value, salesdate, date_sale) %>%
summarise(value_new=sum(value))
datanew
# Date state specie value salesdate date_sale value_new
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
# 02-01-2022 filet haddock 300 01-01-2022 01-01-2022 300
# 06-01-2022 filet cod -400 12-01-2022 NA -400
# 06-01-2022 filet cod 500 05-01-2022 05-01-2022 500
# 07-01-2022 whole haddock 200 06-01-2022 06-01-2022 200
But it doesn't solve my problem. I have no I idea how to fix this, and would appreciate any suggestions!
Thank you in advance!
CodePudding user response:
You can use rows_update()
:
library(dplyr)
data %>%
filter(salesvalue >= 0) %>%
select(-c(number, salesvalue)) %>%
rows_update(data, ., by = c("Date", "state", "specie"))
# number Date state specie salesvalue salesdate
# 1 1 02-01-2022 filet haddock 300 01-01-2022
# 2 2 06-01-2022 filet cod 500 05-01-2022
# 3 3 07-01-2022 whole haddock 200 06-01-2022
# 4 4 06-01-2022 filet cod -400 05-01-2022
CodePudding user response:
Could you please check if this works for you: Cave: It may strongly on the original data!
The idea is to bring the data in position by grouping and arranging:
library(dplyr)
data %>%
group_by(Date, state, specie) %>%
arrange(salesvalue, .by_group = TRUE) %>%
ungroup() %>%
mutate(salesdate = ifelse(salesvalue < 0, lead(salesdate), salesdate)) %>%
arrange(number)
number Date state specie salesvalue salesdate
<int> <chr> <chr> <chr> <dbl> <chr>
1 1 02-01-2022 filet haddock 300 01-01-2022
2 2 06-01-2022 filet cod 500 05-01-2022
3 3 07-01-2022 whole haddock 200 06-01-2022
4 4 06-01-2022 filet cod -400 05-01-2022