Home > database >  Change values according to other variables in R
Change values according to other variables in R

Time:01-02

I have a quite large data set with sales information. Some of these have negative salesvalue (credit note). I need to connect these credit note to the original sales note, and at the same time keep the original sales date. There is no single variable indicating which sales note the credit note belongs to, so I have to identify where all other variables are equal.

Example data:

data <- data.frame(number = 1:4,               
                   Date=c("02-01-2022", "06-01-2022", "07-01-2022", "06-01-2022"),
                   state = c("filet", "filet", "whole", "filet"),
                   specie = c("haddock", "cod", "haddock", "cod"),
                   salesvalue= c(300,500,200,-400),
                   salesdate=c("01-01-2022", "05-01-2022", "06-01-2022", "12-01-2022"))
data  

#  number       Date state  specie value  salesdate
#      1 02-01-2022 filet haddock   300 01-01-2022
#      2 06-01-2022 filet     cod   500 05-01-2022
#      3 07-01-2022 whole haddock   200 06-01-2022
#      4 06-01-2022 filet     cod  -400 12-01-2022

I want to change salesdate of salesvalue < 0 to salesdate where salesvalue > 0 and everything else equal, and aggregate them.

In this case I want to change salesdate of number 4 from "12-01-2022" to "05-01-2022" (same as 2 where all other values are equal).

This is what I tried:

datanew <- data %>% 
  group_by(Date, state, specie, value) %>% 
  mutate(date_sale=if_else(value < 0, 
         first(salesdate[value>0]), salesdate)) %>% 
  group_by(Date, state, specie, value, salesdate, date_sale) %>% 
  summarise(value_new=sum(value))

 
datanew 

# Date       state specie  value salesdate  date_sale  value_new
  <chr>      <chr> <chr>   <dbl> <chr>      <chr>          <dbl>
# 02-01-2022 filet haddock   300 01-01-2022 01-01-2022       300
# 06-01-2022 filet cod      -400 12-01-2022 NA              -400
# 06-01-2022 filet cod       500 05-01-2022 05-01-2022       500
# 07-01-2022 whole haddock   200 06-01-2022 06-01-2022       200

But it doesn't solve my problem. I have no I idea how to fix this, and would appreciate any suggestions!

Thank you in advance!

CodePudding user response:

You can use rows_update():

library(dplyr)

data %>%
  filter(salesvalue >= 0) %>%
  select(-c(number, salesvalue)) %>%
  rows_update(data, ., by = c("Date", "state", "specie"))

#   number       Date state  specie salesvalue  salesdate
# 1      1 02-01-2022 filet haddock        300 01-01-2022
# 2      2 06-01-2022 filet     cod        500 05-01-2022
# 3      3 07-01-2022 whole haddock        200 06-01-2022
# 4      4 06-01-2022 filet     cod       -400 05-01-2022

CodePudding user response:

Could you please check if this works for you: Cave: It may strongly on the original data!

The idea is to bring the data in position by grouping and arranging:

library(dplyr)

data %>% 
  group_by(Date, state, specie) %>% 
  arrange(salesvalue, .by_group = TRUE) %>% 
  ungroup() %>% 
  mutate(salesdate = ifelse(salesvalue < 0, lead(salesdate), salesdate)) %>% 
  arrange(number)
  number Date       state specie  salesvalue salesdate 
   <int> <chr>      <chr> <chr>        <dbl> <chr>     
1      1 02-01-2022 filet haddock        300 01-01-2022
2      2 06-01-2022 filet cod            500 05-01-2022
3      3 07-01-2022 whole haddock        200 06-01-2022
4      4 06-01-2022 filet cod           -400 05-01-2022
  • Related