Home > Mobile >  Remove negative values from a dataframe and a positive value that has the same amount
Remove negative values from a dataframe and a positive value that has the same amount

Time:12-06

I have an issue while working with a dataset at work. The dataset contains a list of transactions made on our website. However, when a customer returns a product it adds another row in the dataset containing a negative value. An example of how the data could look:

Person 1        $150        
Person 1       -$150
Person 1        $150
Person 2        $100
Person 2       -$100
Person 3        $50
Person 3        $20

To use this data for analysis, I am looking to remove the negative value and one value from the same person that has the same amount. The desired output of a new dataframe would be:

Person 1      $150
Person 3      $50
Person 3      $20

Any tips on how to solve this issue?

CodePudding user response:

A possible hack if customers doesn't buy again before they return:

library(dplyr)
library(stringr)

df |>
  group_by(person) |> 
  mutate(transaction = if_else(str_detect(amount, "-"), row_number()-1, row_number())) |>
  group_by(person, transaction) |> 
  filter(n() == 1) |>
  ungroup()

Output:

# A tibble: 3 × 3
  person  amount transaction
  <chr>   <chr>        <dbl>
1 Person1 $150             3
2 Person3 $50              1
3 Person3 $20              2

CodePudding user response:

As mentioned in comments that way of identify transactions seems pretty unreliable. If you has no choice anyway, you must mark refunded transactions as follows:

data <- data.frame(person=c(1,1,1,2,2,3,3),  tx=c(150,-150,150,100,-100,50,20))

s <- lapply(split(data, interaction(data$person, abs(data$tx))),\(x){
                  cnt <- with(x, min(length(tx[tx<0]), length(tx[tx>0])));
                  within(x, refund <- cumsum(tx<0) <= cnt & cumsum(tx>0) <= cnt ) }) 
                  
                  
unsplit(s, interaction(data$person, abs(data$tx)))
#>   person   tx refund
#> 1      1  150   TRUE
#> 2      1 -150   TRUE
#> 3      1  150  FALSE
#> 4      2  100   TRUE
#> 5      2 -100   TRUE
#> 6      3   50  FALSE
#> 7      3   20  FALSE
  • Related