Filtering every positive value for every negative in R

Time:10-22

I have a dataset with financial data. Sometimes a product gets refunded, which shows up as a row with a negative count for that product (the money is returned). I want to filter out each refund together with one matching positive row for the same id. Example:

library(tidyverse)
set.seed(1)

df <- tibble(
  count = sample(c(-1,1),80,replace = TRUE,prob=c(.2,.8)),
  id = rep(1:4,20)
)

df %>% 
  group_by(id) %>%
  summarize(total = sum(count))

# A tibble: 4 x 2
     id total
  <int> <dbl>
1     1    10
2     2    14
3     3    16
4     4    10

id = 1 has 15 positive counts and 5 negative ones (15 - 5 = 10), so I want to keep 10 rows of df with id = 1, all with positive values. id = 2 has 17 positive counts and 3 negative ones (17 - 3 = 14), so I want to keep 14 rows of df with id = 2, all with positive values.
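To make the arithmetic above explicit, here is a minimal sketch that counts the positive and negative rows per id next to the net total. It rebuilds the example data from the question; the column names `n_pos` and `n_neg` are illustrative and not part of the original post. The exact numbers depend on R's random number generator, so no output is shown.

```r
library(dplyr)
library(tibble)

set.seed(1)
df <- tibble(
  count = sample(c(-1, 1), 80, replace = TRUE, prob = c(.2, .8)),
  id = rep(1:4, 20)
)

# Per id: number of sales, number of refunds, and the net total
counts <- df %>%
  group_by(id) %>%
  summarize(
    n_pos = sum(count > 0),  # rows with count = 1
    n_neg = sum(count < 0),  # rows with count = -1 (refunds)
    total = sum(count)       # equals n_pos - n_neg, since counts are +1/-1
  )
counts
```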

In the end, this condition should be TRUE: nrow(df) == sum(df$count)

Unfortunately, a filtering join such as anti_join() removes all the matching rows, not just one positive row per refund, and I cannot think of another way to filter the tibble.
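A minimal sketch of the anti_join() behaviour described above, assuming the join is keyed on id only (the post does not show the exact join that was tried). `refunds` and `res` are hypothetical names used for illustration.

```r
library(dplyr)
library(tibble)

set.seed(1)
df <- tibble(
  count = sample(c(-1, 1), 80, replace = TRUE, prob = c(.2, .8)),
  id = rep(1:4, 20)
)

# The refund rows
refunds <- df %>% filter(count < 0)

# anti_join() keeps only rows of df with NO match in `refunds` on id,
# so every id that has at least one refund disappears entirely
res <- df %>% anti_join(refunds, by = "id")
res
```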

Thanks for helping me!

Answer:

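You can "uncount" instead: compute the net total per id, then use the total column as the number of times to repeat each id, with count set back to 1. Note that this rebuilds the rows rather than filtering the original ones, so any other columns of df are not carried over.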

df %>% 
  group_by(id) %>%
  summarize(total = sum(count)) %>%
  uncount(total) %>%
  mutate(count = 1)
#> # A tibble: 50 x 2
#>       id count
#>    <int> <dbl>
#>  1     1     1
#>  2     1     1
#>  3     1     1
#>  4     1     1
#>  5     1     1
#>  6     1     1
#>  7     1     1
#>  8     1     1
#>  9     1     1
#> 10     1     1
#> # ... with 40 more rows

Created on 2022-10-21 with reprex v2.0.2
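If df has other columns that must survive, a row-level filter is an alternative to rebuilding the rows with uncount(). The sketch below is one possible approach, not the answer author's: within each id it drops the refund rows and then keeps the first `total` positive rows. Which positive rows are kept is an arbitrary assumption here (the earliest ones in row order), since the question does not say which sale a refund cancels. The names `kept` and `rebuilt` are hypothetical.

```r
library(dplyr)
library(tibble)
library(tidyr)

set.seed(1)
df <- tibble(
  count = sample(c(-1, 1), 80, replace = TRUE, prob = c(.2, .8)),
  id = rep(1:4, 20)
)

# Row-level filter: keeps original rows (and any extra columns df may have)
kept <- df %>%
  group_by(id) %>%
  mutate(total = sum(count)) %>%     # net count per id, over all of its rows
  filter(count > 0) %>%              # drop the refund rows themselves
  filter(row_number() <= total) %>%  # keep the first `total` positive rows per id
  ungroup() %>%
  select(-total)

# The condition from the question
nrow(kept) == sum(kept$count)

# The uncount() answer, for comparison of per-id row counts
rebuilt <- df %>%
  group_by(id) %>%
  summarize(total = sum(count)) %>%
  uncount(total)
identical(table(kept$id), table(rebuilt$id))
```

Because each id keeps exactly `total` rows with count = 1, nrow(kept) == sum(kept$count) holds, and the per-id row counts match the uncount() result. Note that uncount() errors on negative weights, so that approach assumes no id has more refunds than sales.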
